Confused about all the data, AI, and ML terms? And still wondering what MLOps, DevOps, and SRE are about? This glossary will help you! If you want additional information, feel free to reach out.
An AI system that can understand, learn, and apply knowledge across diverse domains, perform tasks with human-level competence, adapt to new situations, and exhibit reasoning, problem-solving, creativity, and social intelligence. AGI aims to surpass the limitations of narrow AI, which is designed for specific tasks or domains and lacks the ability to generalize knowledge or transfer skills to new situations.
The field of computer science focused on creating systems and algorithms that can perform tasks that typically require human intelligence. AI systems can process large amounts of data, learn from experiences, and adapt to new situations. AI includes a variety of subfields, such as machine learning, natural language processing, and computer vision.
AI for IT operations, including ML for DevOps. This term was originally coined by Gartner and refers to the use of AI tools and technologies to improve IT operations. This means collecting, aggregating, and analyzing the vast amount of data generated by IT components to provide value. It includes topics such as anomaly detection, root cause analysis, intelligent monitoring, and predictive maintenance.
The approach of automating some or all stages of the ML development process from raw data to deployment of a fully trained and validated model. It can help non-experts to efficiently make use of machine learning techniques, and can also support experts to improve their productivity - for example, for prototyping models.
An AI language model developed by OpenAI that can understand and generate text in a conversational context. It can engage in back-and-forth exchanges, provide information, answer questions, offer suggestions, and assist with various tasks.
The combination of Continuous Integration (CI) and Continuous Delivery (CD). Sometimes CD can also refer to Continuous Deployment.
The approach of automating the delivery of software changes. It builds on top of Continuous Integration and additionally includes deployment to development environments, integration tests, load tests, and other tests. The goal is to always be able to deploy a new version of the software. The final deployment to production still requires (manual) approval.
The approach of automating the deployment of software changes to production environments. It builds on top of Continuous Integration and Continuous Delivery, but does not require manual approvals for the final release of the new software version.
The approach of automating the integration of software changes, including building and (unit) testing. It includes the process of continuously merging changes from multiple contributors into a single shared codebase.
The practice of continuously collecting, processing, and analyzing data from software systems, infrastructure, and machine learning models in real time to ensure their performance, reliability, and security. Continuous monitoring allows for early detection of issues, proactive maintenance, and informed decision-making to optimize the system's performance and resources.
The approach of continuously updating and improving machine learning models by regularly retraining them with new data. This process helps to maintain the accuracy and relevance of the models, allowing them to adapt to changing patterns in the data and address concept drift or data drift over time. Continuous training often involves automated pipelines for data ingestion, feature engineering, model training, validation, and deployment.
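As an illustration, the following minimal sketch (using scikit-learn; the synthetic dataset, quality threshold, and promotion logic are illustrative assumptions, not part of any specific pipeline) shows the core retrain-and-validate step such an automated pipeline would run on fresh data:

```python
# Minimal sketch of one retraining step: fit on fresh data, validate, and only
# "promote" the new model if it clears an (illustrative) quality threshold.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def retrain(X, y, accuracy_threshold=0.8):
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    score = accuracy_score(y_val, model.predict(X_val))
    # In a real pipeline, promotion would mean registering and deploying the model.
    return model if score >= accuracy_threshold else None

X, y = make_classification(n_samples=500, n_informative=5, random_state=0)  # stand-in for newly ingested data
new_model = retrain(X, y)
```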
A practice which ensures the accuracy and quality of development and operations by detecting and eliminating errors. Automated tests are performed at every stage of the development process. By implementing a continuous verification strategy, organizations can ensure that their data, models, and code are verified in real time, reducing the risk of incorrect decisions and outcomes.
An emerging science that studies techniques to improve datasets, which is often the best way to improve performance in practical ML applications. While good data scientists have long practiced this manually via ad hoc trial and error and intuition, DCAI considers the improvement of data as a systematic engineering discipline.
The process of designing, constructing, and maintaining the architecture and infrastructure necessary to collect, store, process, and analyze large datasets. Data engineers work on tasks such as data ingestion, data transformation, and data storage, ensuring that data is clean, reliable, and accessible for data scientists and other stakeholders.
A technology layer and data curation service which integrates data from the underlying data layer(s), such as data lakes, data warehouses, or databases, into a unified and holistic view of the data.
The process of filling in missing or incomplete data values within a dataset. Data imputation aims to minimize the impact of missing data on analysis and machine learning models by using statistical techniques, such as mean imputation, median imputation, or more advanced methods like k-nearest neighbors or model-based imputation.
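For example, a minimal sketch with scikit-learn (the toy matrix is purely illustrative) comparing mean imputation with k-nearest-neighbors imputation:

```python
# Fill missing values (np.nan) with the column mean vs. a k-nearest-neighbors estimate.
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [4.0, 5.0]])

mean_imputed = SimpleImputer(strategy="mean").fit_transform(X)
knn_imputed = KNNImputer(n_neighbors=2).fit_transform(X)
```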
A centralized repository used to store data, often containing both structured and unstructured data in raw formats without enforced schemas. Care must be taken to ensure it does not evolve into a data swamp.
An approach combining data architecture with a data operating model to enable sharing, accessing, and managing data products. Important principles also include domain ownership, data as a product, self-service data platforms, and federated computational governance.
A set of practices and principles which help organizations improve the speed, quality, and reliability of their data analytics initiatives. The main components include Continuous Integration, Continuous Monitoring, and Continuous Verification.
A series of data processing steps that transform raw data into a format suitable for analysis, machine learning, or visualization. Data pipelines often involve multiple stages, such as data ingestion, data cleansing, data transformation, feature engineering, and data storage. They can be designed and managed using tools and frameworks that support automation, scalability, and fault tolerance.
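A minimal sketch of such a pipeline, written as composed Python functions (the inline records, column names, and output path are illustrative assumptions):

```python
# Toy linear pipeline: ingest -> clean -> feature engineering -> store.
import numpy as np
import pandas as pd

def ingest() -> pd.DataFrame:
    # In practice this would read from a file, database, or message queue.
    return pd.DataFrame({"amount": [10.0, None, 250.0, 250.0],
                         "country": ["DE", "US", "US", "US"]})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna().drop_duplicates()

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["amount_log"] = np.log1p(out["amount"])  # simple derived feature
    return out

def store(df: pd.DataFrame, path: str) -> None:
    df.to_csv(path, index=False)

store(engineer_features(clean(ingest())), "features.csv")
```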
A collection of data with little organization, structure, or oversight. This often happens when a Data Lake is poorly designed, managed, and/or documented.
A centralized repository used to store, organize, and curate data. As opposed to data lakes, it usually only contains processed (and validated) data, in relational form, with an enforced schema.
A set of practices, principles, and tools that aims to improve the collaboration between software development (Dev) and IT operations (Ops) teams. DevOps emphasizes automation, continuous integration, and continuous delivery to shorten the software development life cycle, reduce deployment failures, and increase the speed and quality of software releases.
Defines how much error, instability, or unreliability users accept over a rolling period of time. This budget can be used to accommodate events such as planned and unplanned releases or unavoidable hardware failures. It is calculated as error budget = 100% - SLO and should be accompanied by an error budget policy.
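A short worked example of the formula, assuming an (illustrative) 99.9% availability SLO over a 30-day window:

```python
slo = 0.999                            # 99.9% availability target
window_minutes = 30 * 24 * 60          # 43,200 minutes in a 30-day window

error_budget_fraction = 1.0 - slo      # error budget = 100% - SLO = 0.1%
error_budget_minutes = window_minutes * error_budget_fraction
print(error_budget_minutes)            # 43.2 minutes of tolerated downtime per window
```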
A contractual agreement between business and developers which specifies what happens when little or no error budget is remaining. For example, it might mean that top priority shifts from feature development to addressing reliability issues.
The field of AI research that focuses on developing methods and techniques to make the decision-making process of AI systems more transparent, interpretable, and understandable to humans. Explainable AI aims to address the "black box" problem of complex models, such as deep learning, by providing insights into the factors influencing the model's predictions, which can help build trust, ensure fairness, and facilitate better decision-making.
A centralized repository used to store engineered features. It helps data engineers and data scientists to share, organize, and access computed feature values used for model training, validation, and inference.
A large, pre-trained language model that serves as a fundamental building block for various natural language processing (NLP) tasks. These models are trained on massive amounts of text data to learn the statistical patterns and semantic relationships of language.
A branch of AI that focuses on creating models and systems capable of generating new content that is similar to or indistinguishable from human-generated content. These models are designed to learn patterns, characteristics, and structures from existing data and use that knowledge to generate new data that has similar properties.
The process of searching for the optimal set of hyperparameters that govern the behavior of a machine learning algorithm. Hyperparameter tuning can significantly impact the performance of a model and involves techniques such as grid search, random search, and Bayesian optimization to find the best combination of hyperparameter values.
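As an illustration, a minimal grid-search sketch with scikit-learn (the parameter grid and toy dataset are arbitrary choices for demonstration):

```python
# Exhaustively evaluate a small hyperparameter grid with 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```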
A large language model is an artificial intelligence (AI) system designed to understand and generate human language. It is trained on massive amounts of text data and learns patterns and relationships within the data to generate coherent and contextually relevant responses. Large language models have a wide range of applications, including natural language understanding, machine translation, chatbots, content generation, language-based search, and virtual assistants.
The practices and processes involved in deploying, managing, and monitoring large language models in production environments. These operations aim to ensure the reliability, scalability, and efficiency of the models during real-world usage.
A subset of AI that focuses on the development of algorithms and models that can learn from data and improve their performance over time. Machine learning enables computers to make predictions, recognize patterns, and make decisions without explicit programming, by using statistical techniques and mathematical optimization.
The union of Machine Learning, Data Engineering, and DevOps.
A set of practices, principles, and tools that aims to streamline the development, deployment, and monitoring of machine learning models. MLOps combines aspects of machine learning, data engineering, and DevOps, focusing on reproducibility, automation, and continuous integration and delivery of models to facilitate collaboration between data scientists, data engineers, and IT operations teams.
The phenomenon where the performance of a machine learning model degrades over time due to changes in the underlying data distribution or the relationships between input features and target variables. Model drift can result from concept drift or data drift and often necessitates continuous monitoring, model retraining, and updating to maintain optimal performance.
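One common (though simplified) way to spot the data-drift component is to compare a feature's training-time distribution with its recent production values, for example with a two-sample Kolmogorov-Smirnov test; the synthetic data and the 0.05 threshold below are illustrative:

```python
# Flag a single feature as drifted if its production distribution differs
# significantly from the training-time reference distribution.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=1000)    # feature values seen during training
production = rng.normal(0.5, 1.0, size=1000)   # recently observed feature values

statistic, p_value = ks_2samp(reference, production)
if p_value < 0.05:
    print("Drift detected - consider retraining or investigating the data source")
```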
The process of evaluating a machine learning model's performance using a set of metrics and validation techniques. Model validation typically involves splitting the dataset into training, validation, and testing subsets to assess the model's ability to generalize to new, unseen data. Common validation techniques include cross-validation, holdout validation, and bootstrapping.
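A minimal cross-validation sketch with scikit-learn (the toy dataset and model choice are illustrative):

```python
# Estimate generalization performance with 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```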
The ability to understand the internal state of a system by examining its external outputs, such as logs, metrics, and traces. In the context of software systems and MLOps, observability is crucial for monitoring the performance, diagnosing issues, and optimizing the efficiency of deployed applications and machine learning models.
A contract between a service provider and a service consumer. It defines the expectations towards a service and the consequences for unmet expectations.
An objective metric for reliability. It can be understood as a proxy measurement for user happiness. High values should mean that most users are content, and low values should mean that most users are unhappy.
Thresholds for SLIs which define whether the system is performing reliably. They can be understood as the dividing line between happy and unhappy users, and they should be stricter than any SLAs to which they relate.
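For example, a measured availability SLI can be checked against an SLO threshold as follows (the request counts and the 99.9% target are illustrative):

```python
good_requests = 999_132
total_requests = 1_000_000
slo = 0.999                             # 99.9% availability target

sli = good_requests / total_requests    # measured availability SLI
print("SLO met" if sli >= slo else "SLO violated - error budget is being consumed")
```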
The engineering discipline of creating highly scalable and reliable systems. It focuses on objectively managing the tradeoff between the requirements of different teams, namely fast releases (development teams), quality control (QA teams), and system reliability (operations teams). Important components include SLIs, SLOs, and error budgets.