Important AI terms for professionals and tech enthusiasts

Cut through the noise with this list of 106 useful AI terms curated by Entefy ML scientists and engineers

Unless you live on a deserted island, you are likely to have heard about artificial intelligence (AI) and ways it continues to rapidly gain traction. AI has evolved from an academic topic into a rich field of practical applications for businesses and consumers alike. And, like any advanced technical field, it has its own lexicon of key terms and phrases. However, without deeper AI training and education, it can be quite challenging to stay abreast of the rapid changes taking place within the field.

So, to help demystify artificial intelligence and its many sub-components, our team has assembled this list of useful terms for anyone interested in practical uses of AI and machine learning. This list includes some of the most frequently used terms as well as some that may not be used as often but are important in understanding foundational AI concepts.

We encourage you to bookmark this page for quick reference in the future.


Activation function. A function in a neural network that defines the output of a node given one or more inputs from the previous layer. Also see weight.
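
As a rough illustration of the definition above, the following sketch computes the output of a single node: a weighted sum of its inputs plus a bias, passed through an activation function. The function names, weights, and bias are hypothetical values chosen for the example, and sigmoid and ReLU are only two of many possible activation functions.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """Pass positive values through unchanged and clamp negatives to 0."""
    return max(0.0, z)

def node_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

inputs = [1.0, 2.0]
weights = [0.5, -0.25]  # hypothetical learned weights
bias = 0.0              # hypothetical bias

# The weighted sum is 1.0 * 0.5 + 2.0 * -0.25 + 0.0 = 0.0
print(node_output(inputs, weights, bias, relu))     # 0.0
print(node_output(inputs, weights, bias, sigmoid))  # 0.5
```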

Algorithm. A procedure or formula, often mathematical, that defines a sequence of operations to solve a problem or class of problems.

Anomaly detection. The process of identifying instances of an observation that are unusual or deviate significantly from the general trend of data. Also see outlier detection.

Artificial general intelligence (AGI) (also, strong AI). The term used to describe a machine’s intelligence functionality that matches human cognitive capabilities across multiple domains. Often characterized by self-improvement mechanisms and generalization rather than specific training to perform in narrow domains.

Artificial intelligence (AI). The umbrella term for computer systems that can interpret, analyze, and learn from data in ways similar to human cognition.

Artificial neural network (ANN) (also, neural network). A specific machine learning technique that is inspired by the neural connections of the human brain. The intelligence comes from the ability to analyze countless data inputs to discover context and meaning.

Autoencoder. An unsupervised learning technique for artificial neural networks, designed to learn a compressed representation (encoding) for a set of unlabeled data, typically for the purpose of dimensionality reduction.

AutoML. The process of automating certain machine learning steps within a pipeline such as model selection, training, and tuning.


Backpropagation. A method of optimizing multilayer neural networks whereby the output of each node is calculated and the partial derivative of the error with respect to each parameter is computed in a backward pass through the graph. Also see model training.
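
The backward pass can be sketched on the smallest possible case: one sigmoid node with a single weight and bias. This is an illustrative example under stated assumptions (a squared-error loss and hypothetical starting values), not a full multilayer implementation. It applies the chain rule by hand and checks the result against a numerical estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Forward pass: compute the node output."""
    return sigmoid(w * x + b)

def loss(w, b, x, target):
    """Squared error (an assumption made for this sketch)."""
    return 0.5 * (forward(w, b, x) - target) ** 2

def gradients(w, b, x, target):
    """Backward pass: apply the chain rule from the loss back to w and b."""
    y = forward(w, b, x)
    dloss_dy = y - target          # derivative of the loss w.r.t. the output
    dy_dz = y * (1.0 - y)          # derivative of sigmoid w.r.t. its input
    dloss_dz = dloss_dy * dy_dz
    return dloss_dz * x, dloss_dz  # (dL/dw, dL/db)

# Hypothetical parameter, input, and target values.
w, b, x, target = 0.8, -0.2, 1.5, 1.0
grad_w, grad_b = gradients(w, b, x, target)

# Sanity check against a central finite-difference estimate.
eps = 1e-6
numeric_grad_w = (loss(w + eps, b, x, target) - loss(w - eps, b, x, target)) / (2 * eps)
print(abs(grad_w - numeric_grad_w) < 1e-6)  # True
```

In training, each parameter would then be moved a small step against its gradient, and the forward and backward passes repeated.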

Bagging. In ML, an ensemble technique (short for bootstrap aggregating) that trains multiple learners on random resamples of the training data and combines their outputs, with a focus on stability and accuracy, chiefly by reducing variance.

Bias. In ML, the phenomenon that occurs when certain elements of a dataset are more heavily weighted than others so as to skew results and model performance in a given direction.

Bigram. An n-gram containing a sequence of two words. Also see n-gram.

Boosting. In ML, an ensemble technique that trains multiple weak learners in sequence, each one concentrating on the errors of its predecessors, and combines them into a strong learner with a focus on reducing bias and variance.


Cardinality. In mathematics, a measure of the number of elements present in a set.

Categorical variable. A feature representing a discrete set of possible values, typically classes, groups, or nominal categories based on some qualitative property. Also see structured data.

Centroid model. A type of classifier that computes the center of mass of each class and uses a distance metric to assign samples to classes during inference.
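
A minimal sketch of a centroid model, assuming Euclidean distance as the distance metric and a tiny hypothetical dataset. The function names are illustrative only.

```python
import math
from collections import defaultdict

def fit_centroids(samples, labels):
    """Compute the mean point (center of mass) of each class."""
    grouped = defaultdict(list)
    for point, label in zip(samples, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(coords) / len(points) for coords in zip(*points))
        for label, points in grouped.items()
    }

def predict(centroids, point):
    """Assign the point to the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))

# Hypothetical two-class training data.
samples = [(1.0, 1.0), (2.0, 2.0), (8.0, 8.0), (9.0, 9.0)]
labels = ["low", "low", "high", "high"]

centroids = fit_centroids(samples, labels)
print(centroids["low"])                # (1.5, 1.5)
print(predict(centroids, (3.0, 2.0)))  # low
```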

Chatbot. A computer program (often designed as an AI-powered virtual agent) that provides information or takes actions in response to the user’s voice or text commands or both. Current chatbots are often deployed to provide customer service or support functions.

Class. A category of data indicated by the label of a target attribute.

Class imbalance. The quality of having a non-uniform distribution of samples grouped by target class.

Classification. The process of using a classifier to categorize data into a predicted class.

Classifier. An instance of a machine learning model trained to predict a class.

Clustering. An unsupervised machine learning process for grouping related items into subsets where objects in the same subset are more similar to one another than to those in other subsets.

Cognitive computing. A term that describes advanced AI systems that mimic the functioning of the human brain to improve decision-making and perform complex tasks.

Computer vision (CV). An artificial intelligence field focused on classifying and contextualizing the content of digital video and images. 

Convolutional neural network (CNN). A class of neural network that combines convolutional layers, which apply learned filters to small local regions of the input data, with fully connected layers in which each neuron is connected to all neurons in the next layer. CNNs are most commonly applied to computer vision.

Cross-validation. In ML, a technique for evaluating the generalizability of a machine learning model by testing the model against one or more validation datasets.
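
One common way to produce the validation datasets is k-fold splitting, sketched below. This is only one of several cross-validation schemes, and the helper name k_fold_indices is hypothetical. Each fold takes a turn as the validation set while the remaining samples are used for training.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=6, k=3))
for train, validation in folds:
    print(train, validation)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

In practice the data is usually shuffled before splitting, and the model's score is averaged across the folds.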


Data cleaning. The process of improving the quality of a dataset in preparation for analytical operations by correcting, replacing, or removing dirty data (inaccurate, incomplete, corrupt, or irrelevant data).

Data preprocessing. The process of transforming or encoding raw data in preparation for analytical operations, often through re-shaping, manipulating, or dropping data.

Data curation. The process of collecting and managing data, including verification, annotation, and transformation. Also see training and dataset.

Data mining. The process of targeted discovery of information, patterns, or context within one or more data repositories.

DataOps. Management, optimization, and monitoring of data retrieval, storage, transformation, and distribution throughout the data life cycle including preparation, pipelines, and reporting.

Deep learning. A subfield of machine learning that uses neural networks with two or more hidden layers to train a computer to process data, recognize patterns, and make predictions. Also see deep neural network.

Derived feature. A feature that is created and the value of which is set as a result of observations on a given dataset, generally as a result of classification, automated preprocessing, or sequenced model output.

Descriptive analytics. The process of examining historical data or content, typically for the purpose of reporting, explaining data, and generating new models for current or historical events. Also see predictive analytics and prescriptive analytics.

Dimensionality reduction. A data preprocessing technique to reduce the number of input features in a dataset by transforming high-dimensional data to a low-dimensional representation.

Discriminative model. A class of models most often used for classification or regression that predict labels from a set of features. Typically trained with supervised learning. Also see generative model.


Ensembling. A powerful technique whereby two or more algorithms, models, or neural networks are combined in order to generate more accurate predictions.

Embedding. In ML, a mathematical structure representing discrete categorical variables as a continuous vector. Also see vectorization.

Embedding space. An n-dimensional space into which features from a higher-dimensional space are mapped in order to simplify complex data into a structure that can be used for mathematical operations. Also see dimensionality reduction.


F1 Score. A measure of a test’s accuracy calculated as the harmonic mean of precision and recall.
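
The relationship between precision, recall, and the F1 score can be sketched as follows. The function name and the sample labels are hypothetical. Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical ground truth and predictions: 2 TP, 1 FP, 2 FN.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.5 0.571
```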

Feature. In ML, a specific variable or measurable value that is used as input to an algorithm.

Federated learning. A machine learning technique where the training for a model is distributed amongst multiple decentralized servers or edge devices, without the need to share training data.

Fine-tuning. In ML, the process by which the parameters of a previously trained model are further adjusted, through additional training, to improve performance against a given dataset or target objective. Also see transfer learning and tuning.


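Generative adversarial network (GAN). A class of AI algorithms whereby two neural networks compete against each other: a generator that produces synthetic data and a discriminator that tries to distinguish it from real data. The competition improves the capabilities of both.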

Generative model. A model capable of generating new data based on a given set of training data. Also see discriminative model.

Gradient boosting. An ML technique where an ensemble of weak prediction models, such as decision trees, is trained iteratively, with each new model correcting the errors of the ensemble so far, in order to produce a stronger prediction model.

Ground truth. Information that is known (or considered) to be true, correct, real, or empirical, usually for the purpose of training models and evaluating model performance.


Hidden layer. A construct within a neural network, located between the input and output layers, that performs a given function, such as an activation function, for model training. Also see deep learning.

Hyperparameter. In ML, a parameter whose value is set prior to the learning process as opposed to other values derived by virtue of training.

Hyperplane. In ML, a decision boundary that helps classify data points from a single space into subspaces where each side of the boundary may be attributed to a different class, such as positive and negative classes. Also see support vector machine.


Inference. In ML, the process of applying a trained model to data in order to generate a model output such as a score, prediction, or classification. Also see training.

Input layer. The first layer in a neural network, acting as the beginning of a model workflow, responsible for receiving data and passing it to subsequent layers. Also see hidden layer and output layer.

Intelligent process automation (IPA). A collection of technologies, including robotic process automation (RPA) and AI, to help automate certain digital processes. Also see robotic process automation (RPA).


K-means clustering. An unsupervised learning method used to cluster n observations into k clusters such that each of the n observations belongs to the cluster with the nearest mean (centroid).
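
A compact sketch of the usual k-means loop: assign each observation to its nearest centroid, then move each centroid to the mean of its assigned observations. To keep the example deterministic, it seeds the centroids with the first k points and runs a fixed number of iterations. Both are simplifying assumptions, and real implementations typically use randomized or smarter initialization.

```python
import math

def k_means(points, k, iterations=10):
    """Return (centroids, clusters) after a fixed number of iterations."""
    centroids = list(points[:k])  # assumption: seed with the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical observations forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)  # [(1.0, 1.5), (9.5, 9.0)]
```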

K-nearest neighbors (KNN). A supervised learning method for classification and regression used to estimate the likelihood that a data point is a member of a group, where the model input is defined as the k closest training examples in a data set and the output is either a class assignment (classification) or a property value (regression).
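
The classification case can be sketched as a majority vote among the k closest training examples. The sketch assumes Euclidean distance and uses a hypothetical labeled dataset. For regression, the vote would be replaced by an average of the neighbors' values.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """training is a list of (point, label) pairs; returns the majority label."""
    neighbors = sorted(training, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled training examples.
training = [
    ((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
    ((8.0, 8.0), "blue"), ((9.0, 8.5), "blue"),
]

print(knn_classify(training, (1.8, 1.5)))  # red
print(knn_classify(training, (8.5, 8.0)))  # blue
```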

Knowledge distillation. In ML, a technique used to transfer the knowledge of a complex model, usually a deep neural network, to a simpler model with a smaller computational cost.


Layer. In ML, a collection of neurons within a neural network that performs a specific computational function, such as an activation function, on a set of input features. Also see hidden layer, input layer, and output layer.

Logistic regression. A type of classifier that models the relationship between one dependent variable (typically a binary outcome) and one or more other variables using a logistic function.
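
As a sketch of how the logistic function turns input variables into a class prediction, the example below applies it to a weighted sum of features. The weights, bias, and 0.5 threshold are hypothetical. In a real model the weights and bias would be learned from training data.

```python
import math

def predict_probability(features, weights, bias):
    """Logistic function applied to the weighted sum of the features."""
    z = sum(x * w for x, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(features, weights, bias, threshold=0.5):
    """Convert the probability into a 0/1 class label."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0

weights = [1.2, -0.7]  # hypothetical learned weights
bias = -0.5            # hypothetical learned bias

print(predict_class([2.0, 1.0], weights, bias))  # z = 1.2, so class 1
print(predict_class([0.0, 2.0], weights, bias))  # z = -1.9, so class 0
```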

Long short-term memory (LSTM). A type of recurrent neural network (RNN) that maintains history in an internal memory state, utilizing feedback connections (as opposed to standard feedforward connections) to analyze and learn from entire sequences of data, not only individual data points.


Machine learning (ML). A subset of artificial intelligence that gives machines the ability to analyze a set of data, draw conclusions about the data, and then make predictions when presented with new data without being explicitly programmed to do so.

Mimi. The term used to refer to Entefy’s multimodal AI engine and technology.

MLOps. A set of practices to help streamline the process of managing, monitoring, deploying, and maintaining machine learning models.

Model training. The process of providing a dataset to a machine learning model for the purpose of improving the precision or effectiveness of the model. Also see supervised learning and unsupervised learning.

Multimodal AI. Machine learning models that analyze and relate data across multiple modes or formats, such as text, images, audio, and video.


N-gram. A token, often a string, containing a contiguous sequence of n words from a given data sample.
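
A short sketch of n-gram extraction, assuming simple whitespace splitting to identify words. The function name is illustrative. With n set to 2 it produces bigrams.

```python
def ngrams(text, n):
    """Return every contiguous sequence of n words in the text."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "the quick brown fox"
print(ngrams(sentence, 2))
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
print(ngrams(sentence, 3))
# [('the', 'quick', 'brown'), ('quick', 'brown', 'fox')]
```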

Naive Bayes. A probabilistic classifier based on applying Bayes' rule, which makes strong (naive) assumptions about the independence of features.

Named entity recognition (NER). An NLP technique that locates elements in text, such as the names of people, organizations, and locations, and classifies them into pre-defined categories.

Natural language processing (NLP). A field of computer science and artificial intelligence focused on processing and analyzing natural human language or text data.

Natural language understanding (NLU). A specialty area within Natural Language Processing focused on advanced analysis of text to extract meaning and context. 

Neural network (NN) (also, artificial neural network). A specific machine learning technique that is inspired by the neural connections of the human brain. The intelligence comes from the ability to analyze countless data inputs to discover context and meaning.


Ontology. A data model that represents relationships between concepts, events, entities, or other categories. In the AI context, ontologies are often used by AI systems to analyze, share, or reuse knowledge.

Outlier detection. The process of detecting a data point that is unusually distant from the average expected norms within a dataset. Also see anomaly detection.
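
One simple way to measure distance from the average is the z-score, sketched below. The threshold of 2 standard deviations and the sample readings are assumptions for illustration. Other methods, and other thresholds, are common in practice.

```python
import statistics

def z_score_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical sensor readings with one unusual value.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(z_score_outliers(readings))  # [50]
```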

Output layer. The last layer in a neural network, acting as the end of a model workflow, responsible for delivering the final result or answer such as a score, class label, or prediction. Also see hidden layer and input layer.

Overfitting. In ML, a condition where a trained model over-conforms to training data and does not perform well on new, unseen data. Also see underfitting.


Precision. In machine learning, a measure of model accuracy computing the ratio of true positives against all true and false positives in a given class.

Predictive analytics. The process of learning from historical patterns and trends in data to generate predictions, insights, recommendations, or otherwise assess the likelihood of future outcomes. Also see descriptive analytics and prescriptive analytics.

Prescriptive analytics. The process of using data to determine potential actions or strategies based on predicted future outcomes. Also see descriptive analytics and predictive analytics.

Primary feature. A feature, the value of which is present in or derived from a dataset directly. 


Random forest. An ensemble machine learning method that blends the output of multiple decision trees in order to produce improved results.

Recall. In machine learning, a measure of model accuracy computing the ratio of true positives guessed against all actual positives in a given class.

Recurrent neural network (RNN). A class of neural networks that is popularly used to analyze temporal data such as time series, video and speech data.

Regression. In AI, a mathematical technique to estimate the relationship between one variable and one or more other variables. Also see classification.

Reinforcement learning (RL). A machine learning technique where an agent independently learns the rules of a system through trial and error, guided by rewards or penalties for its actions.

Robotic process automation (RPA). Business process automation that uses virtual software robots (not physical) to observe the user’s low-level or monotonous tasks performed using an application’s user interface in order to automate those tasks. Also see intelligent process automation (IPA).


Self-supervised learning. A form of autonomous supervised learning whereby a system identifies and extracts naturally available signals from unlabeled data through processes of self-selection.

Semi-supervised learning. A machine learning technique that fits between supervised learning (in which data used for training is labeled) and unsupervised learning (in which data used for training is unlabeled).

Strong AI (also, AGI). The term used to describe artificial general intelligence or a machine’s intelligence functionality that matches human cognitive capabilities across multiple domains. Often characterized by self-improvement mechanisms and generalization rather than specific training to perform in narrow domains. Also see weak AI.

Structured data. Data that has been organized using a predetermined model, often in the form of a table with values and linked relationships. Also see unstructured data.

Supervised learning. A machine learning technique that infers from training performed on labeled data. Also see unsupervised learning.

Support vector machine (SVM). A type of supervised learning model that separates data into one of two classes using a hyperplane chosen to maximize the margin between the classes. Also see hyperplane.


Taxonomy. A hierarchically structured list of terms that illustrates the relationships between those terms. Also see ontology.

Teacher-student model. A type of machine learning model where a teacher model is used to generate labels for a student model. The student model then tries to learn from these labels and improve its performance. This type of model is often used in semi-supervised learning, where a large amount of unlabeled data is available but labeling it is expensive.

Tokenization. In ML, a method of separating a piece of text into smaller units called tokens, which may represent words, characters, or subwords. Also see n-gram.
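
A minimal sketch of word-level and character-level tokenization using only a regular expression. The pattern and function names are assumptions for illustration. Production systems generally rely on trained subword tokenizers rather than a hand-written rule like this one.

```python
import re

def word_tokens(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def char_tokens(text):
    """Split text into individual characters."""
    return list(text)

print(word_tokens("Don't panic, it's only AI."))
# ["don't", 'panic', "it's", 'only', 'ai']
print(char_tokens("AI"))
# ['A', 'I']
```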

Time series. A set of data structured in spaced units of time.

TinyML. A branch of machine learning that deals with creating models that can run on very limited resources, such as embedded IoT devices.

Transfer learning. A machine learning technique where the knowledge derived from solving one problem is applied to a different (typically related) problem.

Transformer. In ML, a type of deep learning model for handling sequential data, such as natural language text, without needing to process the data in sequential order.

Tuning. The process of optimizing the hyperparameters of an AI algorithm to improve its precision or effectiveness. Also see algorithm.


Underfitting. In ML, a condition where a trained model is too simple to learn the underlying structure of a more complex dataset. Also see overfitting.

Unstructured data. Data that has not been organized with a predetermined order or structure, often making it difficult for computer systems to process and analyze.

Unsupervised learning. A machine learning technique that infers from training performed on unlabeled data. Also see supervised learning.


Validation. In ML, the process by which the performance of a trained model is evaluated against a specific testing dataset which contains samples that were not included in the training dataset. Also see training.

Vectorization. The process of transforming data into vector representation using numbers.
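
As one simple illustration, text can be vectorized by counting how often each vocabulary word appears (a bag-of-words representation). This is only one of many vectorization schemes. The function names and sample documents are hypothetical.

```python
def build_vocabulary(documents):
    """Map each distinct word to a fixed position in the vector."""
    vocab = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(vocab)}

def vectorize(document, vocabulary):
    """Count occurrences of each vocabulary word; unknown words are ignored."""
    vector = [0] * len(vocabulary)
    for word in document.lower().split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector

documents = ["the cat sat", "the dog sat down"]
vocabulary = build_vocabulary(documents)  # positions: cat, dog, down, sat, the

print(vectorize("the cat sat", vocabulary))       # [1, 0, 0, 1, 1]
print(vectorize("the dog sat down", vocabulary))  # [0, 1, 1, 1, 1]
```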


Weak AI. The term used to describe a narrow AI built and trained for a specific task. Also see strong AI.

Weight. In ML, a learnable parameter in nodes of a neural network, representing the importance value of a given feature, where input data is transformed (through multiplication) and the resulting value is either passed to the next layer or used as the model output.

Word Embedding. In NLP, the vectorization of words and phrases, typically for the purpose of representing language in a low-dimensional space.