Machine learning can feel like a black box — but the core algorithms are surprisingly intuitive once you see them in context. This article walks through the most important ML algorithms: how each one works, when to use it, and the key formulas and intuitions behind it. No PhD required.
1. Linear regression
Linear regression is the simplest and most widely used supervised learning algorithm. It finds a straight line (or a flat surface in higher dimensions) that best fits the data — minimising the squared differences between the predicted values and the actual values.
When to use it: Predicting a continuous value — house prices, revenue forecasts, project durations, employee attrition cost. It's fast, interpretable, and often a solid baseline even when more complex models are available.
The model adjusts its coefficients β to minimise the mean squared error cost:

J(β) = (1/n) Σᵢ (yᵢ − ŷᵢ)²

— the average squared difference between the actual values yᵢ and the predicted values ŷᵢ.
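To make the cost concrete, here is a minimal sketch of linear regression fitted by gradient descent in pure Python, for a single feature. The names (fit_linear, beta0, beta1, lr) and the toy data are illustrative, not from a real library:

```python
# A minimal sketch: fit y = beta0 + beta1 * x by gradient descent on the
# mean squared error cost J(beta) = (1/n) * sum((y_hat - y)^2).

def fit_linear(xs, ys, lr=0.01, epochs=2000):
    beta0, beta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of the cost with respect to each coefficient.
        grad0 = sum(2 * ((beta0 + beta1 * x) - y) for x, y in zip(xs, ys)) / n
        grad1 = sum(2 * ((beta0 + beta1 * x) - y) * x for x, y in zip(xs, ys)) / n
        beta0 -= lr * grad0  # step downhill on the cost surface
        beta1 -= lr * grad1
    return beta0, beta1

# Toy data lying on the line y = 2x + 1.
b0, b1 = fit_linear([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
```

After a couple of thousand small steps, b0 and b1 converge close to the true intercept 1 and slope 2. In practice you would use a library solver, but the loop above is the whole idea.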
2. Logistic regression
Despite the name, logistic regression is used for classification, not regression. It predicts the probability that an input belongs to a particular class (e.g. "spam" vs "not spam"). The key trick is the sigmoid function, σ(z) = 1 / (1 + e⁻ᶻ), which squashes any real number into a value between 0 and 1 — a probability.
When to use it: Binary yes/no predictions — will a client churn? Is this transaction fraudulent? Should this email be flagged? Also extends to multi-class problems with softmax regression.
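A sketch of the prediction step, assuming weights have already been learned — the weight and bias values below are made up for illustration:

```python
import math

def sigmoid(z):
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """P(class = 1) = sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Illustrative, hand-picked weights (in practice these are fitted to data).
p = predict_proba([2.0, 0.5], weights=[1.2, -0.7], bias=-1.0)
label = 1 if p >= 0.5 else 0  # threshold the probability for a yes/no answer
```

The 0.5 threshold is the usual default, but it can be moved — e.g. lowered when missing a fraud case is costlier than a false alarm.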
3. Decision trees
A decision tree splits data into branches based on feature values, creating a flowchart-like structure. At each node, the algorithm picks the feature and threshold that best separates the data. The result is highly interpretable — you can literally follow the tree to see why a prediction was made.
Purity is measured with the Gini impurity, Gini = 1 − Σᵢ pᵢ², where pᵢ is the proportion of class i in the node. The tree picks the split that produces the lowest weighted Gini impurity in the child nodes. A Gini of 0 means perfectly pure (all one class).
When to use it: When interpretability matters — regulated industries, audit trails, client-facing explanations. Also the building block for more powerful ensemble methods.
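Here is how a tree scores one candidate split, as a small sketch — the helper names (gini, split_score) and the toy email data are illustrative:

```python
def gini(labels):
    """Gini impurity: 1 - sum(p_i^2). 0 means perfectly pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_score(values, labels, threshold):
    """Weighted Gini of the two children produced by value <= threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

labels = ["spam", "spam", "ham", "ham"]
values = [1, 2, 8, 9]  # e.g. number of suspicious links per email
score = split_score(values, labels, threshold=5)  # perfect split: score 0.0
```

The full algorithm simply tries every feature and threshold, keeps the split with the lowest score, and recurses into each child until the nodes are pure or a depth limit is hit.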
4. Random forests
A random forest is an ensemble of many decision trees, each trained on a random subset of the data and features. For a prediction, every tree votes, and the majority wins (classification) or the average is taken (regression). This reduces overfitting dramatically compared to a single tree.
When to use it: General-purpose classification and regression when you want high accuracy with moderate interpretability. Excellent for tabular business data — customer behaviour, risk scoring, demand forecasting.
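The bootstrap-and-vote mechanics can be sketched in miniature. Real forests grow full decision trees on each bootstrap sample; to keep this runnable in a few lines, each "tree" below is just a one-threshold stump, and all names and data are illustrative:

```python
import random
from collections import Counter

def train_stump(sample, default):
    """Threshold halfway between the two class means (a toy weak learner)."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:
        return default  # degenerate bootstrap sample: fall back
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def train_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    default = sum(x for x, _ in data) / len(data)
    thresholds = []
    for _ in range(n_trees):
        # Bootstrap: sample the training data with replacement.
        sample = [rng.choice(data) for _ in data]
        thresholds.append(train_stump(sample, default))
    return thresholds

def predict(thresholds, x):
    """Every tree votes; the majority class wins."""
    votes = Counter(1 if x > t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]

data = [(1, 0), (2, 0), (3, 0), (8, 1), (9, 1), (10, 1)]
forest = train_forest(data)
```

Because each learner sees a slightly different resample, their individual errors tend to cancel out in the vote — that averaging is what tames the overfitting of a single tree.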
5. K-means clustering
K-means is an unsupervised algorithm — it groups unlabelled data into k clusters based on similarity. The algorithm iteratively assigns each data point to the nearest cluster centre (centroid) and then recalculates the centroids until convergence.
The objective is to minimise the total within-cluster variance:

J = Σₖ Σₓ∈Cₖ ‖x − μₖ‖²

— the sum of squared distances from each point to its assigned centroid μₖ.
When to use it: Customer segmentation, document grouping, image compression, and any problem where you want to discover natural groupings in unlabelled data.
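The assign-then-update loop (Lloyd's algorithm) fits in a few lines. This sketch works in one dimension for brevity; the points and starting centroids are made up:

```python
def kmeans(points, centroids, iters=20):
    """Alternate between assigning points and recomputing centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: abs(p - centroids[k]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[k]
                     for k, c in enumerate(clusters)]
    return centroids, clusters

# Two obvious groups: low values near 1.5 and high values near 9.5.
centroids, clusters = kmeans([1.0, 1.5, 2.0, 9.0, 9.5, 10.0],
                             centroids=[0.0, 5.0])
```

One caveat worth knowing: the result depends on the starting centroids, so libraries typically run several random initialisations and keep the one with the lowest total variance.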
6. Gradient boosting (XGBoost, LightGBM)
Gradient boosting builds an ensemble of weak decision trees sequentially — each new tree focuses on correcting the errors left by the trees before it. The result is often the most accurate model for structured/tabular data, and it's the algorithm behind most Kaggle competition winners.
When to use it: When accuracy on tabular data is paramount — fraud detection, credit scoring, pricing optimisation, demand forecasting. Implementations like XGBoost and LightGBM are fast and battle-tested.
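The sequential error-correction idea can be shown in a toy form. XGBoost and LightGBM use full trees plus many refinements; the sketch below boosts one-split regression stumps on residuals, with all names and data invented for illustration:

```python
def fit_stump(xs, residuals):
    """Find the one split that best reduces squared error on the residuals."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def predict(stumps, x, base, lr):
    """Ensemble prediction: base value plus the scaled stump corrections."""
    return base + lr * sum(s(x) for s in stumps)

def boost(xs, ys, rounds=50, lr=0.3):
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []
    for _ in range(rounds):
        preds = [predict(stumps, x, base, lr) for x in xs]
        residuals = [y - p for y, p in zip(ys, preds)]  # what's still wrong
        stumps.append(fit_stump(xs, residuals))         # fix a bit of it
    return base, stumps

xs = [1, 2, 3, 4, 5, 6]
ys = [3, 3, 3, 10, 10, 10]
base, stumps = boost(xs, ys)
```

Each round shrinks the remaining error by a fixed fraction (the learning rate), which is why boosting trades more rounds for better generalisation.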
7. K-nearest neighbours (KNN)
KNN is one of the simplest algorithms: to classify a new data point, look at the k closest points in the training data and take a majority vote. No actual "training" happens — the model is just the data itself.
When to use it: Quick prototyping, recommendation systems ("users who are similar to you also liked…"), and small-to-medium datasets where simplicity is valued.
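The whole algorithm is short enough to write out. This sketch classifies 2-D points; the training set and label names are illustrative:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to the query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# The "model" is literally just the labelled data.
train = [((1, 1), "red"), ((2, 1), "red"), ((1, 2), "red"),
         ((8, 8), "blue"), ((9, 8), "blue"), ((8, 9), "blue")]
guess = knn_predict(train, (0, 0))
```

Two practical notes: features should be scaled to comparable ranges (distance is meaningless otherwise), and prediction cost grows with the dataset, which is why KNN suits small-to-medium data.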
8. Neural networks (deep learning)
Neural networks are loosely inspired by the brain — layers of interconnected nodes ("neurons") that transform input data through weighted sums and non-linear activation functions. Deep learning simply means neural networks with many layers, enabling them to learn complex hierarchical representations.
When to use it: Image recognition, natural language processing, speech recognition, and any problem with very large datasets and complex, non-linear patterns. Neural networks are the foundation of generative AI.
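A single forward pass through a tiny two-layer network shows the "weighted sum plus non-linearity" pattern. The weights below are arbitrary illustrative numbers, not learned values, and real networks learn them via backpropagation:

```python
import math

def relu(z):
    """A common hidden-layer activation: pass positives, zero out negatives."""
    return max(0.0, z)

def sigmoid(z):
    """Output activation squashing to (0, 1) for a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases, activation):
    """Each neuron: weighted sum of inputs + bias, then a non-linearity."""
    return [activation(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, -1.0]  # input features
hidden = layer(x, [[1.0, -2.0], [0.5, 1.0]], [0.0, 0.1], relu)
output = layer(hidden, [[1.5, -1.0]], [-0.5], sigmoid)  # one output neuron
```

Stacking more `layer` calls is all "deep" means — each layer re-represents the previous one, which is how the network builds up hierarchical features.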
Algorithm cheat sheet

| Algorithm | Type | Typical use |
| --- | --- | --- |
| Linear regression | Supervised (regression) | Continuous predictions, fast baselines |
| Logistic regression | Supervised (classification) | Binary yes/no probabilities |
| Decision trees | Supervised (both) | Interpretable decisions, audit trails |
| Random forests | Supervised ensemble | Accurate general-purpose tabular models |
| K-means | Unsupervised (clustering) | Segmentation, discovering groupings |
| Gradient boosting | Supervised ensemble | Top accuracy on tabular data |
| KNN | Supervised (both) | Prototyping, recommendations, smaller datasets |
| Neural networks | Supervised and beyond | Images, text, speech, very large datasets |
Where to go from here
These eight algorithms cover the vast majority of real-world machine learning problems. For most business applications, you'll use linear/logistic regression for baselines, gradient boosting or random forests for production accuracy, k-means for segmentation, and neural networks for unstructured data (text, images, audio).
The key is not to memorise formulas — it's to understand which algorithm fits which problem, and to build intuition by working with real data. Start with the simplest model that could work, measure it rigorously, and only add complexity when it measurably improves results.
If you'd like help identifying the right ML approach for your use case, we'd be happy to talk. Get in touch to start the conversation.