Every time you scroll through the Instagram “For You” page, browse products on Amazon, or find a new favorite show on Netflix, you are interacting with a recommendation system. These systems are the unseen creators of your digital experience; they’re built to predict what you might want or need by cutting through the clutter of all the information available to you.
But how do these systems work? They don’t have crystal balls, and they don’t rely on hand-written rules; instead, they use Machine Learning (ML) models to make their predictions. As the field has matured, it has moved from simple algorithms to large-scale recommender models that process billions of data points, many times each day.
This blog post will look inside today’s recommendation systems, reviewing the classic techniques, the modern architectures built on top of them, and the evaluation methods used to judge them, so you can better understand the systems shaping the content you consume online each day.
The Foundation: Two Classic Approaches
Before diving into deep learning, it is important to understand the two classic paradigms that modern hybrid systems are built from.
1. Content-Based Filtering
A content-based filtering system analyses an item’s properties (its metadata). Take Inception: its genre (sci-fi/thriller), director (Christopher Nolan), and lead actor (Leonardo DiCaprio). The system searches its catalogue for other items sharing those attributes and recommends them.
In practice, the system builds a profile for each user from the characteristics of the films they have watched. Because this requires data only about the items themselves, it is relatively simple to set up.
However, content-based filtering has two notable problems. First, it creates “filter bubbles” or “silos”: if you watch only action films, the system will recommend only action films. Second, there is no serendipity; if you would enjoy classic French cinema, you may never get the chance to find out.
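The idea above can be sketched in a few lines. This is a toy example with a made-up four-genre catalogue, assuming each film is reduced to a binary feature vector; real systems use far richer metadata.

```python
import numpy as np

# Hypothetical catalogue: each film as a binary vector over
# (sci-fi, thriller, action, romance) -- a toy stand-in for real metadata.
films = {
    "Inception":    np.array([1, 1, 1, 0], dtype=float),
    "Interstellar": np.array([1, 0, 0, 1], dtype=float),
    "The Notebook": np.array([0, 0, 0, 1], dtype=float),
    "Heat":         np.array([0, 1, 1, 0], dtype=float),
}

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(liked_title, k=2):
    """Rank the rest of the catalogue by similarity to a liked film."""
    liked = films[liked_title]
    scores = {t: cosine(liked, v) for t, v in films.items() if t != liked_title}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("Inception"))  # ['Heat', 'Interstellar']
```

Note how Heat ranks first: it shares two of Inception’s three genres, which is exactly the “more of the same” behaviour that produces filter bubbles.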
2. Collaborative Filtering (CF)
Collaborative Filtering (CF) recommends items by looking at what other people have purchased or rated. It rests on the premise that users who agreed in the past will agree again: if people with tastes similar to yours liked an item, there is a good chance you will too.
User-Based CF recommends items to a user by comparing rating histories across users. If User A and User B have rated items similarly in the past, the system will recommend to User A the items that User B has purchased and rated highly.
Item-Based CF uses relationships between the items themselves. For example, if many users who bought a particular lamp also bought a particular bookshelf, the system learns that the two items are related and will recommend the bookshelf to the next user who buys the lamp.
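Item-item relationships can be computed directly from the rating matrix by comparing item columns. A minimal sketch with invented ratings (0 means unrated):

```python
import numpy as np

# Toy user x item rating matrix. Columns: lamp, bookshelf, desk.
R = np.array([
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
    [0, 1, 4],
], dtype=float)

def item_similarity(R):
    """Cosine similarity between item rating columns."""
    norms = np.linalg.norm(R, axis=0)
    return (R.T @ R) / np.outer(norms, norms)

S = item_similarity(R)
# Lamp (0) and bookshelf (1) are rated highly by the same users, so their
# similarity is high; the desk (2) attracts a different group of users.
print(S.round(2))
```

A real system would mean-centre ratings and handle sparsity more carefully, but the core signal, co-rating patterns between item columns, is the same.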
The most successful implementation of Collaborative Filtering has been the Matrix Factorisation technique, popularised in part by the Netflix Prize. Rather than working directly with the large, sparse matrix of user-item ratings, the algorithm decomposes it into two lower-dimensional matrices: one holding latent factors for each user (e.g., does this user prefer Action or Romance?) and one holding latent factors for each item (e.g., is this a High-Brow Movie or a Popcorn Movie?). A recommendation is made by matching a user’s latent factors against an item’s latent factors.
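A minimal sketch of matrix factorisation trained with stochastic gradient descent, assuming a handful of made-up ratings and two latent factors. Production implementations (e.g., ALS in Spark MLlib) are far more sophisticated, but the decomposition is the same idea:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observed ratings as (user, item, rating) triples; the matrix is sparse.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 4), (2, 2, 5)]
n_users, n_items, k = 3, 3, 2   # k latent factors per user and per item

P = 0.1 * rng.standard_normal((n_users, k))  # user latent factors
Q = 0.1 * rng.standard_normal((n_items, k))  # item latent factors

lr, reg = 0.05, 0.02
for epoch in range(200):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]                    # prediction error
        P[u] += lr * (err * Q[i] - reg * P[u])   # gradient step for the user
        Q[i] += lr * (err * P[u] - reg * Q[i])   # gradient step for the item

# After training, predicted ratings for observed pairs approach the truth,
# and P @ Q.T also fills in the unobserved cells -- the recommendations.
print(round(float(P[0] @ Q[0]), 1))
```

The key payoff is the filled-in matrix `P @ Q.T`: the cells that were empty in the original ratings matrix become the predicted ratings used for recommendation.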
The biggest disadvantage of CF is the “cold start” problem: it cannot handle new users or new items, because there is no interaction history to learn from.
The Modern Toolkit: Deep Learning and Two-Tower Models
To address the limitations of these classic techniques and handle the scale of modern platforms, the industry has shifted to deep learning. One of the most prominent architectures in recent years is the Two-Tower Model, also known as the dual-encoder.
The Two-Tower Model is a neural network architecture popularised by research at Google and since adopted widely across the industry, including in Instagram’s recommendation infrastructure, valued for its inductive learning and scalability.
Rather than using one single model to do everything, it uses two distinct neural networks in parallel:
The User Tower: The user tower takes in user-related features (ID, demographics, historical preferences, and so on) and produces a user embedding: a dense vector that represents the user as a point in a high-dimensional space.
The Item Tower: The item tower takes in item-related features (item ID, category, price, description) and produces an item embedding in the same space.
How does it work? The model is trained on records of how users interact with items (clicks, purchases, watches, etc.). The training objective positions a user’s embedding close to the embeddings of items they interacted with (high cosine similarity) and far from the embeddings of items they did not.
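One common way to express that objective is an in-batch softmax loss: within a batch of (user, clicked-item) pairs, each user’s clicked item is the positive and every other item in the batch serves as a negative. A NumPy sketch with random stand-in embeddings (in a real system these would be the outputs of the two towers):

```python
import numpy as np

rng = np.random.default_rng(1)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-in tower outputs for a batch of 4 (user, clicked-item) pairs.
user_emb = l2_normalize(rng.standard_normal((4, 8)))
item_emb = l2_normalize(rng.standard_normal((4, 8)))

# Pairwise cosine similarities (the embeddings are unit-norm, so a dot
# product is the cosine). Row u, column i = sim(user u, item i).
logits = user_emb @ item_emb.T
logits -= logits.max(axis=1, keepdims=True)       # numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Positives sit on the diagonal; minimising this loss pulls each user
# toward their clicked item and pushes them away from the other items.
loss = -np.log(np.diag(probs)).mean()
print(round(float(loss), 3))
```

Gradient descent on this loss, backpropagated through both towers, is what shapes the shared embedding space.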
What do we gain from this design?
Cold Start Behavior: Standard collaborative filtering is transductive: it can only score users and items it saw during training. Two-Tower models are inductive. Once trained, you need only a new user’s features to produce their embedding from the User Tower; recommendations then come from finding the items nearest to that embedding. No interaction history is required before the first recommendation.
Scalability: The separation into two towers makes serving fast and flexible. Item embeddings can be pre-computed and stored in a vector database (such as Redis via its RedisVL library), so recommending reduces to a fast Nearest Neighbor search over all items.
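The serving path is simple enough to sketch with a brute-force search. Production systems swap this for an approximate nearest-neighbour index, but the contract is identical: embed the user, find the closest item vectors.

```python
import numpy as np

rng = np.random.default_rng(2)

# Pre-computed unit-norm item embeddings, as they might sit in a vector store.
items = rng.standard_normal((1000, 16))
items /= np.linalg.norm(items, axis=1, keepdims=True)

def top_k(user_vec, item_matrix, k=5):
    """Exact nearest-neighbour retrieval by cosine similarity (brute force)."""
    user_vec = user_vec / np.linalg.norm(user_vec)
    scores = item_matrix @ user_vec          # one matrix-vector product
    return np.argsort(scores)[::-1][:k]      # indices of the k best items

user = rng.standard_normal(16)               # fresh output of the User Tower
print(top_k(user, items))                    # ids of the 5 most similar items
```

Because the heavy work (the item tower) runs offline, the online cost per request is just one user-tower forward pass plus this lookup.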
Tackling the Hard Problems: Hybrids, Sparsity, and Scale
Even with powerful architectures like the Two-Tower model, real-world data tends to be messy, and engineers have developed further techniques to refine these systems.
Most modern systems are hybrids that combine several filtering methods. A typical hybrid starts with content-based filtering when a user first creates an account, then, as the user generates interaction data (movie watch history, previous rentals, ratings), gradually shifts weight toward collaborative filtering, which adds diversity to the recommendations.
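That “shift weight as data accumulates” idea can be expressed as a simple blend. A sketch under assumed names: `ramp` is an invented tuning knob, not a standard parameter, and real systems often learn the blend instead of hard-coding it.

```python
def hybrid_score(content_score, cf_score, n_interactions, ramp=20):
    """Blend content-based and collaborative scores for one (user, item) pair.
    The collaborative weight grows linearly with the user's interaction count
    until `ramp` interactions, then saturates at 1.0 (an assumed heuristic)."""
    w_cf = min(n_interactions / ramp, 1.0)
    return (1 - w_cf) * content_score + w_cf * cf_score

# A brand-new user is scored purely by content similarity...
print(hybrid_score(0.8, 0.2, n_interactions=0))   # 0.8
# ...while a long-time user is scored purely by collaborative filtering.
print(hybrid_score(0.8, 0.2, n_interactions=40))  # 0.2
```

The appeal of this pattern is that it directly patches CF’s cold-start weakness with content-based filtering’s strength, and vice versa.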
Research in this direction continues. One recent study combined K-Means clustering with Neural Collaborative Filtering: the researchers clustered items by shared features such as genre or release year before feeding them to the neural network. The clusters added richer context to the filtering process, improving recommendation accuracy while reducing the effect of data sparsity. Other studies report similar gains from pairing NLP techniques with an enhanced SVD to analyse item content in more depth, with accuracy improvements on popular datasets such as Netflix and IMDb.
How Do We Know It’s Working? Evaluation Metrics
Once your model is built, you need to measure how well it performs. No single metric captures every aspect of a recommendation system, so practitioners combine several, typically split into offline evaluation on historical data and online evaluation with live users.
Offline Metrics (Evaluating Historical Data)
Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) – These metrics measure how far the predicted ratings are from the actual ones. For example, if the system predicted a movie would be rated 5 stars but the user rated it 3, RMSE and MAE quantify that 2-star error.
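Both metrics are one-liners; the difference is that RMSE squares the errors, so it punishes large misses harder than MAE does. Using the 5-stars-predicted-vs-3-stars-actual example above as one of the data points:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error: penalises large errors quadratically."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    """Mean Absolute Error: every star of error counts the same."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

actual    = [3, 4, 5, 2]   # what the users actually rated
predicted = [5, 4, 4, 2]   # what the system predicted

print(mae(actual, predicted))   # 0.75
print(rmse(actual, predicted))  # ~1.118: the single 2-star miss dominates
```

Note how RMSE exceeds MAE here: the one 2-star mistake weighs more under squaring, which is exactly why RMSE was the headline metric of the Netflix Prize.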
Precision@K and Recall@K – Given the top K items recommended to a user (say K=10), Precision@K is the fraction of those K items that were actually relevant to the user; Recall@K is the fraction of all relevant items that made it into the top K.
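The two definitions differ only in the denominator, which is easiest to see in code. A sketch with invented item IDs:

```python
def precision_recall_at_k(recommended, relevant, k=10):
    """Precision@K and Recall@K for one user.
    recommended: ranked list of item ids; relevant: set of truly relevant ids."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k                              # denominator: list size
    recall = hits / len(relevant) if relevant else 0.0  # denominator: all relevant
    return precision, recall

recs = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
liked = {"b", "d", "x", "y"}  # 4 relevant items overall; 2 made the top 10

print(precision_recall_at_k(recs, liked, k=10))  # (0.2, 0.5)
```

In production these per-user values are averaged over all test users.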
Normalized Discounted Cumulative Gain (NDCG) – This metric cares about where a relevant item appears in the ranked list: the closer to the top, the more credit the system earns. It reflects the fact that users rarely look past the first two or three items in a recommendation list.
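The “discount” is logarithmic in rank: a hit at position 1 is worth twice a hit at position 3. A minimal implementation, normalised so a perfect ranking scores 1.0:

```python
import math

def ndcg_at_k(recommended, relevance, k=10):
    """NDCG@K for one user.
    recommended: ranked item ids; relevance: dict mapping id -> relevance grade."""
    gains = [relevance.get(item, 0) for item in recommended[:k]]
    # DCG: each gain is discounted by log2(rank + 2), i.e. rank 0 -> log2(2).
    dcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))
    # IDCG: the DCG of the best possible ordering, used for normalisation.
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# The same relevant item scores 1.0 at rank 1 but only 0.5 at rank 3.
print(ndcg_at_k(["a", "b", "c"], {"a": 1}))         # 1.0
print(ndcg_at_k(["b", "c", "a"], {"a": 1}))         # 0.5
```

This rank-sensitivity is what RMSE, precision, and recall all miss: they treat a hit at the top and a hit at the bottom of the list identically.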
Online Metrics (Testing in the Real World)
A/B Testing: Split users into two groups (Control sees the old model, Treatment sees the new one) and compare concrete business metrics to determine whether the new model actually delivers better results.
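Deciding whether an observed CTR lift is real or noise usually comes down to a statistical test. A sketch of a two-proportion z-test on invented experiment numbers, using only the standard library:

```python
import math

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test: is the CTR difference between A and B significant?"""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)     # pooled click rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal tail
    return z, p_value

# Hypothetical experiment: control CTR 5.0%, treatment CTR 5.6%.
z, p = two_proportion_z(500, 10_000, 560, 10_000)
print(round(z, 2), round(p, 4))
```

With these made-up numbers the lift looks promising but the p-value sits just above 0.05, which is exactly the situation where teams keep the experiment running longer rather than shipping on a hunch.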
CTR (Click-Through Rate): Of the items recommended to the user, what fraction did the user actually click?
Engagement Time: For platforms such as Netflix and Spotify, a click alone is not sufficient evidence of genuine engagement. Did the user watch the entire two-hour movie they clicked on, or stop after 5 minutes?
The Future of Recommendation Systems
Looking ahead, the field is evolving quickly. One emerging trend is leveraging Large Language Models (LLMs) to build conversational recommenders that refine their suggestions through dialogue, a back-and-forth much like a person-to-person conversation. There is also a strong push toward Explainable AI (XAI): systems that explain why a specific book or video was recommended, enhancing trust and transparency.
Recommendation systems have certainly captured my attention because they span both the simplest forms of ML (e.g. “other customers who purchased this item also purchased…”) and the most complex (advanced neural networks trained over vast data lakes), all while acting as silent matchmakers that are always learning and adapting to help us discover content we would never have found on our own.