Predicting Likes: Inside A Simple Recommendation Engine’s Algorithms

The Internet is becoming “smarter” every day. The video-sharing website that you frequently visit seems to know exactly what you will like, even before you have seen it. The online shopping cart holding your items almost magically figures out the one thing that you may have missed or intended to add before checking out. It’s as if these web services are reading your mind—or are they?

It turns out that predicting a user’s likes involves more math than magic. In this article, we explore one of the many ways of building a recommendation engine, one that is simple both to implement and to understand.


By Mahmud Ridwan

Mahmud is a software developer with many years of experience and a knack for efficiency, scalability, and stable solutions.


A recommendation engine (sometimes referred to as a recommender system) is a tool that lets algorithm developers predict what a user may or may not like among a list of given items. Recommendation engines are a pretty interesting alternative to search fields, as recommendation engines help users discover products or content that they may not come across otherwise. This makes recommendation engines a great part of web sites and services such as Facebook, YouTube, Amazon, and more.

Recommendation engines typically work in one of two ways. An engine can rely on the properties of the items that a user likes, which are analyzed to determine what else the user may like; or it can rely on the likes and dislikes of other users, which the engine uses to compute a similarity index between users and recommend items to them accordingly. It is also possible to combine both methods to build a much more robust recommendation engine. However, as with all other information-related problems, it is essential to pick an algorithm that is suitable for the problem being addressed.

Building a Recommendation Engine

In this tutorial, we will walk you through the process of building a recommendation engine that is collaborative and memory-based. This recommendation engine will recommend movies to users based on what they like and dislike, and will work in the second of the two ways described above: by comparing users with one another. For this project, we will be using basic set operations, a little mathematics, and Node.js/CoffeeScript. All source code relevant to this tutorial can be found here.

Sets and Equations

Before implementing a collaborative memory-based recommendation engine, we must first understand the core idea behind such a system. To this engine, items and users are nothing but identifiers. Therefore, we will not take any other attribute of a movie (for example, the cast, director, genre, etc.) into consideration while generating recommendations. The similarity between two users is represented using a decimal number between -1.0 and 1.0. We will call this number the similarity index. Finally, the possibility of a user liking a movie will be represented using another decimal number between -1.0 and 1.0. Now that we have modelled the world around this system using simple terms, we can unleash a handful of elegant mathematical equations to define the relationship between these identifiers and numbers.

In our recommendation algorithm, we will maintain a number of sets. Each user will have two sets: a set of movies the user likes, and a set of movies the user dislikes. Each movie will also have two sets associated with it: a set of users who liked the movie, and a set of users who disliked the movie. During the stages where recommendations are generated, a number of sets will be produced - mostly unions or intersections of the other sets. We will also have ordered lists of suggestions and similar users for each user.

To calculate the similarity index, we will use a variation of the Jaccard index formula. Originally known as “coefficient de communauté” (coined by Paul Jaccard), the formula compares two sets and produces a simple decimal statistic between 0 and 1.0:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

The formula divides the number of elements common to both sets by the number of distinct elements in the two sets combined (each counted only once). The Jaccard index of two identical sets will always be 1, while the Jaccard index of two sets with no common elements will always be 0. Now that we know how to compare two sets, let us think of a strategy we can use to compare two users. As discussed earlier, each user, from the system’s point of view, is three things: an identifier, a set of liked movies, and a set of disliked movies. If we were to define our users’ similarity index based only on the set of their liked movies, we could directly use the Jaccard index formula:

$$S(U_1, U_2) = \frac{|L_1 \cap L_2|}{|L_1 \cup L_2|}$$

Here, U1 and U2 are the two users we are comparing, and L1 and L2 are the sets of movies that U1 and U2 have liked, respectively. Now, if two users who like the same movies are similar, then two users who dislike the same movies should be similar as well. This is where we modify the equation a little:

$$S(U_1, U_2) = \frac{|L_1 \cap L_2| + |D_1 \cap D_2|}{|L_1 \cup L_2 \cup D_1 \cup D_2|}$$

Writing D1 and D2 for the sets of movies that U1 and U2 have disliked, we no longer consider only the common likes in the formula’s numerator; we add the number of common dislikes as well. In the denominator, we take the number of all the items that either user has liked or disliked. Now that we have considered both likes and dislikes in an independent sort of way, we should also think about the case where two users are polar opposites in their preferences. The similarity index of two users where one likes a movie and the other dislikes it shouldn’t be 0:

$$S(U_1, U_2) = \frac{|L_1 \cap L_2| + |D_1 \cap D_2| - |L_1 \cap D_2| - |L_2 \cap D_1|}{|L_1 \cup L_2 \cup D_1 \cup D_2|}$$

That’s one long formula! But it’s simple, I promise. It’s similar to our previous formula with a small difference in the numerator. We are now subtracting the number of conflicting likes and dislikes of the two users from the number of their common likes and dislikes. This causes the similarity index formula to have a range of values between -1.0 and 1.0. Two users having identical tastes will have a similarity index of 1.0 while two users having entirely conflicting tastes in movies will have a similarity index of -1.0.
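The formulas so far translate almost directly into code. What follows is a minimal sketch in plain JavaScript rather than the CoffeeScript used by the project; the function names and the shape of the user objects (`likes` and `dislikes` as `Set`s) are assumptions made for illustration, as is the choice to return 0 when neither user has rated anything.

```javascript
// Number of elements that appear in both sets.
function intersectionSize(a, b) {
  let count = 0;
  for (const x of a) if (b.has(x)) count += 1;
  return count;
}

// Number of distinct elements across all the given sets.
function unionSize(...sets) {
  const all = new Set();
  for (const s of sets) for (const x of s) all.add(x);
  return all.size;
}

// Plain Jaccard index: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  const total = unionSize(a, b);
  return total === 0 ? 0 : intersectionSize(a, b) / total;
}

// Similarity index: common likes and dislikes, minus conflicting ratings,
// divided by the number of items either user has rated.
function similarity(u1, u2) {
  const total = unionSize(u1.likes, u2.likes, u1.dislikes, u2.dislikes);
  if (total === 0) return 0; // assumption: no ratings at all means no signal
  const agreements =
    intersectionSize(u1.likes, u2.likes) + intersectionSize(u1.dislikes, u2.dislikes);
  const conflicts =
    intersectionSize(u1.likes, u2.dislikes) + intersectionSize(u2.likes, u1.dislikes);
  return (agreements - conflicts) / total;
}

// Made-up users: they agree on m1 and m3, and never conflict.
const alice = { likes: new Set(['m1', 'm2']), dislikes: new Set(['m3']) };
const bob = { likes: new Set(['m1', 'm4']), dislikes: new Set(['m3']) };
console.log(similarity(alice, bob)); // (1 + 1 - 0 - 0) / 4 = 0.5
```

Swapping a user’s likes and dislikes and comparing the result with the original user yields -1, which matches the “entirely conflicting tastes” case described above.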

Now that we know how to compare two users based on their taste in movies, we have to explore one more formula before we can start implementing our homebrewed recommendation engine algorithm:

$$P(U, M) = \frac{Z_L - Z_D}{|M_L| + |M_D|}$$

Let’s break this equation down a little. What we mean by $P(U, M)$ is the possibility of a user U liking the movie M. $Z_L$ and $Z_D$ are the sums of the similarity indices of user U with all the users who have liked or disliked the movie M, respectively. $|M_L| + |M_D|$ represents the total number of users who have liked or disliked the movie M. The result $P(U, M)$ is a number between -1.0 and 1.0.
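To make the formula concrete, here is a small worked example with made-up numbers (the users A, B, and C and their similarity indices are purely illustrative). Suppose movie M was liked by A and B, whose similarity indices with U are 0.5 and 0.25, and disliked by C, whose similarity index with U is -0.5:

```latex
Z_L = 0.5 + 0.25 = 0.75, \qquad Z_D = -0.5, \qquad |M_L| + |M_D| = 2 + 1 = 3

P(U, M) = \frac{Z_L - Z_D}{|M_L| + |M_D|} = \frac{0.75 - (-0.5)}{3} = \frac{1.25}{3} \approx 0.42
```

Note that C’s dislike raises the prediction: C tends to disagree with U, so C disliking the movie counts as mild evidence that U will like it.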

That’s about it. In the next section, we can use these formulae to start implementing our collaborative memory-based recommendation engine.

Building the Recommendation Engine

We will build this recommendation engine as a very simple Node.js application. There will also be very little work on the front-end, mostly some HTML pages and forms (we will use Bootstrap to make the pages look neat). On the server side, we will use CoffeeScript. The application will have a few GET and POST routes. Even though we will have the notion of users in the application, we will not have any elaborate registration/login mechanism. For persistence, we will use the Bourne package, available via npm, which enables an application to store data in plain JSON files and perform basic database queries on them. We will use Express.js to ease the process of managing the routes and handlers.

At this point, if you are new to Node.js development, you might want to clone the GitHub repository so that it’s easier to follow this tutorial. As with any other Node.js project, we will begin by creating a package.json file and installing the dependency packages required for this project. If you are using the cloned repository, the package.json file should already be there, and installing the dependencies only requires running “$ npm install”. This will install all the packages listed inside the package.json file.

The Node.js packages we need for this project are:

  • coffee-script

We will build the recommendation engine by splitting all relevant methods into four separate CoffeeScript classes, each of which will be stored under “lib/engine”: Engine, Rater, Similars, and Suggestions. The class Engine will be responsible for providing a simple API for the recommendation engine, and will bind the other three classes together. Rater will be responsible for tracking likes and dislikes (as two separate instances of the Rater class). Similars and Suggestions will be responsible for determining and tracking similar users and recommended items for the users, respectively.

Tracking Likes and Dislikes

Let us begin with our Rater class. This is a simple one:

As indicated earlier in this tutorial, we will have one instance of Rater for likes, and another one for dislikes. To record that a user likes an item, we will pass them to “Rater#add()”. Similarly, to remove the rating, we will pass them to “Rater#remove()”.

Since we are using Bourne as a server-less database solution, we will store these ratings in a file named “./db-#{@kind}.json”, where kind is either “likes” or “dislikes”. We will open the database inside the constructor of the Rater instance:

This will make adding rating records as simple as calling a Bourne database method inside our “Rater#add()” method:

And it is similar to remove them (“db.delete” instead of “db.insert”). However, before we either add or remove something, we must ensure it doesn’t already exist in the database. Ideally, with a real database, we could have done it as a single operation. With Bourne, we have to do a manual check first; and, once the insertion or deletion is done, we need to make sure we recalculate the similarity indices for this user, and then generate a set of new suggestions. The “Rater#add()” and “Rater#remove()” methods will look something like this:

For brevity, we will skip the parts where we check for errors. This might be a reasonable thing to do in an article, but is not an excuse for ignoring errors in real code.

The other two methods of this class, “Rater#itemsByUser()” and “Rater#usersByItem()”, do what their names imply: they look up the items rated by a user and the users who have rated an item, respectively. For example, when Rater is instantiated with kind = “likes”, “Rater#itemsByUser()” will find all the items the user has liked.
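To illustrate the shape of this class, here is a minimal in-memory sketch in plain JavaScript. It is not the project’s Rater: the class name `InMemoryRater` is hypothetical, the data lives in `Map`s instead of a Bourne database, the methods are synchronous, and the step that recalculates similarity indices and suggestions after each change is left out. What it does show is the “check first, then insert or delete” behaviour and the two lookup methods described above.

```javascript
// Hypothetical in-memory stand-in for one Rater instance ("likes" or "dislikes").
class InMemoryRater {
  constructor(kind) {
    this.kind = kind;        // "likes" or "dislikes"
    this.byUser = new Map(); // user -> Set of items
    this.byItem = new Map(); // item -> Set of users
  }

  // Records the rating; returns false if it was already recorded.
  add(user, item) {
    if (this.itemsByUser(user).has(item)) return false;
    InMemoryRater.addTo(this.byUser, user, item);
    InMemoryRater.addTo(this.byItem, item, user);
    return true;
  }

  // Removes the rating; returns false if there was nothing to remove.
  remove(user, item) {
    if (!this.itemsByUser(user).has(item)) return false;
    this.byUser.get(user).delete(item);
    this.byItem.get(item).delete(user);
    return true;
  }

  // Both lookups return copies so callers cannot mutate the stored sets.
  itemsByUser(user) { return new Set(this.byUser.get(user) || []); }
  usersByItem(item) { return new Set(this.byItem.get(item) || []); }

  static addTo(index, key, value) {
    if (!index.has(key)) index.set(key, new Set());
    index.get(key).add(value);
  }
}

const likes = new InMemoryRater('likes');
likes.add('alice', 'movie-1');
likes.add('bob', 'movie-1');
likes.remove('alice', 'movie-1');
console.log([...likes.usersByItem('movie-1')]); // only bob remains
```

Keeping two indexes (by user and by item) trades a little memory for constant-time lookups in both directions, which is what the similarity and suggestion steps need.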

Finding Similar Users

Moving on to our next class: Similars. This class will help us compute and keep track of the similarity indices between the users. As discussed before, calculating the similarity between two users involves analyzing the sets of items they like and dislike. To do that, we will rely on the Rater instances to fetch the sets of relevant items, and then determine the similarity index for certain pairs of users using the similarity index formula.


Just like our previous class, Rater, we will put everything in a Bourne database named “./db-similars.json”, which we will open in the constructor of Similars. The class will have a method “Similars#byUser()”, which will let us look up users similar to a given user through a simple database lookup:

However, the most important method of this class is “Similars#update()” which works by taking a user and computing a list of other users who are similar, and storing the list in the database, along with their similarity indices. It starts by finding the user’s likes and dislikes:

We also find all the users who have rated these items:

Next, for each of these other users, we compute the similarity index and store it all in the database:

Within the snippet above, you will notice that we have an expression identical in nature to our similarity index formula, a variant of the Jaccard index formula.
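The three steps described in this section (collect the user’s ratings, find everyone who rated the same items, score each of them) can be sketched as follows. This is an illustration in plain JavaScript under stated assumptions, not the project’s code: `makeRater` is a hypothetical stand-in that answers the two Rater lookups from an array of pairs, everything is synchronous, and the result is returned instead of being written to a database.

```javascript
// Similarity index: (agreements - conflicts) / (all items either user rated).
function similarity(a, b) {
  const common = (x, y) => [...x].filter((v) => y.has(v)).length;
  const total = new Set([...a.likes, ...a.dislikes, ...b.likes, ...b.dislikes]).size;
  if (total === 0) return 0; // assumption: no ratings means no signal
  const agreements = common(a.likes, b.likes) + common(a.dislikes, b.dislikes);
  const conflicts = common(a.likes, b.dislikes) + common(a.dislikes, b.likes);
  return (agreements - conflicts) / total;
}

// Hypothetical stand-in for a Rater: answers lookups from [user, item] pairs.
function makeRater(pairs) {
  return {
    itemsByUser: (user) => new Set(pairs.filter(([u]) => u === user).map(([, i]) => i)),
    usersByItem: (item) => new Set(pairs.filter(([, i]) => i === item).map(([u]) => u)),
  };
}

function updateSimilars(user, likes, dislikes) {
  const me = { likes: likes.itemsByUser(user), dislikes: dislikes.itemsByUser(user) };

  // Candidates: anyone who liked or disliked an item this user has rated.
  const others = new Set();
  for (const item of [...me.likes, ...me.dislikes]) {
    for (const other of likes.usersByItem(item)) others.add(other);
    for (const other of dislikes.usersByItem(item)) others.add(other);
  }
  others.delete(user);

  // Score every candidate and list the most similar users first.
  return [...others]
    .map((other) => ({
      user: other,
      similarity: similarity(me, {
        likes: likes.itemsByUser(other),
        dislikes: dislikes.itemsByUser(other),
      }),
    }))
    .sort((a, b) => b.similarity - a.similarity);
}

const likes = makeRater([
  ['alice', 'm1'], ['alice', 'm2'], ['bob', 'm1'], ['bob', 'm2'], ['carol', 'm3'],
]);
const dislikes = makeRater([['alice', 'm3'], ['bob', 'm3'], ['carol', 'm1']]);
const similars = updateSimilars('alice', likes, dislikes);
console.log(similars); // bob (identical tastes) first, carol (conflicting tastes) last
```

Only users who share at least one rated item with the given user are scored, which keeps the candidate list much smaller than the full user base.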

Generating Recommendations

Our next class, Suggestions, is where all the predictions take place. Like the class Similars, we rely on another Bourne database named “./db-suggestions.json”, opened inside the constructor.


The class will have a method “Suggestions#forUser()” to look up computed suggestions for the given user:

The method that will compute these results is “Suggestions#update()”. This method, like “Similars#update()”, will take a user as an argument. The method begins by listing all the users similar to the given user, and all the items the given user has not rated:

Once we have all the other users and the unrated items listed, we can begin computing a new set of recommendations by removing any previous set of recommendations, iterating over each item, and computing the possibility of the user liking it based on available information:

Once that is done, we save it back to the database:
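As a rough sketch of the computation described in this section, the function below applies the P(U, M) formula to every unrated item and sorts the results. It is written in plain JavaScript with assumed inputs (an array of similar users and two `Map`s from items to the users who liked or disliked them) and it returns the list instead of saving it; the project’s “Suggestions#update()” gathers the same information through its database-backed classes.

```javascript
// similars: array of { user, similarity } for the target user.
// likes / dislikes: Map of item -> Set of users who liked / disliked it.
// unratedItems: items the target user has not rated yet.
function computeSuggestions(similars, likes, dislikes, unratedItems) {
  const similarityOf = new Map(similars.map((s) => [s.user, s.similarity]));
  // Assumption: users without a stored similarity index contribute 0.
  const sum = (users) =>
    [...users].reduce((acc, u) => acc + (similarityOf.get(u) || 0), 0);

  const suggestions = [];
  for (const item of unratedItems) {
    const likers = likes.get(item) || new Set();
    const dislikers = dislikes.get(item) || new Set();
    const raters = likers.size + dislikers.size;
    if (raters === 0) continue; // nobody rated it, so there is nothing to predict from
    // P(U, M) = (Z_L - Z_D) / (|M_L| + |M_D|)
    suggestions.push({ item, weight: (sum(likers) - sum(dislikers)) / raters });
  }
  return suggestions.sort((a, b) => b.weight - a.weight);
}

const suggestions = computeSuggestions(
  [{ user: 'bob', similarity: 1 }, { user: 'carol', similarity: -0.5 }],
  new Map([['m4', new Set(['bob'])], ['m5', new Set(['carol'])]]),
  new Map([['m4', new Set(['carol'])], ['m5', new Set(['bob'])]]),
  ['m4', 'm5', 'm6'],
);
console.log(suggestions); // m4 scores 0.75, m5 scores -0.75, m6 is skipped
```

Skipping items that nobody has rated avoids a division by zero; whether such items should instead appear with a neutral weight is a design choice this sketch does not settle.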

Exposing the Library API

Inside the Engine class, we bind everything up in a neat API-like structure for easy access from the outside world:

Once we instantiate an Engine object:

We can easily add or remove likes and dislikes:

We can also begin updating user similarity indices and suggestions:

Finally, it is important to export this Engine class (and all the other classes) from their respective “.coffee” files:

Then, export the Engine from the package by creating an “” file with a single line:

Creating the User Interface

To be able to use the recommendation engine algorithm in this tutorial, we want to provide a simple user interface over the web. To do that, we spawn an Express app inside our “web.iced” file and handle a few routes:

Within the app, we handle four routes. The index route “/” is where we serve the front-end HTML by rendering a Jade template. Generating the template requires a list of movies, the current user’s username, the user’s likes and dislikes, and the top four suggestions for the user. The source code of the Jade template is left out of the article, but it is available in the GitHub repository.

The “/like” and “/dislike” routes are where we accept POST requests to record the user’s likes and dislikes. Both routes add a rating by first removing any conflicting rating, if necessary. For example, a user liking something they previously disliked will cause the handler to remove the “dislike” rating first. These routes also allow the user to “unlike” or “un-dislike” an item, if desired.
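The rule these two routes follow can be sketched independently of Express. In the plain JavaScript below, the function name `handleRating`, the `ratings` object, and in particular the `unset` flag used to signal an “unlike” or “un-dislike” are all assumptions made for illustration; the article does not say how the form communicates that intent.

```javascript
// Returns the Set of items stored for a user, creating it on first use.
function itemsOf(index, user) {
  if (!index.has(user)) index.set(user, new Set());
  return index.get(user);
}

// ratings: { likes: Map<user, Set<item>>, dislikes: Map<user, Set<item>> }
// kind: "likes" or "dislikes"; unset (hypothetical flag): remove instead of add.
function handleRating(ratings, { user, item, kind, unset = false }) {
  const opposite = kind === 'likes' ? 'dislikes' : 'likes';
  if (unset) {
    itemsOf(ratings[kind], user).delete(item); // "unlike" or "un-dislike"
    return;
  }
  itemsOf(ratings[opposite], user).delete(item); // remove any conflicting rating first
  itemsOf(ratings[kind], user).add(item);
}

const ratings = { likes: new Map(), dislikes: new Map() };
handleRating(ratings, { user: 'alice', item: 'm1', kind: 'dislikes' });
handleRating(ratings, { user: 'alice', item: 'm1', kind: 'likes' }); // changes her mind
console.log(ratings.likes.get('alice').has('m1'));    // true
console.log(ratings.dislikes.get('alice').has('m1')); // false
```

Removing the conflicting rating before adding the new one guarantees that an item never sits in both of a user’s sets, which the similarity formula implicitly relies on.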

Finally, the “/refresh” route allows the user to regenerate their set of recommendations on demand, although this also happens automatically whenever the user rates an item.

If you have attempted to implement this application from scratch by following this article, you will need to perform one last step before you can test it. You will need to create a “.json” file at “data/movies.json”, and populate it with some movie data like so:

You may want to copy the one available in the GitHub repository, which is pre-populated with a handful of movie names and thumbnail URLs.

Once all the source code is ready and wired together, starting the server process requires the following command to be invoked:

Assuming everything went smoothly, you should see the following text appear on the terminal:

Since we have not implemented any true user authentication system, the prototype application relies on only a username picked after visiting “http://localhost:5000”. Once a username has been entered, and the form is submitted, you should be taken to another page with two sections: “Recommended Movies” and “All Movies”. Since we lack the most important element of a collaborative memory-based recommendation engine (data), we will not be able to recommend any movies to this new user.

At this point, you should open another browser window to “http://localhost:5000” and login as a different user there. Like and dislike some movies as this second user. Return to the browser window of the first user and rate some movies as well. Make sure you rate at least a couple of common movies for both users. You should start seeing recommendations immediately.


In this algorithm tutorial, what we have built is a prototype recommendation engine. There are certainly ways to improve upon this engine. This section will briefly touch on some areas where improvements are essential for this to be used at a large scale. However, in cases where scalability, stability, and other such properties are required, you should always resort to using a good time-tested solution. Like the rest of the article, the idea here is to provide some insight into how a recommendation engine works. Instead of discussing the obvious flaws of the current method (such as race conditions in some of the methods we have implemented), improvements will be discussed at a higher level.

One very obvious improvement here is to use a real database, instead of our file-based solution. The file-based solution may work fine in a prototype at a small scale, but it’s not a reasonable choice at all for real use. One option among many is Redis. Redis is fast, and has special capabilities that are useful when dealing with set-like data structures.

Another issue that we can easily work around is that we recalculate recommendations every time a user adds or changes a rating. Instead of recalculating on the fly in real time, we should queue these recommendation update requests and process them behind the scenes, perhaps on a timed refresh interval.
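One possible shape for such a queue is sketched below in plain JavaScript. The class name `UpdateQueue` and its methods are hypothetical, and the sketch keeps its state in memory and is flushed by hand so that it runs to completion; a real server would call `flush()` from a timer or hand the work to a background job system.

```javascript
// Collects refresh requests and processes each user at most once per flush.
class UpdateQueue {
  constructor(recompute) {
    this.recompute = recompute; // function(user) that does the expensive work
    this.pending = new Set();   // users waiting for a refresh, deduplicated
  }

  // Called from the rating handlers instead of recalculating immediately.
  request(user) {
    this.pending.add(user);
  }

  // Processes everything requested so far; returns how many users were refreshed.
  flush() {
    const batch = [...this.pending];
    this.pending.clear();
    for (const user of batch) this.recompute(user);
    return batch.length;
  }
}

const refreshed = [];
const queue = new UpdateQueue((user) => refreshed.push(user));
queue.request('alice');
queue.request('alice'); // coalesced with the first request
queue.request('bob');
const processed = queue.flush();
console.log(processed, refreshed); // two users refreshed, despite three requests
```

Because requests for the same user collapse into one entry, a burst of ratings costs a single recalculation instead of one per click.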

Besides these “technical” choices, there are also some strategic choices that can be made to improve the recommendations. As the number of items and users grows, it will become increasingly costly (in terms of time and system resources) to generate recommendations. It is possible to make this faster by choosing only a subset of users to generate recommendations from, instead of processing the entire database every time. For example, if this were a recommendation engine for restaurants, you could limit the similar user set to contain only those users who live in the same city or state.

Other improvements may involve taking a hybrid approach, where recommendations are generated based on both collaborative filtering and content-based filtering. This would be especially good with content such as movies, where the properties of the content are well defined. Netflix, for example, takes this route, recommending movies based on both other users’ activities and the movies’ attributes.

Memory-based collaborative recommendation engine algorithms can be a pretty powerful thing. The one we experimented with in this article may be primitive, but it’s also simple: simple to understand, and simple to build. It may be far from perfect, but robust implementations of recommendation engines, such as Recommendable, are built on similar fundamental ideas.

Like most other computer science problems that involve lots of data, getting correct recommendations is a lot about choosing the right algorithm and appropriate attributes of the content to work on. I hope this article has given you a glimpse into what happens inside a collaborative memory-based recommendation engine when you are using it.




Recommendation Engine: What It Is, How It Works


A recommendation engine, or recommender system, is a data filtering tool that provides personalized suggestions to users based on their past behavior and preferences. Using machine learning algorithms and statistical analysis, it can predict a person’s wants and needs based on the data they generate, as well as suggest products, content or information they’re likely to find interesting or relevant.

“The goal,” according to Patrick Thompson, director of product at recommendation engine provider Amplitude, “is to get to the point where you’re recommending the right content to the right person at the right time, based off of their previous journey.”

What Is a Recommendation Engine?

A recommendation engine is a tool that uses machine learning to detect patterns in a person’s behavioral data (such as browsing history and past purchases) to suggest specific content, products or information they’re likely to find interesting or relevant.

Recommendation engines are just about everywhere, from video streaming services to e-commerce sites. Some familiar examples include Netflix, which suggests shows and movies a user might like based on their watch history, and Google, which uses a person’s browsing history to rank information and predict what they may search for next. 

In a world of information overload, recommendation engines make it easy for consumers to discover products and content they want — and for companies to create personalized experiences that keep those consumers coming back.


How Do Recommendation Engines Work?

Put simply, recommendation engines bring together lots of data and then use machine learning to recommend the “next best action,” Thompson said, and that could be anything from buying a product to clicking on a video.

There are two main categories at play in a recommendation engine — users and items, according to Eugene Medved, an AI developer at recommendation engine provider InData Labs. “The task itself,” he explained, “is all about ranking the items for a specific user by probability of the interaction.”

This is accomplished by a standard order of operations, starting with data gathering.

1. Data Gathering

Data is crucial to how recommendation engines work. Information about a person’s browsing habits, purchase history — and even more personal details like their gender and age — form the building blocks from which patterns are extracted. The more data a recommendation engine has access to, the more effective it will be in making relevant suggestions.

This data typically comes in two forms. One is implicit data, which refers to information about a user’s search history, clicks, purchases and other activities; it’s gathered by a company every time a person uses their site. The other is explicit data, which covers the user’s inputs, such as previous ratings, reviews or comments. (Recommendation engines also use data regarding a person’s age, gender and general interests to identify similar customers.) 

Gathering all of this customer data is essential to building a recommendation engine.

2. Data Storage

Once that customer data is gathered, it has to be stored. How and where it’s stored depends on the kind of data that’s been gathered.

In addition to data about the users, companies also store data about the items they provide, whether that be shoes or television shows. This can be anything from price to genre to item type, all of which is used to help determine product similarities and user preferences.

3. Data Analysis

Then, a machine learning system is placed on top of that data, drilling down into it and analyzing it.

Recommendation engines use all kinds of algorithms to analyze data, but the most common one is singular value decomposition, or SVD. This is a mathematical technique that breaks down a matrix into three smaller matrices in an effort to detect patterns and relationships in the data, as well as determine the strength of those patterns and relationships. The goal is to better understand the underlying structure of a large data set so that meaningful information can be extracted.
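In symbols, and assuming the common setup where the rows of a matrix A stand for users and the columns for items (the layout is an assumption, not something stated above), the decomposition reads:

```latex
A = U \Sigma V^{\top}
```

Here U and V have orthonormal columns and Σ is a diagonal matrix of non-negative singular values. The larger a singular value, the stronger the pattern it corresponds to, so keeping only the k largest values gives a compact low-rank approximation of A that retains the dominant relationships between users and items.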

4. Data Filtering

The final step is filtering the data. This involves rearranging and sectioning off data that will be most relevant to show users based on data analysis. Different mathematical rules and formulas are applied to the data to filter it, and will vary depending on the type of recommendation engine used. 

5. Presenting the Recommendation

Suggestions made by a recommendation engine can be presented to a user in a variety of ways. They can be served as a message directly on the site (“products similar to this,” or “users also liked,” for example), as a targeted advertisement that comes later on social media or another website, or as part of a personalized marketing message, like an email.

Recommendation engines often sync their recommendation data across all different devices, helping to ensure that users receive consistent and personalized suggestions, regardless of whether they’re using a TV, mobile app or personal computer. Recommendation engines also continuously learn from user interactions and feedback through adaptive learning, refining their recommendations to better align with individual preferences as they evolve.


Types of Recommendation Engines

There are three types of recommendation engines: collaborative filtering, content-based filtering and hybrid filtering. Each type determines how data filtering will be carried out in the recommendation process.

Collaborative Filtering 

Collaborative filtering collects and analyzes data on user activities, behavior and preferences in order to predict what a person will like based on their similarity to other users. An advantage to this approach is that it doesn’t require the system to understand the content or products at hand, only the users. But collaborative filtering only works well if it is supported by lots of data on lots of different users.

Content-Based Filtering 

Content-based filtering is based on the metadata collected from a single person’s actions and preferences. To make recommendations this way, algorithms create a profile of an individual user, cross reference that with a description of the item or content at hand (genre, product type, and so on) and figure out whether that item or content should be recommended to that individual. While good at creating personalized suggestions, this kind of recommendation engine is limited to whatever information a person has provided in the past. 

Hybrid Filtering

Hybrid filtering combines collaborative filtering and content-based filtering, and is designed to improve the accuracy and relevance of the resulting recommendations.

How Recommendation Engines Are Used

Recommendation engines are used across a variety of industries, and have become a popular means of improving both customer experience and a company’s bottom line.

E-Commerce

In e-commerce, recommendation engines play a crucial role in driving sales. About 35 percent of purchases on Amazon come from product recommendations, according to a McKinsey & Company report. These days, messages like “you may also like this” and “buy this product again” are a familiar sight on just about every online retail site.

Recommendation engines are also used to identify products that are frequently bought together by customers and present them as bundled or related items. For example, if a shopper is searching for dumbbells, the recommendation engine may suggest compatible accessories like yoga mats and resistance bands. 

Recommendations based on things like location, season, price point and similar users are also common tactics in e-commerce, and are used as a way to incentivize customers to keep shopping.

Social Media

Social media platforms like Facebook and Instagram use recommendation engines to suggest friends or groups based on a user’s existing network, interests and location. They also use them to show relevant posts and advertisements, depending on a user’s preferences.

For example, YouTube considers a viewer’s watch history and ratings to suggest new videos. And TikTok considers videos the user has interacted with in the past, accounts and hashtags they’ve followed, the type of content they create, and their location and language preferences to determine what videos to show on their For You page.

Media Streaming

When a user browses movies and TV shows on a streaming platform like Netflix, Hulu or Max, the recommendation engine analyzes their viewing history, searches and previous ratings to suggest content they’re likely to watch and enjoy. Once a user finishes watching that content, the recommendation engine suggests the next title to watch. All of this is a useful way of keeping users engaged and reducing the time they spend searching for content.

Gaming platforms, like Steam and PlayStation Store, and music streaming services, like Spotify and SoundCloud, also use recommendation engines to suggest relevant content based on a user’s preferences and historical data.

Benefits of Recommendation Engines

Recommendation engines can be beneficial both to the companies that deploy them and the users that encounter them. 

Improves Customer Experience

A more personalized experience can lead to more satisfied, engaged and loyal customers, mainly because they are being fed the content or products they want without having to put in the effort of finding it themselves.

After all, a lack of a recommendation engine creates a “pretty subpar experience” for customers, as Amplitude’s Thompson put it. Without it, our social media feeds would be full of content we don’t care about. And we’d have to search for every product, movie, show and song ourselves, which would be a pretty time-consuming undertaking.

Increases Time Spent on Platform

Social media platforms, media streaming services and even news outlets all want people to spend as much time as possible on their sites. Consistently providing relevant recommendations of more videos to watch, songs to listen to and articles to read keeps users hooked. 

This translates to higher click-through rates, more conversions and, as is often the case with websites, more dollars.

Boosts Revenue

Perhaps the biggest benefit of recommendation engines, on the business side at least, is that they can help platforms make more money. Not only do recommendation engines incentivize people to make more purchases (a technique known as cross-selling), but they can also suggest product alternatives and draw attention to items that have been abandoned in a customer’s online shopping cart.

Even if a company isn’t in the business of selling physical products, per se, recommendation engines can still do wonders for its bottom line. For example, if Netflix’s recommendation engine consistently feeds viewers content they enjoy watching, they’re less likely to cancel their subscription or choose another streaming service, saving Netflix about $1 billion a year, according to the company.

“If you’re an organization that’s looking to increase revenue, being able to provide tailored experiences for your customers based off their likelihood to purchase or likelihood to complete a particular action, drives growth for your business,” Thompson said.

Challenges of Recommendation Engines

Recommendation engines do come with some challenges, though.

Limited to What They Already Know

A recommendation engine is only as good as the data it’s fed. If it doesn’t have accurate or abundant information about users or items, it likely won’t work correctly.

“They’re limited in their knowledge,” Alexander Marmuzevich, founder and CTO of InData Labs, told Built In. “They can’t propose something which doesn’t exist, they can’t generate completely new ideas.”

A common example of this is what Alexei Tishurov, a lead data scientist at InData Labs, calls a “cold start problem.” This is when a recommendation engine struggles to deal with new users who have not yet provided enough data for the engine to make accurate recommendations. New items with little or no historical data tied to them can be challenging for the engine as well.

“You need to have users interacting with items to do collaborative filtering,” Tishurov explained. “But if you have a completely new service you do not have such history.”

Can Be Biased

Like any machine learning system, recommendation engines can produce biased results if they are based on biased data. This can result in inaccurate or even discriminatory recommendations, posing both functional and ethical problems.

By extension, recommendation engines may fall victim to popularity bias, where popular items tend to be suggested more frequently than lesser-known items. This can lead to a lack of diversity in the recommendations, and prevent users from discovering niche or less popular items.

Gathering Customer Data Can Be Tricky 

Data is the backbone of recommendation engines. But as regulations and policies regarding the collection and storage of data continue to evolve, acquiring enough accurate customer data to generate decent recommendations will be an ongoing challenge.

Companies have to be sure they’re compliant with whatever security and privacy regulations exist within the jurisdictions they’re operating out of. And even then, customers can often opt out of providing the data recommendation engines need. 

“If a customer is not giving you permission to track them or track their behavior while they’re browsing your website, it’s a lot harder for you to provide those tailored experiences,” Thompson said. Sites like Netflix and Amazon “can’t operate without being able to use the models to provide tailored recommendations,” he continued. “It’s a core, business critical system when it comes to providing their service.”

Frequently Asked Questions

What is an example of a recommendation engine?

Amazon, Netflix and YouTube are well-known examples of recommendation engines. These sites gather data about users’ search history, behavior and reviews to suggest things they might want to buy (or watch) next.

What is the most popular recommendation algorithm?

Singular value decomposition (SVD) is the most common algorithm used in recommendation engines. SVD is a mathematical technique that detects patterns and relationships in the data, and determines their strength, in order to extract meaningful information.


Building Recommendation Engines (3-Step Guide)

By Sarah Evans / Analytics Practice Manager

June 4, 2021

While the name “recommendation engine” might seem to imply it all, there is so much potential in this technology approach that it’s worth digging deeper. As the importance of engaging customers and other stakeholders grows – as the reality of remote engagement itself continues to grow – building recommendation engines and employing their many different uses are rapidly becoming a fundamental capability for organizations.

Possibly the most common recommendation engine experience we all have is when streaming video services suggest new content for us to watch. Not surprisingly, those companies focus on building recommendation engines to find and suggest this new content to us, based on all our previously watched content. It is easy to imagine streaming content recommendations, and perhaps even product recommendations from our favorite e-commerce websites, utilizing this technology, but we should ask how else recommendation engines can benefit organizations.

There are any number of additional use cases, but many questions arise in considering them: Is the ROI sufficient to justify the time and money to implement them? And how would one get started, given that the best approaches and technologies are relatively new, and true expertise is hard to find? Many organizations believe that recommendation engines must be complex, time-consuming, and costly, and this can often cause them to shy away from exploring the significant potential value.

In this article we demystify recommendation engines, covering what they are, what organizations can gain from them, and even provide an example approach to start building one. The article itself is geared towards organizations that are newer to graph databases and to graph data science, or that may not have the internal technical resources to explore this on their own.

What’s the Value of Building Recommendation Engines?

What is a recommendation engine, and what’s the value of building recommendation engines? They allow us to tap into and traverse the breadth of valuable data across an organization, whether structured or unstructured, in order to take people, content and products (a few examples, but really any entities in the data) and connect them in ways that are only possible programmatically, and only possible at scale with graph databases (e.g. Neo4j) and graph data science technology. By connecting potentially valuable “entities” in your data to other potentially impacted or interested entities, there is an opportunity to create net-new value through those new connections.

Obvious examples, as mentioned above, include e-commerce recommendations, where recommending the right product to the right person at the right time can create enough value in the buyer’s mind to purchase or add on to a purchase, and can be the difference between them staying or changing to a new vendor. With streaming services, mining the content in ways that surface new recommended content keeps viewers from jumping to the many other streaming options out there. It becomes very clear that the effectiveness of the recommendation technology in producing only the most relevant recommendations is core to producing value.

Beyond the most common use cases mentioned above, building recommendation engines can help in so many other ways, really only constrained by the needs and imagination of each organization. Some examples of other valuable use cases include:

  • Looking across medical research, patient profiles, clinical trials and more to help doctors recommend optimal treatment plans,
  • Using customer purchase history data to recommend even the quantity of products to purchase,
  • Leveraging customer user profiles and internal store data to recommend new, more convenient store locations,
  • Using social media data to recommend staffing and capacity needs for upcoming events,
  • Using event and employee data to recommend relevant events/programs to employees,
  • Using internal structured and unstructured data to surface (recommend) an organization’s top X employees to focus on for retention efforts,
  • Using internal and external (e.g. legislation, news) structured and unstructured data to recommend the top X unknowns or possible risk areas for a larger company to consider,
  • Finding and recommending de-duplication opportunities across compliance requirements.

The list really could be endless based on the unique internal and external needs and/or opportunity areas of an organization. Many are calling data “the new gold” for businesses. Those that can imagine beyond the traditional recommendation use cases will set themselves up to compete much more effectively by mining their data for value in this unique way.

Guide for Building Recommendation Engines

To begin building recommendation engines, you need interconnected data, ideally stored in a graph database such as Neo4j. Though a graph database is not required, storing your data in this way enables you to traverse it much more efficiently, particularly at scale, enabling lightning-fast insights by avoiding the expensive joins that would be required in a relational database. For more information on the benefits of a graph database, read this post on what is a graph database or learn more about: What is a knowledge graph?

When building recommendation engines for the first time, one approach is to follow a simple three-step question approach that breaks down the process, using the collaborative filtering approach mentioned above. Once you have your interconnected data loaded into a graph database, such as customer purchase history data, it is important to start this approach with a single user.

Starting with a single user enables you to filter the data so you can easily see the results without extra noise. Formulate a question that can be answered with one relationship, or one hop, from your starting user. The first question could be “what products has this user purchased in the last 6 months?”. With that question and a single user, query the database and pull all answers to that question, as shown in the image below.

Step 1: Building Recommendation Engines

Next, build on the query by asking a question that can be answered in two relationships, or two hops, from your starting user. The question might be, “what users have also purchased the products that our starting user purchased?”. The results give you a number of users who are connected to your starting user through a similar product that they purchased.

Step 2: Building Recommendation Engines

Lastly, query the database to ask a question that is 3 relationships, or hops, away from your starting user. In our example, the question is “what other products have our related users bought, that our starting user has not bought?”.

Step 3: Building Recommendation Engines

With that final result, in 3 simple questions, you just created a list of potential products as shown below, that you might recommend to your starting user, in essence creating a simple recommendation engine. Clearly, there is much more to ensuring that those products will fit the starting user, but this is a way to show the backbone of the process where the relevance is based on the fact that people who buy one thing often buy other similar products.
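The three questions above can be sketched without a graph database at all, which may help when first reasoning about the hops. The following is a minimal illustration in plain Python, assuming a small, made-up purchase history held in a dictionary; the user names, products and the `recommend` function are hypothetical, and the "last 6 months" time filter from the first question is left out for brevity. In a graph database each hop would be a relationship traversal rather than a set operation.

```python
from collections import Counter

# Hypothetical purchase history: user -> set of purchased products.
purchases = {
    "alice": {"tent", "stove"},
    "bob": {"tent", "lantern", "stove"},
    "carol": {"stove", "cooler", "lantern"},
    "dave": {"guitar"},
}


def recommend(start_user, purchases):
    # Hop 1: what has the starting user purchased?
    owned = purchases[start_user]

    # Hop 2: which other users purchased at least one of those products?
    related_users = [
        user for user, items in purchases.items()
        if user != start_user and items & owned
    ]

    # Hop 3: what did those users buy that the starting user has not?
    counts = Counter()
    for user in related_users:
        counts.update(purchases[user] - owned)

    # Rank candidates by how many related users bought them, then by name.
    return sorted(counts, key=lambda item: (-counts[item], item))


recommendations = recommend("alice", purchases)
```

Here the lantern ranks above the cooler for "alice" because two related users bought it, which also illustrates why this approach tends to favor popular products.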

Graphable Recommendation Engines - Results

This 3-step question process can be applied to any interconnected data, driving recommendations and insights from your data in minutes. After going through this kind of process, you can continue to build on it, adding more context through questions and through more data, increasing both your question complexity and the possible insights you can gain.

Barriers in Building Recommendation Engines

Outlined above is a simple example of leveraging customer behavior data in order to understand purchasing patterns for the purpose of recommending products to customers. It is evident though that the most popular products will always be the most recommended, if we use only this collaborative filtering approach. While it can be effective, some of the downsides of this approach are:

  • It is not well suited to account for new product launches (e.g. no one or very few people have purchased them),
  • It is not well suited to giving your customers a wider variety of recommended products that may be a fit but are not connected through similar customers, and are instead connected through other means.

Graphs can also solve this problem by matching products to customers along any other dimension (e.g. weather data for a customer’s particular location could drive very tailored product recommendations), even using unstructured data (with Natural Language Processing or NLP) such as reviews, user guides, descriptions and more to find that relevance. Using movies as another example, recommendations could be made based on the cast, genre, production company or even filming location and a whole host of other dimensions, leveraging a user’s rating data to find the connectedness across those dimensions.

As a foundation for your recommendation engine, graph databases have the advantage of being a uniquely efficient data store, in that the model itself is optimized for sub-second traversal across any number of relationships and dimensions, whereas traditional relational databases often fail to scale with any amount of complexity in these kinds of use cases. Graph even has the capability to weight relationships and entities (nodes) enabling much more nuanced querying based on numeric levels of certain characteristics (attributes). For example, a particular movie (represented as a node in the database) might be simultaneously 50% horror, 20% drama, and 10% comedy, and can carry those attributes as part of the node.

Once a user’s preferences have been calculated from their past viewing history, it’s then possible to create a graph query that finds movies by combining the many possible attribute scores across the dimensions that matter to that user, enabling much more nuanced, and even unexpected but uniquely helpful, movie recommendations. By leveraging the depth of dimensions available in the data, more and more possibilities emerge that are increasingly relevant to the user. This is the kind of precision in finding relevance that matters in order to thrive in an increasingly competitive environment.
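As a rough illustration of matching on weighted attributes, the sketch below stores genre weights on each movie, averages the weights of what a user has watched into a preference profile, and ranks unseen movies by cosine similarity to that profile. The movie titles, the weights, the averaging step and the choice of cosine similarity are all assumptions made for this example; a graph database would express the same idea as a query over node properties.

```python
import math

# Hypothetical genre weights stored on each movie node.
movies = {
    "Night Shift": {"horror": 0.5, "drama": 0.2, "comedy": 0.1},
    "Cold Cellar": {"horror": 0.7, "drama": 0.3},
    "Office Party": {"comedy": 0.8, "drama": 0.1},
    "Long Goodbye": {"drama": 0.9},
}


def preference_profile(watched):
    # Average the genre weights of everything the user has watched.
    profile = {}
    for title in watched:
        for genre, weight in movies[title].items():
            profile[genre] = profile.get(genre, 0.0) + weight / len(watched)
    return profile


def cosine(a, b):
    # Cosine similarity between two sparse genre vectors.
    dot = sum(a[g] * b.get(g, 0.0) for g in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_unseen(watched):
    # Order movies the user has not watched by similarity to their profile.
    profile = preference_profile(watched)
    unseen = [title for title in movies if title not in watched]
    return sorted(unseen, key=lambda t: cosine(profile, movies[t]), reverse=True)


ranking = rank_unseen(["Night Shift"])
```

With a single horror-leaning movie in the viewing history, the other horror-heavy title ranks first and the comedy ranks last.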

For an even deeper dive on recommendation engines, you can read the post “What is a recommendation engine?”.


What Is a Recommendation Engine? How Recommenders Work

Anticipate your customers’ wants and needs by building recommendations based on their own data.

Image of Nate Franklin

Recommendation engines are advanced data filtering systems that predict which content, products, or services a customer is likely to consume or engage with. One doesn’t need to look far to see one in action. Every time someone chooses a TV show using Netflix’s “You May Also Like…” feature or buys a product Amazon recommends, they’re using powerful recommendation engines.

Recommendation engines (sometimes called recommenders) are win-win features for both customers and the businesses that deploy them. Customers enjoy the level of personalization and assistance a well-tuned recommendation engine provides. Businesses build them because they fuel engagement and encourage sales.

Accurate recommendations don’t appear out of thin air. Businesses must invest in data solutions capable of analyzing a high volume of products and identifying patterns in customer behavior. Only then can they unlock the true value of their customer data and make recommendations that positively impact revenue.

Key takeaways

  • Recommendation engines are advanced data filtering systems that use behavioral data, machine learning, and statistical modeling to predict the content, products, or services customers will like.
  • Customers are drawn to businesses that offer personalized experiences.
  • The three main types of recommendation engines include collaborative filtering, content-based filtering, and hybrid filtering.
  • Recommenders improve revenue by encouraging cross-selling, suggesting product alternatives, and drawing attention to items abandoned in a digital shopping cart.

What is a recommendation engine?

Recommendation engines are tools that leverage predictive analytics to help companies anticipate their customers’ wants and needs. The engines use machine learning and statistical modeling to create advanced algorithms based on a business’s unique historical and behavioral data. The resulting recommendations are based on some combination of:

  • A customer’s past behaviors and history
  • A product’s ranking by consumers
  • The behaviors and history of a similar cohort

Recommendations are most accurate when there’s a great volume of data at a company’s disposal. The more active users a product has, the more data there is to compare behaviors and preferences across demographics.

However, not every bit of data collected will be relevant or even reliable. Building recommendations on bad data results in recommendations that are inaccurate and unhelpful. The first step in creating a workable recommendation engine is adopting a proper data management strategy and analytics stack that collects and verifies data before it is put to use.

Types of recommendation engines and how they work

Not every recommendation engine uses the same methodology to form predictions. Recommenders typically achieve results using one of three types of data filtering: content-based, collaborative filtering, or a combination of the two.

Content-based filtering

This type of filtering is used in “Similar items include…” recommenders. Content-based filtering creates predictions on the actual qualities of the products and services being offered. Products in this system are assigned attributes that can be compared to other products directly. Companies choose the types of attributes used by the engine based on the type of products being consumed.

For instance, an ecommerce website that specializes in selling groceries might tag their products with the following attributes:

  • Type of food (e.g., “fruit” or “cereal”)
  • Established taste (e.g., “bitter” or “sweet”)
  • Container (e.g., “box” or “can”)

The recommender would then compare items historically purchased by the user, or those currently in their shopping cart, to other similar or linked items. Attributes are weighted by the number of items in the database that share the tag, with more common tags receiving higher rankings than uncommon ones. This weighting determines which items appear first in a list of recommendations.
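The tag weighting described above can be sketched in a few lines. This is a minimal illustration, assuming a tiny made-up grocery catalog; the item names, the `similar_items` function and the scoring rule (sum the weights of the tags a candidate shares with the basket) are assumptions for the example rather than a description of any particular product.

```python
from collections import Counter

# Hypothetical grocery catalog: item -> attribute tags.
catalog = {
    "granola": {"cereal", "sweet", "box"},
    "corn flakes": {"cereal", "sweet", "box"},
    "bran cereal": {"cereal", "bitter", "box"},
    "canned peaches": {"fruit", "sweet", "can"},
    "grapefruit": {"fruit", "bitter"},
}

# Weight each tag by how many catalog items carry it.
tag_weight = Counter(tag for tags in catalog.values() for tag in tags)


def similar_items(basket):
    # Collect every tag seen on the items the user already has.
    basket_tags = set().union(*(catalog[item] for item in basket))

    # Score each other item by the summed weight of the tags it shares.
    scores = {
        item: sum(tag_weight[tag] for tag in tags & basket_tags)
        for item, tags in catalog.items()
        if item not in basket
    }

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(
        (item for item, score in scores.items() if score > 0),
        key=lambda item: (-scores[item], item),
    )


suggestions = similar_items(["granola"])
```

Note that no other customer's behavior appears anywhere in this sketch: the ranking depends only on the catalog tags and the user's own basket.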

Content-based filtering doesn’t require the input of other customers to make predictions. It bases its predictions on similarities within a customer’s own behavioral and historical profile. A well-designed content-based filtering engine will identify specific quirks and interests that may not have broad appeal to other customers.

A major drawback with this type of recommendation engine is that it requires a great deal of maintenance. Attributes must be added and updated constantly to keep recommendations accurate, a daunting task for businesses with a high volume of products. Additionally, the attributes themselves must be accurate. Labeling a Honeycrisp apple “red” is easy, but more complex content may require a dedicated team of subject matter experts to correctly label each individual product.

Collaborative filtering

This method of filtering is what’s used in “People who watched this show also watched…” types of recommenders. Collaborative filtering uses behavioral data to determine what a person will like based on how their preferences compare to other users. Whereas content-based filtering focuses on linking products to other products, collaborative filtering builds predictions by linking similar customer profiles.

For example, imagine using a video streaming platform that uses collaborative filtering. When you go to find a movie, you create data based on a number of behaviors, including:

  • Movies you watch
  • Titles you select but ultimately do not watch
  • Selections you hover over
  • Searches you make
  • Rankings you give films

The recommender then effectively builds a user profile for you based on this data set. It then compares your profile against a cohort of users who behave similarly. The resulting predictions are based on the movies this cohort has consumed and enjoyed rather than the actual content of each film.

Collaborative filtering doesn’t require product feature information. This makes maintenance less time-consuming than that of a content-based engine. However, a reliance on other customers’ behaviors can create data gaps. Say no one interacts with your favorite movie on a streaming service. A movie that’s perfectly suited to your interests won’t be recommended because the recommendation engine won’t have any behavioral data with which to form a prediction.
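A minimal sketch of user-based collaborative filtering follows, assuming each user's behavior has already been reduced to a set of movies they watched and liked. The user names, the titles, the use of Jaccard similarity and the cohort size `k` are all illustrative assumptions; real systems combine many more signals (hovers, searches, ratings) and typically use more sophisticated models.

```python
# Hypothetical viewing data: user -> set of movies they watched and liked.
liked = {
    "you": {"Alien", "Blade Runner"},
    "ana": {"Alien", "Blade Runner", "Dune"},
    "ben": {"Alien", "Heat"},
    "cy": {"Notting Hill"},
}


def jaccard(a, b):
    # Movies both users liked, as a share of all movies either one liked.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_for(user, k=2):
    # Find the k most similar other users (the "cohort").
    others = [u for u in liked if u != user]
    cohort = sorted(
        others, key=lambda u: jaccard(liked[user], liked[u]), reverse=True
    )[:k]

    # Score each unseen movie by the summed similarity of the cohort
    # members who liked it.
    scores = {}
    for other in cohort:
        sim = jaccard(liked[user], liked[other])
        for movie in liked[other] - liked[user]:
            scores[movie] = scores.get(movie, 0.0) + sim
    return sorted(scores, key=lambda m: (-scores[m], m))


picks = recommend_for("you")
```

The sketch never looks at what a movie is about, only at who liked it. It also shows the data gap described above: a title that nobody in the cohort has interacted with can never appear in `picks`.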

Hybrid filtering

Hybrid filtering attempts to address the shortcomings of both content-based filtering and collaborative filtering by combining the two methods. As such, it’s often the most effective of the three types of recommendation systems.

Content-based filtering works well for suggestions that appeal to a user’s current interests. However, it can’t accurately predict what users may like outside of their documented preferences. In a hybrid filtering system, this deficit is covered by collaborative filtering. Collaborative filtering can suggest related content that falls outside of a user’s established profile by basing recommendations on the preferences and behaviors of a similar cohort. Conversely, content-based filtering helps fill in the gaps created by collaborative systems. If no comparative data exist for similar cohorts, the recommender will default to seeking a match based on attribute tags to find a suitable result.
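The fallback behavior described above can be sketched as follows. This is a deliberately simple illustration, assuming made-up purchase histories and attribute tags; the function names and the "use cohort results if there are any, otherwise match on tags" rule are assumptions for the example, and real hybrid systems often blend the two scores rather than switching between them.

```python
# Hypothetical data: behavioral history per user and attribute tags per item.
history = {
    "new_user": {"oat milk"},
    "sam": {"espresso beans", "filter papers"},
    "kim": {"espresso beans", "milk frother"},
}
tags = {
    "oat milk": {"dairy-free", "drink"},
    "almond milk": {"dairy-free", "drink"},
    "espresso beans": {"coffee"},
    "filter papers": {"coffee", "accessory"},
    "milk frother": {"accessory"},
}


def collaborative(user):
    # Items bought by users who share at least one item with this user.
    mine = history[user]
    found = set()
    for other, items in history.items():
        if other != user and items & mine:
            found |= items - mine
    return sorted(found)


def content_based(user):
    # Items sharing at least one attribute tag with the user's items.
    mine = history[user]
    my_tags = set().union(*(tags[item] for item in mine))
    return sorted(i for i in tags if i not in mine and tags[i] & my_tags)


def hybrid(user):
    # Prefer cohort-based results; fall back to attribute matching if empty.
    return collaborative(user) or content_based(user)
```

For "sam", another customer's overlapping purchase drives the result; for "new_user", no cohort exists yet, so the attribute tags supply the suggestion instead.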

How recommendation engines are used

Recommendation engines do more than improve the product experience for customers. In 2021, an estimated 39% of businesses of all sizes engaged in predictive analytics to enhance operations, an 11% increase over 2018. More businesses than ever before are embracing recommendations as customers increasingly prefer personalized experiences. A survey by Epsilon determined that 80% of consumers are more willing to buy from businesses that offer personalized experiences.

A properly built recommender also provides an opportunity for companies to target customers with products they’ve either expressed interest in or are highly likely to enjoy. Recommenders help businesses take advantage of predictions through the following methods:

Providing cross-selling opportunities

A recommendation engine can entice customers with products that are complementary but not necessarily similar. A winter hat and gloves are two completely different articles of clothing, and yet someone ordering one could very easily find a use for the other. A recommender identifies these relationships and makes data-based suggestions that help increase the value of individual orders.

Addressing cart abandonment

Items abandoned in digital shopping carts are excellent recommendation opportunities. Customers were interested enough in an item to place it in their cart. Their incomplete sale could be due to a change of mind or an external disruption of the buying experience.

Suggesting the items again to a customer at a later time can push them across the finish line. A customer may have temporarily talked themselves out of purchasing every Wham! song in the catalog. However, a gentle reminder that “Wake Me Up Before You Go-Go” is gathering cobwebs in their cart might be enough to change their minds. These reminders can be displayed either within the product itself or as an email message after the initial session.

Offering alternatives

Recommendation engines provide “backup” suggestions for cases where the option determined by the algorithm to be the “most likely” isn’t one the customer wants. Your recommender might be perfect, but it’s always at the whim of the human brain. For instance, a recommendation engine can’t know that a customer had a bad interaction with a specific brand in 1987.

There’s also no way for machines to understand the finer aspects of human intent. A viewer may want to ironically enjoy the infamous 2003 movie “The Room,” but their search may instead return results for the critically acclaimed 2015 Oscar winner “Room.” Recommended alternatives help customers get where they wanted to go instead of leaving them to search in frustration for what they actually wanted.

Examples of recommendation engines in action

Recommendation engines have become especially popular in the ecommerce world for their use in suggesting related products. Many other industries have created digital products that either heavily feature or are built on recommenders. Prominent examples include:

Amazon is the home of one of the most famous recommendation engines on the planet. The ecommerce giant sells tens of millions of unique products, and every one of them is cataloged for use by its recommender. In fact, Amazon was one of the first major ecommerce companies to pioneer content-based filtering and filed a patent for their system as far back as 2001. Two decades later, Amazon’s recommendations account for as much as 35% of their total sales.

Chick-fil-A might be famous for their good ol’-fashioned fried chicken, but their online ordering experience benefits from the application of a modern recommender. Online shoppers may find that the Chick-fil-A menu does not always display the same products at the top of the menu with each visit. Instead, the team built a recommendation engine using Amplitude Recommend that suggests new or popular items based in large part on similar past orders.

Wantable describes itself as a “try-before-you-buy” online retailer. A new customer fills out a personal survey based on their style preferences and measurements. Their recommender uses this information to predict which articles of clothing best fit the customer’s profile. Clothing is then shipped to the customer, who views each article, tries it on, and decides whether to keep or return it. The success of Wantable is entirely dependent on the accuracy of both their recommendations and the attribute tags required to make them.

Bring the power of recommendations to your company

Now that you’ve learned the basics about recommendation engines, it’s time to explore how these tactics can improve conversion and retention metrics at your company. Download the Mastering Retention playbook today or take a tour of Amplitude to continue your learning about personalized digital experiences.

Dresner Advisory Services, 2021 Data Science and Machine Learning Market Study Report

Epsilon, Power of Me

GlobeNewswire, Dresner Advisory Services Announces 2018 Advanced and Predictive Analytics Market Study

McKinsey, How retailers can keep up with consumers

SiliconANGLE, Amplitude uses personalization to satisfy Chick-fil-A’s appetite for success

AWS Machine Learning Blog

Creating a recommendation engine using Amazon Personalize

This is a guest blog post by Phil Basford, lead AWS solutions architect, Inawisdom.

At re:Invent 2018, AWS announced Amazon Personalize, which allows you to get your first recommendation engine running quickly, to deliver immediate value to your end user or business. As your understanding increases (or if you are already familiar with data science), you can take advantage of the deep capabilities of Amazon Personalize to improve your recommendations.

Working at Inawisdom, I’ve noticed increasing diversity in the application of machine learning (ML) and deep learning. It seems that nearly every day I work on a new exciting use case, which is great!

The most well-known and successful ML use cases have been retail websites, music streaming apps, and social media platforms. For years, they’ve been embedding ML technologies into the heart of their user experience. They commonly provide each user with an individual personalized recommendation, based on both historic data points and real-time activity (such as click data).

Inawisdom was lucky enough to be given early access to try out Amazon Personalize while it was in preview release. Instead of giving it to data scientists or data engineers, the company gave it to me, an AWS solutions architect. With no prior knowledge, I was able to get a recommendation from Amazon Personalize in just a few hours. This post describes how I did so.

The most daunting aspect of building a recommendation engine is knowing where to start. This is even more difficult when you have limited or little experience with ML. However, you may be lucky enough to know what you don’t know (and what you should figure out), such as:

  • What data to use.
  • How to structure it.
  • What framework/recipe is needed.
  • How to train it with data.
  • How to know if it’s accurate.
  • How to use it within a real-time application.

Basically, Amazon Personalize provides a structure and supports you as it guides you through these topics. Or, if you’re a data scientist, it can act as an accelerator for your own implementation.

Creating an Amazon Personalize recommendation solution

You can create your own custom Amazon Personalize recommendation solution in a few hours. Work through the process in the following diagram.

Creating dataset groups and datasets

When you open Amazon Personalize, the first step is to create a dataset group, which can be created from loading historic data or from data gathered from real-time events. In my evaluation of Amazon Personalize at Inawisdom, I used only historic data.

When using historic data, each dataset is imported from a .csv file located on Amazon S3, and each dataset group can contain three datasets:

  • Users
  • Items
  • Interactions

For the purpose of this quick example, I only prepared the Interactions data file, because it’s required and the most important.

The Interactions dataset contains a many-to-many relationship (in old relational database terms) that maps USER_ID to ITEM_ID. Interactions can be enriched with optional User and Item datasets that contain additional data linked by their IDs. For example, for a film-streaming website, it can be valuable to know the age classification of a film and the age of the viewer and understand which films they watch.

When you have all your data files ready on S3, import them into your dataset group as datasets. To do this, define a schema for the data in the Apache Avro format for each dataset, which allows Amazon Personalize to understand the format of your data. Here is an example of a schema for Interactions:
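A minimal sketch of what such a schema can look like, written as a Python dictionary and serialized to JSON. It assumes only the three fields that Interactions data must carry (USER_ID, ITEM_ID, and TIMESTAMP); the variable names are illustrative, and any extra columns in your .csv file would need their own field entries.

```python
import json

# Minimal Interactions schema: one Avro record with the three
# required fields. Extra columns would be added to "fields".
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

# The schema is handed to the service as a JSON string.
schema_json = json.dumps(interactions_schema, indent=2)
print(schema_json)
```

The column headers in the .csv file are expected to match the field names in the schema.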

In evaluating Amazon Personalize, you may find that you spend more time at this stage than the other stages. This is important and reflects that the quality of your data is the biggest factor in producing a usable and accurate model. This is where Amazon Personalize has an immediate effect—it’s both helping you and accelerating your progress.

Don’t worry about the format of the data, just the key fields being identified. Don’t get caught up in worrying about what model to use or the data it needs. Your focus is just on making your data accessible. If you’re just starting out in ML, you can get a basic dataset group working quickly with minimal data. If you’re a data scientist, you will probably come back to this stage again to improve and add more data points (data features).

Creating a solution

When you have your dataset group with data in it, the next step is to create a solution. A solution covers two areas—selecting the model (recipe) and then using your data to train it. You have recipes and a popularity baseline from which to choose. Some of the recipes on offer include the following:

  • Personalized reranking (search)
  • SIMS—related items
  • HRNN (Coldstart, Popularity-Baseline, and Metadata)—user personalization

If you’re not a data scientist, don’t worry. You can use AutoML, which runs your data against each of the available recipes. Amazon Personalize then judges the best recipe based on the accuracy results produced. This also covers changing some of the settings (hyperparameters) to get better results. The following image shows a solution with the metric section at the bottom showing accuracy:

Amazon Personalize allows you to get something up and running quickly, even if you’re not a data scientist. This includes not just model selection and training, but restructuring the data into what each recipe requires and hiding the hassle of spinning up servers to run training jobs. If you are a data scientist, this is also good news, because you can take full control of the process.

Creating a campaign

After you have a solution version (a confirmed recipe and trained artifacts), it’s time to put it into action. This isn’t easy, and there is a lot to consider in running ML at scale.

To get you started, Amazon Personalize allows you to deploy a campaign (an inference engine for your recipe and the trained artifacts) as a PaaS. The campaign exposes a REST API that you can use to produce recommendations. Here is an example of calling your API from Python:
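A hedged sketch of such a call, assuming the boto3 `personalize-runtime` client and its `get_recommendations` operation. The helper name `recommend_item_ids` and the ARN placeholder are illustrative and not part of the original post; the client is passed in so the function can be exercised without AWS access.

```python
def recommend_item_ids(runtime_client, campaign_arn, user_id, num_results=10):
    """Return recommended item IDs for one user from a deployed campaign."""
    response = runtime_client.get_recommendations(
        campaignArn=campaign_arn,
        userId=str(user_id),
        numResults=num_results,
    )
    # Each entry in itemList is a dict that carries an "itemId" key.
    return [item["itemId"] for item in response["itemList"]]


# Typical usage (needs boto3, AWS credentials, and a real campaign ARN):
#   import boto3
#   client = boto3.client("personalize-runtime")
#   print(recommend_item_ids(client, "<your-campaign-arn>", "42"))
```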

The results:

Amazon Personalize is a great addition to the AWS set of machine learning services. Its two-track approach allows you to quickly and efficiently get your first recommendation engine running and deliver immediate value to your end user or business. Then you can harness the depth and raw power of Amazon Personalize, which will keep you coming back to improve your recommendations.

Amazon Personalize puts a recommendation engine in the hands of every company and is now available in US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), Asia Pacific (Singapore) and EU (Ireland). Well done, AWS!

Machine learning: How to create a recommendation engine

In this excerpt from the book “Pragmatic AI,” learn how to code recommendation engines based on machine learning in AWS, Azure, and Google Cloud.

By Noah Gift

InfoWorld |

What do Russian trolls, Facebook, and US elections have to do with machine learning? Recommendation engines are at the heart of the central feedback loop of social networks and the user-generated content (UGC) they create. Users join the network and are recommended users and content with which to engage. Recommendation engines can be gamed because they amplify the effects of thought bubbles. The 2016 US presidential election showed how important it is to understand how recommendation engines work and the limitations and strengths they offer.

AI-based systems aren’t a panacea that only creates good things; rather, they offer a set of capabilities. It can be incredibly useful to get an appropriate product recommendation on a shopping site, but it can be equally frustrating to get recommended content that later turns out to be fake (perhaps generated by a foreign power motivated to sow discord in your country).

This chapter covers recommendation engines and natural language processing (NLP), both from a high level and a coding level. It also gives examples of how to use frameworks, such as the Python-based recommendation engine Surprise, as well as instructions on how to build your own. Some of the topics covered include the Netflix prize, singular-value decomposition (SVD), collaborative filtering, real-world problems with recommendation engines, NLP, and production sentiment analysis using cloud APIs.

The Netflix prize wasn’t implemented in production

Before “data science” was a common term and Kaggle was around, the Netflix prize took the world by storm. The Netflix prize was a contest created to improve the recommendation of new movies. Many of the original ideas from the contest later turned into inspiration for other companies and products. Creating a $1 million data science contest back in 2006 sparked excitement that would foreshadow the current age of AI. Coincidentally, 2006 was also the year the age of cloud computing began, with the launch of Amazon EC2.

The cloud and the dawn of widespread AI have been intertwined. Netflix also has been one of the biggest users of the public cloud via Amazon Web Services. Despite all these interesting historical footnotes, the Netflix prize-winning algorithm was never implemented into production. The winners in 2009, the BellKor’s Pragmatic Chaos team, achieved a greater than 10 percent improvement with a test RMSE of 0.8567. The team’s paper describes that the solution is a linear blend of more than 100 results. A quote in the paper that is particularly relevant is “A lesson here is that having lots of models is useful for the incremental results needed to win competitions, but practically, excellent systems can be built with just a few well-selected models.”

The winning approach for the Netflix competition was not implemented in production at Netflix because the engineering complexity was deemed too great when compared with the gains produced. The same tension applies to SVD, a core algorithm used in recommendations. As noted in “Fast SVD for Large-Scale Matrices,” “though feasible for small data sets or offline processing, many modern applications involve real-time learning and/or massive data set dimensionality and size.” In practice, this is one of the huge challenges of production machine learning: the time and computational resources necessary to produce results.

I had a similar experience building recommendation engines at companies. When an algorithm is run in a batch manner, and it is simple, it can generate useful recommendations. But if a more complex approach is taken, or if the requirements go from batch to real time, the complexity of putting it into production and/or maintaining it explodes. The lesson here is that simpler is better: choose batch-based machine learning over real-time, choose a simple model over an ensemble of multiple techniques, and consider whether it makes more sense to call a recommendation engine API than to create the solution yourself.

Key concepts in recommendation systems

Figure 1 shows a social network recommendation feedback loop. The more users a system has, the more content it creates. The more content that is created, the more recommendations it creates for new content. This feedback loop, in turn, drives more users and more content. As mentioned at the beginning of this chapter, these capabilities can be used for both positive and negative features of a platform.

Figure 1: The social network recommendation feedback loop

Using the Surprise framework in Python

One way to explore the concepts behind recommendation engines is to use the Surprise framework. A few of the handy things about the framework are that it has built-in data sets (MovieLens and Jester) and that it includes SVD and other common algorithms, including similarity measures. It also includes tools to evaluate the performance of recommendations in the form of root mean squared error (RMSE) and mean absolute error (MAE), as well as the time it took to train the model.

Here is an example of how it can be used in a pseudo production situation by tweaking one of the provided examples.

First are the necessary imports to get the library loaded:

A helper function is created to convert IDs to names:

Similarities are computed between items:

Finally, ten recommendations are provided, which are similar to another example in this chapter:

In exploring this example, consider the real-world issues with implementing this in production. Here is an example of a pseudocode API function that someone in your company may be asked to produce:

Some questions to ask in implementing this are: What trade-offs are you making in picking the top from a group of selections versus just a movie? How well will this algorithm perform on a very large data set? There are no right answers, but these are things you should think about as you deploy recommendation engines into production.

Cloud solutions to recommendation systems

The Google Cloud Platform has an example of using machine learning on Compute Engine to make product recommendations that is worth exploring. In the example, PySpark and the ALS algorithm are used along with the proprietary Cloud SQL. Amazon also has an example of how to build a recommendation engine using its platform, Spark, and Elastic MapReduce (EMR).

In both cases, Spark is used to increase the performance of the algorithm by dividing the computation across a cluster of machines. Finally, AWS is heavily pushing SageMaker, which can do distributed Spark jobs natively or talk to an EMR cluster.

Real-world production issues with recommendations

Most books and articles on recommendation focus purely on the technical aspects of recommendation systems. This book is about pragmatism, and so there are some issues to talk about when it comes to recommendation systems. A few of these topics are covered in this section: performance, ETL, user experience (UX), and shills/bots.

One of the most popular algorithms, as discussed, has a training cost of O(n_samples^2 * n_features), which is quadratic in the number of samples. This means that it is very difficult to train a model in real time and get an optimum solution. Therefore, training a recommendation system will need to occur as a batch job in most cases, unless you use tricks such as a greedy heuristic and/or creating only a small subset of recommendations for active users, popular products, etc.

When I created a user-follow recommendation system from scratch for a social network, I found many of these issues came front and center. Training the model took hours, so the only realistic solution was to run it nightly. Additionally, I later created an in-memory copy of our training data, so the algorithm was only bound on CPU, not I/O.

Performance is a nontrivial concern in creating a production recommendation system in both the short term and the long term. It is possible that the approach you initially use may not scale as your company grows users and products. Perhaps initially, a Jupyter Notebook, Pandas, and SciKit-Learn were acceptable when you had 10,000 users on your platform, but it may turn out quickly to not be a scalable solution.

Instead, a PySpark-based support vector machine training algorithm may dramatically improve performance and decrease maintenance time. And then later, again, you may need to switch to dedicated machine learning chips like TPU or the Nvidia Volta. Having the ability to plan for this capacity while still making initial working solutions is a critical skill to have to implement pragmatic AI solutions that actually make it to production.

Real-world recommendation problems: Integration with production APIs

I found many real-world problems surface in production in startups that build recommendations. These are problems that are not as heavily discussed in machine learning books. One such problem is the cold-start problem. In the examples using the Surprise framework, there is already a massive database of “correct answers.” In the real world, you may start with so few users or products that it doesn’t make sense to train a model. What can you do?

A decent solution is to make the path of the recommendation engine follow three phases. For phase one, take the most popular users, content, or products and serve those out as a recommendation. As more UGC is created on the platform, for phase two, use similarity scoring (without training a model). Here is some hand-written code I have used in production a couple of different times that did just that. First, I have a Tanimoto score, also known as the Jaccard similarity coefficient (one minus the Jaccard distance).
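As an illustration of the phase-two scoring described above, here is a minimal sketch of a Tanimoto (Jaccard) similarity over two collections of IDs. It is not the author's production code; the function name, the sample follower lists, and the choice to return 0.0 for two empty inputs are assumptions made for this example.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two collections, from 0.0 to 1.0."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here: two empty sets share nothing
    return len(set_a & set_b) / len(union)


# Hypothetical follower lists for two users.
alice_follows = ["u1", "u2", "u3"]
bob_follows = ["u2", "u3", "u4"]
print(tanimoto(alice_follows, bob_follows))  # 2 shared / 4 distinct = 0.5
```

Because the score only needs set intersections and unions, it can be computed without any model training, which is what makes it attractive while the platform still has little data.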

Next is HBD: Here Be Dragons. Follower relationships are downloaded and converted into a Pandas DataFrame.

To use this API, you would engage with it by following this sequence:

This “phase 2” similarity score-based recommendation with the current implementation would need to be run as a batch API. Additionally, Pandas will eventually run into some performance problems at scale. Ditching it at some point for either PySpark or Pandas on Ray is going to be a good move.

For “phase 3,” it is finally time to pull out the big guns and use something like Surprise and/or PySpark to train an SVD-based model and figure out model accuracy. In the first part of your company’s history, though, why bother when there is little to no value in doing formal machine learning model training?

Another production API issue is how to deal with rejected recommendations. There is nothing more irritating to a user than to keep getting recommendations for things you don’t want or already have. So, yet another sticky production issue needs to be solved. Ideally, the user is given the ability to click, “do not show again” for a list of recommendations, or quickly your recommendation engine becomes garbage. Additionally, the user is telling you something, so why not take that signal and feed it back into your recommendation engine model?
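One simple way to honor a “do not show again” signal is to filter it out at serving time. The sketch below assumes the ranked candidates, the rejected items, and the items the user already has are all available as plain lists of IDs; the function and parameter names are hypothetical.

```python
def filter_recommendations(candidates, rejected, owned, limit=10):
    """Drop items the user dismissed or already has, keeping ranked order."""
    blocked = set(rejected) | set(owned)
    kept = [item for item in candidates if item not in blocked]
    return kept[:limit]


ranked = ["p1", "p2", "p3", "p4", "p5"]
print(filter_recommendations(ranked, rejected=["p2"], owned=["p4"], limit=2))
```

Filtering at serving time only hides the items. Feeding the same rejections back into the training data is a separate step.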

Cloud NLP and sentiment analysis

All three of the dominant cloud providers, Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, have solid NLP engines that can be called via an API. In this section, NLP examples on all three clouds will be explored. Additionally, a real-world production AI pipeline for NLP will be created on AWS using serverless technology.

NLP on Microsoft Azure

Microsoft Cognitive Services has the Text Analytics API that has language detection, key phrase extraction, and sentiment analysis. In Figure 2, the endpoint is created so API calls can be made. This example takes a negative collection of movie reviews from the Cornell Computer Science Data Set on Movie Reviews and uses it to walk through the API.

Figure 2: The Microsoft Azure Cognitive Services API

First, imports are done in this first block in Jupyter Notebook:

Next, an API key is taken from the environment. This API key was fetched from the console shown in Figure 2 under the section keys and was exported as an environment variable, so it isn’t hard-coded into code. Additionally, the text API URL that will be used later is assigned to a variable.

Next, one of the negative reviews is formatted in the way the API expects:

The data structure with the following shape is created:
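A sketch of building that structure, assuming the request body is a `documents` list whose entries each carry an `id`, a `language`, and the `text` to score. The helper name and the sample lines are made up for illustration and are not taken from the review data set.

```python
def build_documents(lines, language="en"):
    """Wrap text lines in the {"documents": [...]} request shape."""
    documents = []
    for index, text in enumerate(lines, start=1):
        text = text.strip()
        if text:  # skip blank lines; IDs keep the original line numbers
            documents.append({"id": str(index), "language": language, "text": text})
    return {"documents": documents}


payload = build_documents(["a dull, lifeless plot\n", "", "the acting is wooden"])
print(payload)
```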

Finally, the sentiment analysis API is used to score the individual documents:

At this point, the return scores can be converted into a Pandas DataFrame to do some EDA. It isn’t a surprise that the median value of the sentiments for a negative review is 0.23 on a scale of 0 to 1, where 1 is extremely positive and 0 is extremely negative:

This is further explained by doing a density plot. Figure 3 shows a majority of highly negative sentiments.

Figure 3: A density plot of sentiment scores

NLP on Google Cloud Platform

There is a lot to like about the Google Cloud Natural Language API. One of the convenient features of the API is that you can use it in two different ways: analyzing sentiment in a string, and also analyzing sentiment from Google Cloud Storage. Google Cloud also has a tremendously powerful command-line tool that makes it easy to explore its API. Finally, it has some fascinating AI APIs, some of which will be explored in this chapter: analyzing sentiment, analyzing entities, analyzing syntax, analyzing entity sentiment, and classifying content.

Exploring the Entity API

Using the command-line gcloud API is a great way to explore what one of the APIs does. In the example, a phrase is sent via the command line about LeBron James and the Cleveland Cavaliers:

A second way to explore the API is to use Python. To get an API key and authenticate, you need to follow Google’s instructions. Then launch the Jupyter Notebook in the same shell as the GOOGLE_APPLICATION_CREDENTIALS variable is exported:

Once this authentication process is complete, the rest is straightforward. First, the Python language API must be imported. (This can be installed via pip if it isn’t already: pip install --upgrade google-cloud-language.)

Next, a phrase is sent to the API and entity metadata is returned with an analysis:

The output has a similar look and feel to the command-line version, but it comes back as a Python list:

A few of the takeaways are that this API could be easily merged with some of the other explorations done in Chapter 6, “Predicting Social-Media Influence in the NBA.” It wouldn’t be hard to imagine creating an AI application that found extensive information about social influencers by using these NLP APIs as a starting point. Another takeaway is that the command line given to you by the GCP Cognitive APIs is quite powerful.

Production serverless AI pipeline for NLP on AWS

One thing AWS does well, perhaps better than either of the other Big Three clouds, is make it easy to create production applications that are easy to write and manage. One of its game-changer innovations is AWS Lambda. It is available to both orchestrate pipelines and serve HTTP endpoints, as in the case of Chalice. In Figure 4, a real-world production pipeline is described for creating an NLP pipeline.

Figure 4: A production serverless NLP pipeline on AWS

To get started with AWS sentiment analysis, some libraries need to be imported:

Next, a simple test is created:
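A hedged sketch of such a test, assuming the boto3 `comprehend` client and its `detect_sentiment` operation. The wrapper name `score_sentiment` is illustrative; the client is passed in so the function can be tried with a stand-in object before pointing it at AWS.

```python
def score_sentiment(comprehend_client, text, language_code="en"):
    """Return (label, scores) for one document via Amazon Comprehend."""
    response = comprehend_client.detect_sentiment(
        Text=text, LanguageCode=language_code
    )
    # Sentiment is POSITIVE, NEGATIVE, NEUTRAL, or MIXED; SentimentScore
    # holds one confidence value per label.
    return response["Sentiment"], response["SentimentScore"]


# Typical usage (needs boto3 and AWS credentials):
#   import boto3
#   comprehend = boto3.client("comprehend")
#   label, scores = score_sentiment(comprehend, "I love this movie")
```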

The output shows a SentimentScore:

Now, in a more realistic example, I’ll use the previous “negative movie reviews document” from the Azure example. The document is read in:

Next, one of the “documents” (remember that each line is a document according to NLP APIs) is scored:

It’s no surprise that that document had a negative sentiment score since it was previously scored this way. Another interesting thing this API can do is to score all of the documents inside as one giant score. Basically, it gives the median sentiment value. Here is what that looks like:

An interesting takeaway is that the AWS API has some hidden tricks up its sleeve and has a nuance that is missing from the Azure API. In the previous Azure example, the Seaborn output showed that, indeed, there was a bimodal distribution with a minority of reviews liking the movie and a majority disliking the movie. The way AWS presents the results as “mixed” sums this up quite nicely.

The only thing left to do is to create a simple Chalice app that will take the scored inputs that are written to Dynamo and serve them out. Here is what that looks like:

If data is the new oil, then UGC is the sand tar pits. Sand tar pits have been historically difficult to turn into production oil pipelines, but rising energy costs and advances in technology have allowed their mining to become feasible. Similarly, the AI APIs coming out of the Big Three cloud providers have created new technological breakthroughs in sifting through the “sandy data.” Also, prices for storage and computation have steadily dropped, making it much more feasible to convert UGC into an asset from which to extract extra value. Another innovation lowering the cost to process UGC is AI accelerators. Massive parallelization improvements by chips like TPUs (which are ASICs), GPUs, and field-programmable gate arrays (FPGAs) may make some of the scale issues discussed even less of an issue.

This chapter showed many examples of how to extract value from these tar pits, but there are also real trade-offs and dangers, as with the real sand tar pits. UGC to AI feedback loops can be tricked and exploited in ways that create consequences that are global in scale. Also, on a much more practical level, there are trade-offs to consider when systems go live. As easy as the cloud and AI APIs make creating solutions, the real trade-offs cannot be abstracted away, like UX, performance, and the business implications of implemented solutions.

Copyright © 2018 IDG Communications, Inc.

Building Recommendation Engines using Pandas

In this article, we learn how to build a basic recommendation engine from scratch using Pandas.

Building Movie Recommendation Engines using Pandas

A recommendation engine, or recommender system, is a system that predicts or filters preferences according to each user’s likings. Recommender systems deliver a list of suggestions via collaborative filtering or content-based filtering.

A recommendation engine is one of the most popular and widely used applications of machine learning. Almost all the big tech companies, such as e-commerce websites, Netflix, Amazon Prime, and more, use recommendation engines to recommend suitable items or movies to users. It is based on the intuition that similar users are more likely to give similar ratings to similar items.

Now let’s start creating our very basic and simple Recommender Engine using pandas. Let’s concentrate on delivering a simple recommendation engine by presenting things that are most comparable to a certain object based on correlation and number of ratings, in this case, movies. It just tells what movies are considered equivalent to the user’s film choice.

To download the files: .tsv file, Movie_Id_Titles.csv.

Popularity Based Filtering

Popularity-based filtering is one of the most basic, and least useful, techniques for building a recommender system. It surfaces the items that are currently trending and hides the rest. For example, in our movies dataset, if a movie has been rated by most of the users, that means it has been watched by many users and is trending now. So only the movies with the highest number of ratings will be suggested to users by the recommender system. There is a lack of personalization, as it is not sensitive to the particular taste of any one user.

First, we import the pandas library, with which we will create the recommendation engine. Then we load the datasets from the given path in the code below and add the column names.

Now we will rank the movies based on the numbers of ratings on the movies. As we are doing popularity-based filtering,  the movies that are watched by more users will have more ratings.
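The ranking idea can be sketched without the dataset. The example below swaps pandas for the standard library so that it runs on its own, and the handful of (user, title, rating) rows are invented stand-ins for the real ratings file.

```python
from collections import Counter

# Hypothetical (user_id, movie_title, rating) rows standing in for the dataset.
ratings = [
    (1, "Star Wars (1977)", 5),
    (2, "Star Wars (1977)", 4),
    (3, "Star Wars (1977)", 5),
    (1, "Fargo (1996)", 4),
    (2, "Fargo (1996)", 3),
    (3, "Contact (1997)", 2),
]

# Popularity here is simply how many ratings each title has received.
rating_counts = Counter(title for _, title, _ in ratings)
top_movies = rating_counts.most_common(2)
print(top_movies)  # [('Star Wars (1977)', 3), ('Fargo (1996)', 2)]
```

Note that the rating values themselves play no part in the ranking. A widely watched but poorly rated movie would still rank highly under this scheme.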

Then we visualize the top 10 movies with the most rating count:

Collaborative Filtering

User-Based Collaborative Filtering:

These techniques suggest to a user the items that similar users have picked. We can apply either Pearson correlation or cosine similarity to estimate the resemblance between two users. In user-based collaborative filtering, we compute a similarity score between users. Collaborative filtering takes into account the wisdom of the crowd. For example, if many people read both e-book A and e-book B, and a new user reads only book B, then the recommendation engine will suggest that the user read book A.
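The e-book example above can be sketched with cosine similarity in plain Python. This is an illustrative toy, not the article's pandas code: each user is a hypothetical `{item: rating}` dictionary, unrated items count as zero, and `suggest` copies unseen items from the single most similar user only.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity of two users' {item: rating} dicts."""
    shared = set(ratings_a) & set(ratings_b)
    dot = sum(ratings_a[i] * ratings_b[i] for i in shared)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no ratings is similar to nobody
    return dot / (norm_a * norm_b)


def suggest(target, others):
    """Items the most similar other user rated that the target has not."""
    best = max(others, key=lambda other: cosine_similarity(target, other))
    return sorted(set(best) - set(target))


new_user = {"B": 5}
existing_users = [{"A": 4, "B": 5}, {"C": 3}]
print(suggest(new_user, existing_users))  # ['A']
```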

Item-Based Collaborative Filtering: 

Instead of calculating the resemblance among various users, item-based collaborative filtering suggests items based on their likeness to the items that the target user rated. Likewise, the resemblance can be calculated with Pearson correlation or cosine similarity. For example, if users who rate movie P highly also tend to rate movie Q highly, then the recommender will suggest movie Q to a user who liked movie P.

The code below demonstrates item-based collaborative filtering.

Now we merge the two datasets on item_id, the key column common to both.

Here we calculate the mean rating given to each movie. Then we count the number of ratings given to each movie. We sort them in ascending order, as we can see in the output.

Now we create a new dataframe named ratings_mean_count_data and add the columns for rating mean and rating count beside each movie title, since these two parameters are required for filtering out the best suggestions for the user.

In the newly created dataframe, we can see the movies along with the mean value of ratings and the number of ratings. Now we want to create a matrix to see each user’s rating on each movie. To do so, we use the following code.

Here each column contains all the ratings of all users of a particular movie making it easy for us to find ratings of our movie of choice.

So we will look at the ratings of Star Wars (1977), as it has the highest count of ratings. Since we want to find the correlation between movies with the most ratings, this will be a good approach. We will see the first 25 ratings.

Now we find the movies that correlate with Star Wars (1977) using the corrwith() function. Next, we store the correlation values under the column Correlation in a new dataframe called corr_Star_Wars. We remove the NaN values from the new dataset.
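To show what this correlation step computes, here is a plain-Python stand-in for the pandas call. It is a sketch under stated assumptions: the `matrix` of movie columns is invented, correlations use only the users who rated both movies, and movies whose correlation is undefined are left out, which plays the role of dropping NaN values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length rating lists, or None."""
    n = len(xs)
    if n < 2:
        return None  # undefined with fewer than two shared ratings
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def correlate_with(target, matrix):
    """Correlate every other movie's ratings with the target movie's."""
    result = {}
    for title, column in matrix.items():
        if title == target:
            continue
        shared = sorted(set(column) & set(matrix[target]))
        score = pearson([matrix[target][u] for u in shared],
                        [column[u] for u in shared])
        if score is not None:  # mirrors removing the NaN rows
            result[title] = score
    return result


# Hypothetical movie -> {user_id: rating} columns.
matrix = {
    "Star Wars (1977)": {1: 5, 2: 3, 3: 4, 4: 1},
    "Empire Strikes Back, The (1980)": {1: 5, 2: 3, 3: 4},
    "Fargo (1996)": {1: 1, 2: 3, 4: 5},
    "Contact (1997)": {4: 2},
}
corr_star_wars = correlate_with("Star Wars (1977)", matrix)
print(corr_star_wars)
```

In this toy data, one movie is rated identically to Star Wars by the shared users, one is rated in exactly the opposite direction, and one has only a single shared rating and is therefore dropped.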

We display the first 10 movies that are most highly correlated with Star Wars (1977), in descending order, using the parameter ‘ascending=False’.

From the above output, we can see that the movies which are highly correlated with Star Wars (1977) are not all famous and well known.

There can be cases where only one user watches a particular movie and gives it a 5-star rating. In that case, the rating is not a reliable signal, as no other user has rated the movie.

So correlation alone might not be a good metric for filtering out the best suggestions. We therefore add the column rating_counts to the data frame to account for the number of ratings along with the correlation.

We assume that movies worth watching will have at least 100 ratings. So the code below keeps only the most correlated movies with ratings from more than 100 users.
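The filtering step can be sketched as a small function. The correlation values and rating counts below are invented for illustration, and the function name is hypothetical; only the threshold of more than 100 ratings comes from the text.

```python
def recommend_similar(correlations, rating_counts, min_ratings=100, top_n=10):
    """Rank correlated movies, ignoring those with too few ratings."""
    eligible = [
        (title, score)
        for title, score in correlations.items()
        if rating_counts.get(title, 0) > min_ratings
    ]
    # Highest correlation first.
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return eligible[:top_n]


# Hypothetical inputs: an obscure title with a perfect score but few ratings.
correlations = {"Obscure Film": 1.0, "Movie A": 0.75, "Movie B": 0.5}
rating_counts = {"Obscure Film": 3, "Movie A": 350, "Movie B": 420}
print(recommend_similar(correlations, rating_counts))
```

The obscure title drops out despite its perfect correlation, which is the effect the rating-count threshold is meant to have.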

We can better visualize the final set of recommended movies:

Therefore, the above movies will be recommended to users who have just finished watching, or have previously watched, Star Wars (1977). In this way, we can build a very basic recommender system with pandas. For real-time recommender engines, pandas alone will not meet the needs; for that, we will have to implement more complex machine learning algorithms and frameworks.

Guide to Recommendation System: Types, Selection Criteria, How to Build One

Ever wonder how websites and apps seem to know exactly what you’d like to watch, read, or buy next? It’s all thanks to recommendation engines, also known as “recs,” “recommender systems,” or “suggestive algorithms.” These systems help you discover new movies on streaming platforms, exciting games, your next YouTube binge, or even that perfect pair of sneakers on your favorite online store. They’re the reason why, after watching one cute kitten video, you find yourself in a never-ending loop of adorable animal clips.

So if you want to find out how these algorithms actually work, you’ve come to the right place to learn. We’ve built powerful recommendation systems, and now our mission is to share our expertise in a way that’s easy for anyone to understand.

  • What is a recommendation system?
  • How does a recommendation system work
  • Machine learning techniques in recommendation systems
  • Types of recommendation systems
  • Benefits of recommendation systems
  • How to choose the right recommendation engine
  • How to build a recommendation engine?
  • What industries use recommendation systems
  • Real-world use cases and examples of recommendation systems
  • Building an AI/ML application or extending your development team


Key takeaways

  • A recommendation engine is an AI-driven system that generates personalized suggestions to users based on collected data.
  • The recommendation process consists of 4 main steps: collecting, analyzing, and filtering data, and then generating recommendations using machine learning techniques.
  • There are 4 main types of recommender systems that use different filtering methods: content-based filtering, collaborative filtering, hybrid method, and deep learning-based.
  • There are many business benefits of using recommender systems: personalized UX, increased revenue, enhanced user engagement, among others.
  • Various industries utilize recommendation systems: ecommerce, media, entertainment, travel, gaming, and more.
  • To choose the right recommender system, consider either out-of-the-box tools or scalable custom solutions.

A recommendation system is an advanced technology that utilizes machine learning and data analysis to provide personalized suggestions to users. It operates by collecting and analyzing user behavior, user preferences, and historical user-item interactions.

By applying complex algorithms and statistical models, recommendation engines are capable of predicting and presenting users with items, services, or content that align with their interests and preferences.

There are different types of recommendation systems, including collaborative filtering (user-based and item-based), content-based filtering, hybrid, and deep learning-based systems, and we’ll look at each of them in this article.

Recommender system technology is widely used across various industries, particularly in e-commerce, streaming platforms, news and media, and digital marketing, to enhance user engagement, boost user trust, increase sales, and improve overall customer satisfaction.


We are often asked by our clients how exactly the recommendation system works.

To explain in simple, non-technical terms, we always describe the process in the following four main steps:

Step 1: Collecting user data

Recommendation systems primarily collect data through user interactions, including clicks, views, and the purchase history of different users.

Additionally, user feedback such as ratings and reviews is collected, as well as user profiles, which incorporate demographics and browsing patterns, and item attributes, including product descriptions and tags.

Insights are also gained from the user’s past behavior. External sources like social media, third-party reviews, and real-time session data are integral to expanding the system’s knowledge base. They provide crucial insights that allow us to better understand our customers and improve our business practices.

Gathering data is the most critical phase of the recommendation process.

In case of insufficient data, especially at the very beginning, the cold start problem can occur. The cold start issue in recommendation engines happens when there isn’t enough data to make predictions for a new user. Absent previous user interaction, the system has difficulty in proposing customized recommendations for new users and items. This obstacle might affect user engagement and the effectiveness of the recommendation system.
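One common way to soften the cold start problem is to fall back to non-personalized suggestions until enough interactions exist. The sketch below illustrates that idea only; the interaction log, the `min_history` threshold, and the function names are hypothetical and not taken from any particular system.

```python
from collections import Counter

# Hypothetical interaction log: (user_id, item_id) pairs.
interactions = [
    ("u1", "shoes"), ("u2", "shoes"), ("u3", "shoes"),
    ("u1", "hat"), ("u2", "hat"),
    ("u3", "scarf"),
]

def popular_items(log, k=2):
    """Return the k items with the most interactions."""
    counts = Counter(item for _, item in log)
    return [item for item, _ in counts.most_common(k)]

def recommend(user_id, log, k=2, min_history=1):
    """Use popularity as a fallback when the user has too little history."""
    history = {item for user, item in log if user == user_id}
    if len(history) < min_history:
        # Cold start: nothing is known about this user yet.
        return popular_items(log, k)
    # Stand-in for a personalized step: popular items the user has not seen.
    ranked = popular_items(log, k=len(log))
    return [item for item in ranked if item not in history][:k]

print(recommend("new_user", interactions))
print(recommend("u3", interactions))
```

A brand-new user gets the most popular items, while a user with some history gets items they have not interacted with yet. A production system would replace the stand-in step with one of the filtering methods described later in this article.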

Step 2: Analyzing data

By analyzing the data mentioned above, recommendation systems can effectively predict and meet the preferences of users.

Most recommender systems analyze a combination of factors to predict a user’s preference and display the most accurate recommendation.

A recommendation system monitors website traffic to identify popular content and examines the content’s elements to match them with user interactions. User feedback shapes the recommendations, and the system takes into account patterns from user sessions and the preferences of similar users. Attributes like the brand or color of the item can also help to further refine the recommendations.

Depending on the recommendation engine, there can be additional factors that affect the suggestions. These additional factors make sure that every user receives a personalized experience.

Step 3: Filtering

In this phase, data is subjected to an advanced process utilizing a matrix factorization method.

Depending on whether it’s a collaborative, content-based, or hybrid recommendation model (you will learn more about these methods in the following part of this article), specific matrices and mathematical algorithms are applied.

The end product of this advanced matrix computation is the final set of recommendations.
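To give a feel for what matrix factorization does, here is a minimal sketch in plain Python. It assumes a tiny, made-up list of ratings and trains the factors with stochastic gradient descent; the learning rate, regularization value, and function names are illustrative assumptions, and real systems typically rely on an optimized library instead.

```python
import random

# Hypothetical ratings: (user index, item index, rating on a 1-5 scale).
ratings = [
    (0, 0, 5.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 5.0),
]

def factorize(ratings, n_users, n_items, n_factors=2,
              epochs=500, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(n_factors))
            err = r - pred
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating of item i by user u (works for unseen pairs too)."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

P, Q = factorize(ratings, n_users=3, n_items=3)
print(round(predict(P, Q, 0, 2), 2))  # user 0 never rated item 2
```

The point of the decomposition is the last line: once users and items share a small set of latent factors, the model can score user-item pairs that never appeared in the data.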

Step 4: Generating recommendations

In this stage, known as ‘candidate generation’, the recommendation system creates a selection of potential options based on the user’s input.

A recommender system then improves this selection by prioritizing the most suitable options. The efficiency of this ranking is key to delivering successful recommendations.
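The two stages can be sketched as a pair of small functions: one that narrows the catalog to plausible candidates and one that orders them. The catalog, the category-based filter, and the popularity score below are all hypothetical stand-ins chosen to keep the example short.

```python
# Hypothetical catalog with one attribute and one precomputed score per item.
catalog = {
    "runner_x": {"category": "shoes", "popularity": 0.9},
    "trail_y":  {"category": "shoes", "popularity": 0.6},
    "cap_z":    {"category": "hats",  "popularity": 0.8},
    "sock_w":   {"category": "socks", "popularity": 0.7},
}

def generate_candidates(user_categories, seen, catalog):
    """Stage 1: keep unseen items from categories the user has shown interest in."""
    return [
        item for item, meta in catalog.items()
        if meta["category"] in user_categories and item not in seen
    ]

def rank(candidates, catalog, top_n=2):
    """Stage 2: order the candidates by score and keep the best ones."""
    ordered = sorted(candidates, key=lambda item: catalog[item]["popularity"], reverse=True)
    return ordered[:top_n]

candidates = generate_candidates({"shoes", "hats"}, seen={"runner_x"}, catalog=catalog)
print(rank(candidates, catalog))
```

Splitting the work this way keeps the expensive scoring step limited to a short list, which is one reason the two-stage design is popular for large catalogs.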

The use of artificial intelligence (AI) is essential to this process. The best recommendation platforms use adaptive AI models that continuously capture and process user preferences, similar to a market analyst understanding customer trends.

Thanks to advanced machine learning techniques, recommendations can be tailored to each user’s individual needs.

So, what mechanism drives the data analysis that generates the final recommendation? Surprise, surprise: it’s AI.

Machine learning, as a subcategory of artificial intelligence, enables recommender systems to recognize patterns and relationships in large historical datasets, such as understanding complex aspects of user behavior.

For generating tailored content, recommendation systems rely on specific training data and algorithms.

While deep learning models with neural networks are complex, traditional machine learning models enable systems to adapt and learn without direct programming.

It’s worth noting that recommendation system learning doesn’t necessarily depend on deep neural networks or advanced deep learning techniques, such as natural language processing.

They can still deliver precise product recommendations for users without these advanced methods.

Data security

Machine learning begins by collecting data in a database. Next, the system analyzes the data, whether it is focused on content or user behavior. After that, the data is categorized, learned from, and used to derive precise insights and forecasts.

So, how do we identify visitors?

The simple answer is cookies. These small text files contain a unique string of characters that identifies a user. Product recommendations often depend on the system’s ability to track user behavior, and attributing that behavior to a specific visitor requires cookies.

Although there is a general concern for personal data protection, cookies do not store details such as names, credit card information, or other personal identifiers. They only contain a code that identifies a particular visitor.

Using a cookie-driven recommendation engine, systems can link users to their respective sessions, allowing for the reconstruction of user pathways. It should be noted that not all visitors may agree to the cookie policy.
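As an illustration of reconstructing user pathways, the sketch below groups page-view events by an opaque visitor identifier of the kind a cookie might carry. The event format, the identifiers, and the function names are assumptions made for this example.

```python
import uuid
from collections import defaultdict

def new_visitor_id():
    """Generate an opaque identifier; it carries no personal details."""
    return uuid.uuid4().hex

def build_paths(events):
    """Group (visitor_id, timestamp, page) events into ordered paths per visitor."""
    paths = defaultdict(list)
    for visitor_id, _timestamp, page in sorted(events, key=lambda e: e[1]):
        paths[visitor_id].append(page)
    return dict(paths)

# Hypothetical events, deliberately out of order.
events = [
    ("abc123", 3, "/checkout"),
    ("abc123", 1, "/home"),
    ("def456", 2, "/home"),
    ("abc123", 2, "/product/42"),
]

print(build_paths(events))
```

Visitors who decline cookies simply never appear in such a log, so their sessions cannot be linked in this way.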

4 types of recommendation systems

As mentioned earlier, there are four main recommendation filtering methods that are most commonly used.

Thanks to the application of different methods, the suggestions are varied and as accurate as possible. It is worth blending them to achieve optimal results.

1. Collaborative filtering recommender systems

The collaborative filtering method is one of the most common techniques used in recommender systems. It operates on the principle of user-item interactions.

The main idea is that if two users agree on the evaluation of certain items, they will likely agree on the evaluation of other items as well.

Collaborative filtering can be further divided into two types:

User-based Collaborative Filtering

The user-based collaborative filtering method finds similar users to the target user and recommends items that those similar users have liked. For instance, if Alice and Bob both liked movies X and Y, and Bob also liked movie Z, then the system might recommend movie Z to Alice.
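The Alice and Bob example can be written out as a small sketch. It assumes binary "like" data and cosine similarity between users; Carol, the `k` parameter, and the function names are hypothetical additions for illustration.

```python
from math import sqrt

# 1 means "liked". Carol is added only to show a dissimilar user.
likes = {
    "Alice": {"X": 1, "Y": 1},
    "Bob":   {"X": 1, "Y": 1, "Z": 1},
    "Carol": {"W": 1},
}

def cosine(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend_user_based(target, likes, k=1):
    """Score unseen items by the similarity of the k nearest users who liked them."""
    neighbours = sorted(
        ((cosine(likes[target], likes[other]), other) for other in likes if other != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, other in neighbours:
        if sim <= 0:
            continue
        for item, value in likes[other].items():
            if item not in likes[target]:
                scores[item] = scores.get(item, 0.0) + sim * value
    return sorted(scores, key=scores.get, reverse=True)

print(recommend_user_based("Alice", likes))
```

Bob is Alice's nearest neighbour because they share X and Y, so the only item he liked that she has not seen, Z, is what gets recommended.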

Item-based Collaborative Filtering

Instead of finding user similarities, this method focuses on item similarities. If users A and B both liked item 1 and item 2, then the two items are considered similar. Hence, if another user likes item 1, item 2 might be recommended to them.
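A minimal sketch of the item-based variant follows. It assumes each user is described by the set of items they liked and measures item similarity by the overlap between the sets of users who liked each item; users C and D and the function names are hypothetical.

```python
from math import sqrt

# Hypothetical data: the set of items each user liked.
likes = {
    "A": {"item1", "item2"},
    "B": {"item1", "item2"},
    "C": {"item1"},
    "D": {"item3"},
}

def item_users(likes):
    """Invert the data: for each item, the set of users who liked it."""
    index = {}
    for user, items in likes.items():
        for item in items:
            index.setdefault(item, set()).add(user)
    return index

def item_similarity(index, a, b):
    """Cosine similarity between two items over their sets of users."""
    overlap = len(index[a] & index[b])
    return overlap / sqrt(len(index[a]) * len(index[b]))

def recommend_item_based(user, likes):
    """Recommend unseen items that are similar to the items the user liked."""
    index = item_users(likes)
    scores = {}
    for candidate in index:
        if candidate in likes[user]:
            continue
        score = sum(item_similarity(index, candidate, liked) for liked in likes[user])
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)

print(recommend_item_based("C", likes))
```

Because item 1 and item 2 were liked by the same users, user C, who so far liked only item 1, is offered item 2. Item similarities change more slowly than user tastes, so they can often be computed ahead of time.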

2. Content-based recommender systems

Content-based filtering focuses on the attributes of items and gives you recommendations based on the similarity between them.

For instance, if a user has shown interest in a particular type of movie, the system will recommend movies that fall into that category.

The content of each item is represented as a set of descriptors or terms that are inherent to the item.

For example, in a content-based movie recommendation system, the features of the movie, such as genre, director, actor, etc., can be used to describe the movie and recommend similar items.
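As a rough sketch of this idea, each movie below is described by a set of descriptors and compared with Jaccard similarity. The titles, the descriptors, and the choice of Jaccard over other measures are assumptions made for illustration.

```python
# Hypothetical movies described by genre and director descriptors.
movies = {
    "Movie A": {"sci-fi", "adventure", "director:smith"},
    "Movie B": {"sci-fi", "adventure", "director:jones"},
    "Movie C": {"romance", "drama", "director:lee"},
}

def jaccard(a, b):
    """Share of descriptors two items have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_items(liked, movies, top_n=1):
    """Rank all other movies by descriptor overlap with the liked one."""
    scores = {
        title: jaccard(movies[liked], features)
        for title, features in movies.items()
        if title != liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(similar_items("Movie A", movies))
```

Note that no other user's behavior is involved: the recommendation depends only on the item descriptors and on what this one user liked.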


3. Hybrid recommender systems

Hybrid recommendation systems combine the collaborative filtering and content-based filtering approaches to provide recommendations.

Hybrid systems can be implemented in several ways:

  • By making predictions separately with each approach and combining them.
  • By adding content-based capabilities to a collaborative filtering approach (or vice versa).
  • By unifying the approaches into a single model.

The main advantage of hybrid systems is that they can address the limitations of both collaborative filtering and content-based filtering systems.

For example, they can provide personalized recommendations to users with unique tastes and can handle situations where there’s limited user-item interaction data. This helps mitigate the previously mentioned cold start problem.
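The first option in the list above, making predictions separately and combining them, can be sketched as a weighted blend. The weighting rule here, which trusts collaborative scores more as a user accumulates interactions, and the `full_trust_at` parameter are illustrative assumptions rather than a standard formula.

```python
def blend(cf_scores, content_scores, n_interactions, full_trust_at=20):
    """Blend two score dictionaries; lean on content scores for newer users."""
    weight = min(n_interactions / full_trust_at, 1.0)
    items = set(cf_scores) | set(content_scores)
    return {
        item: weight * cf_scores.get(item, 0.0)
        + (1 - weight) * content_scores.get(item, 0.0)
        for item in items
    }

# Hypothetical scores produced by two separate models for the same user.
cf_scores = {"a": 1.0, "b": 0.0}
content_scores = {"a": 0.2, "b": 0.8}

print(blend(cf_scores, content_scores, n_interactions=0))
print(blend(cf_scores, content_scores, n_interactions=20))
```

With no interaction history the blend equals the content-based scores, and with enough history it equals the collaborative scores, which is how a hybrid can remain useful during a cold start.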

4. Deep learning-based recommendations

Deep learning-based recommendation systems utilize deep neural networks to make predictions or recommendations.

These systems can automatically learn and extract features from raw data, making them highly effective, especially with large datasets.

Deep learning models, such as Convolutional Neural Networks (CNNs) for image data or Recurrent Neural Networks (RNNs) for sequential data, can be used depending on the type of data at hand.

For instance, platforms like YouTube use deep learning recommenders to suggest videos to users based on their viewing history.

Recommendation systems are now an essential component of various digital platforms, ranging from e-commerce websites to streaming services.

Each platform benefits differently from the recommender system. We all can observe the effects of the recommendation system on a daily basis. Recommender systems lead to a better user experience and significantly reduce the time spent searching for the best items.

So now, let’s look at the benefits of recommender systems from a business perspective.

Personalized user experience

One of the primary advantages of a recommendation system is the ability to provide a personalized experience for each user. By analyzing user behavior and preferences, these systems can suggest products, content, or services that align with individual tastes.

Increased sales and revenue

By suggesting relevant and personalized products or content, recommender systems can drive more sales. This not only boosts the conversion rate but also increases the overall revenue for businesses.

Enhanced user engagement

When users are presented with content or products that resonate with their preferences, they are more likely to engage. This can lead to longer session times and more frequent visits.

Efficient discovery of new content

With the vast amount of content available online, discovering something new can be a challenge. Recommender systems help in this by introducing users to new products or content that they might not have stumbled upon otherwise.

Improved decision making for businesses

Understanding what users are looking for can inform decisions related to inventory management, marketing strategies, and more.

Enhanced customer satisfaction

Meeting and exceeding user expectations can lead to higher satisfaction levels. When users consistently find what they’re looking for, their overall satisfaction with the platform increases.

Targeted advertising

For platforms that rely on advertising revenue, recommendation systems can be invaluable. By understanding user preferences, businesses can serve more relevant ads, leading to higher click-through rates.

How to choose the right recommendation engine: most important criteria

1. Solution type: out-of-the-box vs. custom

Decide between an “out-of-the-box” recommendation system and a custom-built solution. Out-of-the-box solutions are ready-made and can be quickly integrated into your system. On the other hand, custom recommender systems are tailored to your specific business needs, offering more flexibility and customization.

2. Understand the pricing model

If you’re leaning towards an out-of-the-box recommender system, it’s crucial to understand its pricing structure. Some providers charge based on the number of users, while others base their fees on the number of recommendations made. It’s essential to choose a model that aligns with your business’s financial strategy.

3. Choosing the right tech partner

For those considering a custom recommender system, choosing the right technology partner is paramount. You should:

  • Evaluate their expertise to ensure they have the technical know-how.
  • Consider their experience and how long they’ve been in the industry.
  • Review their portfolio to see if they’ve built systems similar to what you’re envisioning.
  • Check their performance metrics to gauge how successful, powerful, and scalable their systems have been in the past.

4. Prioritize scalability

Your chosen recommendation system should be scalable. As your website traffic grows, the system should adapt without compromising its performance. This ensures that as your business expands, your recommendation engine remains efficient and effective.

5. Addressing the cold start problem

Especially for smaller stores, the cold start problem can be a challenge. This issue arises when there’s insufficient data to make accurate recommendations. It’s vital to select a system capable of providing precise suggestions even if you’re operating a smaller shop.

6. Collaboration and alignment

Ensure that you partner with a provider whose values and vision align with yours. A shared approach and working style can lead to smoother collaboration and more successful integration of the recommender system into your business operations.

7. Integration and compatibility

The recommendation engine should easily integrate with your existing systems, platforms, and software. Check for compatibility with your current tech stack to avoid potential integration challenges and additional costs.

8. Real-time recommendations

Opt for a solution that offers real-time recommendations. As user behavior and preferences change, the system should instantly update its suggestions, ensuring that users always receive the most relevant and timely product or content recommendations.

9. Flexibility and customizability

The business landscape is ever-evolving. Choose a recommendation engine that offers flexibility in terms of features and customizability. This ensures that as your business needs change, your recommendation system can adapt accordingly.

10. Cost of maintenance and support

Beyond the initial investment, consider the ongoing costs of maintaining the system. Additionally, ensure that the provider offers reliable customer support to address any issues or challenges that may arise.

So, we know how these systems work and how to choose one.

Now, let’s see how we can build one.

We’re a team that builds custom software on a daily basis, and we’ve built recommendation engines from scratch for the ecommerce and marketing sectors.

[To learn more about our past project, check our case study: Recostream – AI/ML Personalized Recommendations Engine for eCommerce and Content Providers.]

We’re happy to pass on our knowledge. We’ve made sure to explain in the simplest way possible how our Java developers build recommendation systems, step by step.

  • Gather Data: Begin by analyzing the data that an application collects, possesses, or has the potential to obtain.
  • Identify Useful Data: Determine which pieces of data will be beneficial for generating recommendations.
  • Design the Database: Plan and design a database model tailored to collect and store the identified data.
  • Integrate Systems: Establish a connection between the recommendation engine and the application to enable smooth data transfer.
  • Develop the Mechanism: Work on creating a mechanism that will generate recommendations. This will be based on the algorithm configuration you choose, which is a crucial component of the engine.
  • Algorithm Creation: Dive into the development and fine-tuning of the algorithms that will power the recommendation engine.
  • Finalize Integration: Ensure a seamless integration between the application and the recommendation engine, allowing for easy retrieval of recommendations.

If you want more thorough insight into our process of building custom software like this, take a look at the complete guide on how to build a recommendation engine, explained by our best experts!

In the e-commerce industry, recommender systems play a crucial role in enhancing user experience and driving sales. They analyze a user’s browsing and purchase history to suggest products that are likely to interest them. These recommendations often appear in the “You Might Also Like” or “Customers Who Bought This Also Bought” sections. By personalizing the shopping journey, e-commerce platforms increase customer engagement and boost revenue.

Entertainment and streaming services

Streaming platforms like Netflix, Spotify, and YouTube leverage recommender systems to keep users engaged and entertained. These systems use algorithms to analyze a user’s viewing or listening history and recommend movies, music, or videos that align with their preferences. This keeps users on the platform longer and increases content consumption.


Social media platforms

Social media platforms such as Facebook, Instagram, and Twitter employ recommendation engines to suggest friends to connect with, posts to interact with, and groups to join. These systems analyze user interactions and interests to create personalized feeds, increasing user engagement and time spent on the platform.

Gaming industry

In the gaming industry, recommendation systems help players discover new games, in-game items, and opponents. They analyze player behavior and preferences to suggest games that align with a player’s interests or to match players with similar skill levels. This enhances the gaming experience and encourages players to explore more titles.

News and media

News websites and media platforms use recommender systems to deliver personalized news articles, videos, and advertisements. By analyzing user reading habits and content preferences, these systems ensure that users are presented with relevant and engaging content, ultimately increasing user retention and ad revenue.

Travel and hospitality

In the travel industry, recommendation engines assist users in finding suitable hotels, flights, and travel itineraries (e.g., travel meta search engines, booking engines, hotel booking engines). These systems consider a user’s past travel history, preferences, and budget to suggest tailored options, making travel planning more convenient and enjoyable.

Healthcare providers use recommendation systems to suggest treatment plans and medical services based on a patient’s medical history, symptoms, and demographics. This personalized approach improves patient care and outcomes.

In the financial sector, recommender systems suggest investment opportunities, financial products, and personalized banking services based on a user’s financial history and goals. The use of recommender systems in banking technology helps users make informed financial decisions and enhances the customer experience.

In the field of education, recommender systems suggest courses, learning materials, and study paths based on a student’s educational history and learning preferences. These systems promote personalized learning experiences, improving educational outcomes.

Advertising and marketing

In the advertising and marketing industry, recommendation engines deliver targeted advertisements and marketing campaigns to users based on their behavior and interests. This increases the effectiveness of marketing efforts and boosts conversion rates.

In the automotive sector, recommendation systems suggest vehicles, maintenance schedules, and related services based on a user’s vehicle history and preferences. This enhances the customer experience and encourages vehicle maintenance.

These are just a few examples of how recommender systems are utilized across various industries to improve user engagement, drive revenue, and provide personalized experiences.

Recommendation systems in ecommerce

Amazon’s recommendation system

Amazon is renowned for its highly accurate product recommendations in its online store. The company leverages advanced technologies, including:

  • Artificial intelligence algorithms: These algorithms analyze vast amounts of data to predict what products a user might be interested in.
  • Machine learning: Amazon uses machine learning to continuously improve and refine its recommendation system based on user behavior and feedback.
  • Collaborative filtering: Generates recommendations for a user based on the behavior of similar users.
  • Personalized recommendations: Amazon’s system tailors product suggestions based on individual user behavior, browsing history, and purchase history.
  • Item-to-item collaborative filtering: Instead of segmenting users into groups, Amazon matches each product to a set of similar products. This way, when a user views a product, they get recommendations for similar items.


Shopify’s recommendation system

Shopify, a leading e-commerce platform, also employs recommendation systems to enhance the shopping experience for its merchants and their customers.

  • Collaborative filtering: Similar to Amazon, Shopify uses collaborative filtering techniques to generate user recommendations based on the behavior of similar users.
  • Data-driven recommendations: Shopify’s recommendation system is heavily data-driven. It analyzes user behavior, preferences, satisfaction, purchase history, interests, and other relevant data to provide accurate product suggestions.
  • Personalized product recommendations: Shopify’s system offers personalized product suggestions tailored to individual user behavior, preferences, and existing interests.
  • Complementary recommendations: Some themes on Shopify have sections that display complementary products to customers on product pages. These are products that go well with the main product being viewed.
  • Related recommendations: These are auto-generated by Shopify and show products that are related or similar to the main product being viewed.


Recommendation systems in streaming

Recommendation systems play a pivotal role in the success of streaming platforms like Netflix and Spotify. Here’s a brief overview of how these systems work for each platform:

Netflix recommender system

  • User behavior tracking: Netflix tracks every action a user takes on their platform, from the movies and series they watch to the ratings they give and even the time they spend on particular scenes.
  • Content tagging: Every piece of content on Netflix is tagged with metadata by both algorithms and human experts. This metadata can include information about the genre, actors, director, and even nuanced details like the mood of the content.
  • Matrix factorization: Netflix employs matrix factorization techniques to decompose the user-item interaction matrix. This helps in predicting the rating a user would give to an unseen movie based on their past behavior.
  • Deep learning: Netflix also uses deep learning models to predict user preferences. These models can capture complex non-linear patterns in the data which traditional algorithms might miss.
  • Personalized thumbnails: Even the thumbnails users see for movies and series are personalized based on their past behavior. For instance, a user who frequently watches romantic movies might see a thumbnail highlighting the romantic aspect of a movie, even if it’s an action film.
  • A/B testing: Netflix constantly tests its recommendation algorithms using A/B testing to ensure that users are getting the most relevant content suggestions.

Spotify recommender system

  • User listening habits: Spotify tracks the songs, albums, and playlists users listen to. It also considers the frequency, recency, and duration of listening sessions.
  • Collaborative filtering: This method predicts a user’s interests by collecting preferences from many users. If person A has the same opinion as person B on one issue, A is more likely to share B’s opinion on a different issue.
  • Content-based filtering: Spotify analyzes the raw audio tracks of songs to identify features like tempo, instrumentation, and genre. This allows the platform to recommend songs that are sonically similar to what a user has been listening to.
  • Matrix factorization: Like Netflix, Spotify also uses matrix factorization techniques to predict songs a user might like based on their past listening history.
  • Natural Language Processing (NLP): Spotify scans the web for articles, blogs, and other textual content about songs and artists to understand public opinion and sentiment. This helps in making more informed recommendations.
  • Playlist personalization: Playlists like “Discover Weekly” and “Release Radar” are personalized for each user based on their listening habits and the algorithms’ predictions.

Recommendation systems in marketing


  • AI/ML-driven models: GetResponse has several AI/ML-driven recommendation models. These models are designed to provide highly accurate and personalized content suggestions to users.
  • AI-driven product recommendations: On the GetResponse platform, AI product recommendations use artificial intelligence to match product offerings to the preferences, needs, and habits of each store visitor. This ensures that users are presented with products and services that align with their interests and behaviors.

GetResponse uses a recommendation engine built by our team. The company acquired Recostream, an AI/ML personalized recommendations technology product. This acquisition aimed to add AI product recommendations to GetResponse’s personalization functionality, further enhancing the platform’s capabilities.


TikTok collects a vast amount of data from its users. This data includes user interactions, video information, device information, and more. By analyzing this data, TikTok can understand user preferences and behaviors, allowing it to make more accurate content recommendations.

TikTok’s recommendation system stands out due to its high effectiveness in suggesting content that users find engaging and relevant. This effectiveness is a result of the platform’s advanced algorithms and its ability to analyze vast amounts of data in real-time.

Recommendation system – summary

Recommendation systems are AI-driven engines that provide tailored suggestions to users by analyzing collected data. The recommendation process undergoes four primary stages: data collection, analysis, filtering, and the application of machine learning techniques to produce recommendations. Among the four main types of recommender systems are content-based filtering and collaborative filtering based methods, which often utilize a user-item matrix. The latter, collaborative filtering, particularly focuses on recommendations derived from the preferences of other users. Additionally, hybrid methods and deep learning-based systems offer more advanced recommendation techniques.

Implementing a recommendation system offers businesses numerous advantages, including a personalized user experience (UX), heightened user engagement, and potential revenue growth. These systems are prevalent across various sectors, from ecommerce and entertainment to travel and gaming. When selecting an appropriate recommendation system, businesses can opt for ready-made tools or invest in scalable custom solutions.


We are Stratoflow, a custom software development company. We firmly believe that software craftsmanship, collaboration, and effective communication are key to delivering complex software projects. This allows us to build advanced high-performance Java applications capable of processing vast amounts of data in a short time. We also provide our clients with an option to outsource and hire Java developers to extend their teams with experienced professionals. As a result, our Java software development services contribute to our clients’ business growth. We specialize in travel software, ecommerce software, and fintech software development. In addition, we are taking low-code to a new level with our Open-Source Low-Code Platform.



How to Build a Recommendation Engine Using Python

Richard Lawrence



Understanding and responding to user preferences is a significant part of today's technology-driven world. Companies such as Netflix, Amazon, and YouTube use recommendation engines to enhance user experience, provide personalized content, and ultimately increase their customer engagement. At the heart of these recommendation engines is machine learning, and Python is one of the most frequently used languages for it thanks to its simplicity and extensive collection of libraries.

This article provides a comprehensive guide on how to build recommendation engines using Python, offering actionable insights for Python developers, data scientists, analysts, students, and researchers interested in this field. We will discuss the fundamental concepts around recommendation systems, such as collaborative filtering and content-based filtering, as well as the different machine learning algorithms used in these systems. We will also dive into the practical implementation using Python, exploring popular libraries such as Surprise, LensKit, and LightFM, among others.

The concepts and processes explained here are not only applicable to digital media platforms but also span various other domains. For instance, an e-commerce platform can provide product suggestions based on users' past purchases or browsing history, a music application can recommend songs aligned with users' tastes, or a news portal can suggest articles catering to users' reading habits.

Whether you are a Python developer planning to broaden your knowledge, a data scientist aiming to leverage recommendation engines in your data analysis work, or a student or researcher seeking to understand and implement recommendation systems in your projects or studies, this guide will be a valuable resource. Join us as we explore the techniques and tools involved in creating recommendation engines using Python, and learn how to make systems that can accurately anticipate users' preferences and enhance their overall experience.

Understanding recommendation engines: An overview

Recommendation engines, also known as recommender systems, are a class of machine learning algorithms that play a crucial role in personalized content suggestion. They analyze users' behavior and preferences to predict and rank items or services a user might be interested in. These predictions are made based on various factors, including users' past activities, search history, and demographic information.

There are three main types of recommendation systems:

1. Content-Based Filtering: This approach focuses on the properties of items. Using item features such as author, genre, or director in the context of a movie recommendation system, content-based filtering systems suggest items that are similar to items a user has liked in the past. The assumption here is simply that if a user has liked a certain type of item in the past, they are likely to like such items in the future as well.

Diagram that illustrates content-based filtering.

2. Collaborative Filtering: Collaborative filtering uses the behavior of multiple users to recommend items. It operates under the assumption that if two users agreed in the past, they will likely agree in the future. There are two types of collaborative filtering: User-User Collaborative Filtering, which finds users who are similar to the target user based on similarity of ratings and recommends items that those similar users liked, and Item-Item Collaborative Filtering, which takes an item-based perspective and recommends items that are similar based on users' interactions with them.

Diagram illustrating collaborative filtering.

3. Hybrid Systems: These systems combine the strengths of the above two approaches. They might use collaborative filtering to find users who are similar, and then within that user subset, use content-based filtering to find the most suitable items to recommend.

Additionally, while not a distinct type, it's worth noting that Deep Learning-based recommendation systems have also gained significant popularity. They use neural networks to learn from vast amounts of data and make predictions in a way that is harder for traditional systems to accomplish.

All these recommendation engines have their own strengths and weaknesses and are applied according to the needs of the specific application. Some systems might even use combinations of these methods to overcome the limitations of a single approach. Python provides multiple libraries and tools to help developers build, analyze, and deploy these systems efficiently.

Despite their potential, recommendation engines aren't a panacea. Challenges such as the 'cold start' problem - when there is insufficient data about new users or items for the system to provide reliable recommendations - need to be addressed. We will investigate some strategies to overcome this later.

Nonetheless, the power of recommendation engines to improve user experience and promote customer engagement makes them a valuable asset in a wide range of industries.

Exploring the types of recommendation systems: Collaborative filtering and content-based filtering

Digging deeper into the types of recommendation systems, let's unpack the details of two fundamental approaches - collaborative filtering and content-based filtering - which are commonly implemented in Python.

Collaborative Filtering

Collaborative filtering works on the principle of user behavior similarity - 'Users who agreed in the past will agree in the future'. In other words, if two users had similar tastes in the past, it is highly probable they will have similar preferences in the future as well. Collaborative filtering comes in two flavors:

1. User-Based Collaborative Filtering: Also known as user-user collaborative filtering, this approach looks at user behavior and preferences. It analyzes past user-product interactions and assumes that similar users have similar preferences. This method finds users who are similar to the target user based on similarity of ratings, and recommends items that these similar users liked.

2. Item-Based Collaborative Filtering: This method is a bit different in that it proposes recommendations based upon the similarity between items, not users. It considers the set of items that a user has rated, and calculates the likeness between these items and the target item. The items that are deemed most similar are then recommended to the user.

User-item interaction data, such as ratings, is typically stored in a matrix known as the utility matrix. Despite its simplicity, collaborative filtering can be quite effective.
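As a rough illustration of user-based collaborative filtering over a utility matrix, the sketch below uses only the standard library. The users, movies, and ratings are invented for the example, and the similarity-weighted average is one simple prediction rule among several, not the method of any particular library.

```python
from math import sqrt

# Hypothetical utility matrix: user -> {item: rating}. Absent items are unrated.
ratings = {
    "alice": {"Inception": 5, "Interstellar": 4, "Titanic": 1},
    "bob":   {"Inception": 4, "Interstellar": 5, "The Matrix": 5},
    "carol": {"Titanic": 5, "The Notebook": 4, "Inception": 1},
}

def cosine_sim(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(user, item):
    """Similarity-weighted average of the ratings other users gave to `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_sim(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

print(cosine_sim(ratings["alice"], ratings["bob"]))
print(cosine_sim(ratings["alice"], ratings["carol"]))
print(predict("alice", "The Matrix"))
```

Here alice is far closer to bob than to carol, so bob's ratings dominate any prediction made for her. Production systems usually mean-center the ratings first so that generous and harsh raters become comparable.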

Content-Based Filtering

Content-based filtering, on the other hand, focuses on the properties or features of items. It recommends items by comparing the content of the items and a user profile. For instance, if a user has shown interest in a specific genre of movies, the system will recommend other movies from the same genre.

To create an item profile, properties such as actors, directors, and genres are considered. For users, a profile is created using a utility matrix that describes the relationship between users and items. The system then compares these profiles to generate recommendations.

Though content-based filtering systems offer highly personalized recommendations, they may limit the diversity of those recommendations, because suggestions are based solely on the user's own past behavior.
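The item-profile and user-profile comparison described above can be sketched in a few lines. The genre tags and function names below are hypothetical, and counting genre occurrences is only one simple way to build a user profile.

```python
from math import sqrt

# Hypothetical item profiles: each movie is described by a set of genre tags.
item_features = {
    "Inception":    {"sci-fi", "thriller"},
    "Interstellar": {"sci-fi", "drama"},
    "The Matrix":   {"sci-fi", "action"},
    "The Notebook": {"romance", "drama"},
}

def build_user_profile(liked):
    """Count how often each feature appears among the items the user liked."""
    profile = {}
    for item in liked:
        for feature in item_features[item]:
            profile[feature] = profile.get(feature, 0) + 1
    return profile

def score(profile, features):
    """Cosine similarity between the user profile and a binary item profile."""
    dot = sum(profile.get(f, 0) for f in features)
    norm = sqrt(sum(v * v for v in profile.values())) * sqrt(len(features))
    return dot / norm if norm else 0.0

def recommend(liked, n=2):
    """Rank the items the user has not liked yet by similarity to their profile."""
    profile = build_user_profile(liked)
    candidates = [i for i in item_features if i not in liked]
    ranked = sorted(candidates, key=lambda i: score(profile, item_features[i]), reverse=True)
    return ranked[:n]

print(recommend(["Inception", "Interstellar"], n=1))
```

Because the profile is built only from what the user already liked, the top suggestion is another sci-fi title, which also illustrates the limited diversity mentioned above.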

In the next section, we will discuss how Python, with its rich ecosystem of libraries and packages, can be used to implement these types of recommendation systems, walking you through some practical code examples.

Introduction to Machine Learning Algorithms for Recommendation Engines

Machine Learning algorithms form the backbone of recommendation engines. They facilitate the task of analyzing massive datasets, identifying patterns and correlations, and ultimately enabling the system to learn from past user behavior and interactions to make accurate predictions and recommendations. The choice of algorithm depends on the type of recommendation approach used: collaborative filtering, content-based filtering, or a hybrid of the two.

Let's explore some of these algorithms commonly used in recommendation systems:

1. Matrix Factorization: This is an extensively used technique in recommendation systems, specifically in collaborative filtering. Matrix factorization algorithms work by 'decomposing' the user-item interaction matrix into the product of two lower dimensional matrices. This way, they can capture the underlying factors or features that led to a particular user-item interaction. One popular algorithm that uses matrix factorization is Singular Value Decomposition (SVD). Implementing this with Python’s Surprise package is a common approach to build recommendation systems based on rating data.

2. Association Rules Learning: This method is based on the concept of 'if this, then that', providing insights into the associations and correlations between different item sets. The Apriori algorithm, which is used to learn these association rules, has been applied in recommendation systems. This algorithm can be used to find items that get bought together frequently, and then suggest them to users who have bought one of those items.

3. Cosine Similarity: Typically used in content-based filtering, cosine similarity compares item features to generate a similarity score, which is used for recommendations. This technique measures the cosine of the angle between two vectors in a multidimensional space. These vectors can represent items or users, and the similarity score can indicate how alike two items/users are based on their characteristics or behavior.

4. Collaborative Filtering Algorithms: Memory-Based Collaborative Filtering works directly on stored user behavior and includes neighborhood methods such as user-user and item-item collaborative filtering. These are typically implemented with K-Nearest Neighbors (KNN), which finds the most similar items/users to make recommendations. Model-Based Collaborative Filtering, on the other hand, trains a machine learning model to predict user ratings for unrated items. This category includes matrix factorization as well as neural network and deep learning based models that can handle vast amounts of data and complex features.

5. Content-Based Algorithms: These algorithms use item features to recommend other items similar to what the user has liked in the past. Techniques like TF-IDF (Term Frequency-Inverse Document Frequency) can be used to convert text into numerical data, and then cosine similarity or other similarity measures can be used to find the most similar items.
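To make the matrix factorization idea from item 1 concrete, here is a minimal sketch that learns two low-dimensional factor matrices with stochastic gradient descent. It is a simplified stand-in, not the SVD implementation found in Surprise (which also learns bias terms); the ratings, hyperparameters, and function names are all assumptions made for illustration.

```python
import random

def factorize(ratings, n_users, n_items, k=2, epochs=500, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates r."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def rmse(ratings, P, Q):
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5

# Hypothetical (user index, item index, rating) triples.
ratings = [(0, 0, 5), (0, 1, 4), (0, 2, 1), (1, 0, 4), (1, 1, 5),
           (1, 3, 1), (2, 2, 5), (2, 3, 4), (2, 0, 1)]

P0, Q0 = factorize(ratings, n_users=3, n_items=4, epochs=0)  # untrained baseline
P, Q = factorize(ratings, n_users=3, n_items=4)
print("RMSE before training:", rmse(ratings, P0, Q0))
print("RMSE after training: ", rmse(ratings, P, Q))
print("Predicted rating for user 0 on unseen item 3:", predict(P, Q, 0, 3))
```

The last line is the point of the exercise: once the factors are learned, the dot product gives a score for user-item pairs that never appeared in the data.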

It's worth noting that the above list is not exhaustive. There are many other machine learning algorithms and methods used in recommendation systems, and the choice largely depends on the nature of the problem, the available data, and the desired outcome. Despite the variety, Python offers numerous libraries and tools that make it easier to implement these algorithms effectively and efficiently. In the following sections, we will delve into some of these Python resources that can aid in building robust recommendation systems.
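The 'if this, then that' idea behind association rules can also be shown without any library. The sketch below only mines rules between pairs of items, which corresponds to a single level of what Apriori does; the baskets and thresholds are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

A rule such as "butter -> bread" can then drive a suggestion: a shopper with butter in the cart and no bread is shown bread.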

Getting started with Python libraries for recommendation systems

Python, with its rich ecosystem of libraries and packages, lends itself perfectly to building recommendation systems. These libraries provide essential tools and algorithms that significantly simplify the process of developing, deploying, and evaluating recommendation models. Let's take a look at some of these libraries and how they can be used in the context of recommendation engines:

1. Surprise (a scikit for recommender systems): Surprise is a Python library that provides various ready-to-use prediction algorithms, such as Singular Value Decomposition (SVD) and K-Nearest Neighbors (KNN), among others. It also provides tools for evaluating, analyzing, and comparing the performance of different algorithms. The simplicity of its API makes it a popular choice among both beginners and experienced developers.

2. LightFM: LightFM is another powerful Python library for building recommendation systems. It stands out by supporting both collaborative and content-based filtering, allowing for the creation of hybrid models. The library is designed to handle both explicit and implicit feedback, making it highly adaptable to different types of recommendation scenarios.

3. LensKit: LensKit is a set of tools and libraries that make it easier to experiment with recommendation algorithms. It provides functionalities for numerous collaborative and content-based filtering algorithms, along with tools for evaluation and analysis. Its modular design allows for easy extension and customization of existing algorithms.

4. fastFM: fastFM is a library that offers efficient implementation of factorization machines, a class of models that is highly effective for recommendation systems. fastFM provides functionality for regression, binary classification, and ranking tasks. Its ability to handle high-dimensional sparse data makes it a powerful tool when working with large datasets.

5. TensorRec: TensorRec is a recommendation framework built in Python with the power of TensorFlow. It allows developers to create complex recommendation models without having to write a lot of custom code. TensorRec's flexible framework can accommodate both collaborative and content-based models, and can even implement hybrid models.

6. Rexy: Rexy is an open-source recommendation engine which simplifies the development of both collaborative filtering and content-based recommendation systems. It provides a straightforward API and a robust set of features, including support for custom recommendation algorithms.

When selecting a library for your recommendation engine project, consider factors such as the nature of your data, the type of recommendation system you are building, and the level of customization you require. If you need to handle complex scenarios or require more control over your model, a more flexible library like TensorRec or fastFM may be suitable. Conversely, if you are looking for simplicity and ease-of-use, libraries like Surprise or LightFM could be the better choice.

Regardless of the library you choose, remember that the goal is to develop a recommendation system that successfully enhances user experience by providing personalized and relevant recommendations. The power of Python libraries coupled with the right approach can help you create robust and efficient recommendation systems that cater to your unique requirements.

Building a basic recommendation engine with Python: A step-by-step guide

Building a recommendation engine may seem like a daunting task, but Python, with its rich set of libraries, simplifies the process and breaks it down into digestible chunks. Let's build a simple recommendation system using Python. For this step-by-step tutorial, we'll be using the `Surprise` library, which provides tools and algorithms for building recommendation systems.

Our recommendation engine will be based on collaborative filtering and will use the Singular Value Decomposition (SVD) algorithm, a popular choice for recommendation systems.

Before we dive in, make sure that you have the `surprise` library installed in your Python environment. If not, you can install it using pip; the package is published on PyPI as `scikit-surprise`.

Step 1: Import the necessary libraries

The first step is to import the necessary libraries.

Step 2: Load the dataset

Next, we'll load our dataset. `Surprise` provides a few built-in datasets. For our exercise, we're using the widely used MovieLens dataset.

Step 3: Define the algorithm

Now, we need to define which algorithm we'll use. As mentioned, we're going to use the `SVD` algorithm.

Step 4: Fit and evaluate the model

We're now ready to fit and evaluate our model. We'll use 5-fold cross-validation, which means the `cross_validate` function will split the dataset into 5 parts, train the model on 4 parts, and evaluate it on the remaining part. This process is repeated 5 times so that every part of the dataset is used for evaluation.

The metrics used here are Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). These metrics give us an idea of how much our predictions deviate, on average, from the actual ratings in the dataset.
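To show what these two metrics actually measure, the following sketch computes them by hand on a handful of made-up ratings. The helper names are illustrative only; Surprise calculates the same quantities internally.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical true ratings and model predictions.
actual = [4, 3, 5, 2]
predicted = [3.5, 3, 4, 4]

print("RMSE:", rmse(actual, predicted))
print("MAE: ", mae(actual, predicted))
```

RMSE comes out higher than MAE here because squaring gives extra weight to the single prediction that is off by two stars, which is why the two metrics are usually reported together.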

Once you run the code, you'll get an output with the test scores, which include the RMSE and MAE for each of the 5 folds, as well as their averages and the computation time.

Step 5: Making Predictions

With your model trained and evaluated, you can now make predictions for specific users and items.

In the above example, the user with ID 196 and the item with ID 302 are used for the prediction. 'r_ui' is the true rating. 'verbose = True' means the function will print its results.

And there you have it: a basic movie recommendation engine using Python and the Surprise library. This tutorial should serve as a launching pad for your exploration of recommendation engines using Python, as you delve deeper into more complex models, different algorithms, and larger datasets. The opportunities for refinement and adjustment are vast, ensuring that your recommendation engine can evolve as your understanding and requirements grow.

Improving your recommendation engine: Tips and best practices

Building a basic recommendation engine is certainly a notable achievement. However, if you wish to take your system to the next level and enhance its performance, it's essential to employ a range of strategies, techniques, and best practices.

1. Expand your Dataset: One surefire way to improve your recommendation engine is by expanding your dataset. More data equals more information for your system to learn from. Consider incorporating additional user behaviors and item attributes, or even consider merging multiple datasets. This provides richer insight and can significantly improve recommendation quality.

2. Experiment with Different Algorithms: As we've discussed earlier, there's a variety of machine learning algorithms that can be used in recommendation engines. Trying different ones and comparing their performance can help identify the algorithm that works best for your specific scenario. Don't be afraid to experiment with different collaborative and content-based filtering methods, or even hybrid approaches.

3. Address the Cold Start Problem: The cold start problem - the challenge of making accurate recommendations for new users or items that have little to no interaction data - is a common hurdle in recommendation systems. One strategy to mitigate this is to use content-based filtering for new users or items, utilizing the information available about the user or the item to provide initial recommendations. For instance, for a new user, you could use demographic information or explicit feedback (like a short survey) to generate initial recommendations.

4. Implement a Popularity Filter: While personalized recommendations are the end goal, sometimes, popular items can be a safe bet, particularly for new users. Implementing a popularity filter, which suggests items that are currently trending or most-rated, could be a helpful supplement to your recommendation system.

5. Consider the Context: User preferences can be dependent on certain contexts such as time, location, or mood. By incorporating context into your recommendation engine, you can make your suggestions more relevant and timely. A user might prefer watching different genres of movies in the morning versus late at night, for example. Understanding and incorporating such context-dependent preferences can significantly improve your recommendations.

6. Constantly Evaluate and Update the model: Regularly evaluating your model with different metrics and updating it based on the latest user-item interactions can help keep your recommendations fresh and relevant. It's also worth regularly retraining your model as new data comes in, to ensure that it learns from the most recent user behaviors.

7. Personalize Recommendations Further: There are ways to go beyond the basic collaborative and content-based filtering techniques for more personalized recommendations. One such method is using Deep Learning algorithms. Deep Learning can discover intricate structures within the data and model complicated non-linear relationships. This can be particularly useful for large-scale and complex recommendation tasks.
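Tips 3 and 4 combine naturally: when a user has too little history for personalized recommendations, fall back to popular items. The sketch below illustrates that fallback under stated assumptions; the interaction log, the `min_history` threshold, and the `personalised` callback are all hypothetical placeholders for whatever model you have trained.

```python
from collections import Counter

# Hypothetical interaction log of (user, item) pairs.
interactions = [
    ("u1", "A"), ("u2", "A"), ("u3", "A"),
    ("u1", "B"), ("u2", "B"),
    ("u3", "C"),
]

def top_popular(interactions, n, exclude=()):
    """Most frequently interacted-with items, skipping those in `exclude`."""
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common() if item not in exclude][:n]

def recommend(user, interactions, personalised, n=2, min_history=2):
    """Use the personalised model when possible, otherwise fall back to popularity."""
    history = {item for u, item in interactions if u == user}
    if len(history) < min_history:
        # Cold start: too little data about this user to personalise.
        return top_popular(interactions, n, exclude=history)
    return personalised(user)[:n]

# A stand-in for a trained model's output.
fake_model = lambda user: ["X", "Y", "Z"]

print(recommend("brand_new_user", interactions, fake_model))
print(recommend("u3", interactions, fake_model))
```

The same pattern extends to new items: until an item has collected enough ratings, it can be surfaced through content-based similarity instead of collaborative signals.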

Remember, the goal of a recommendation system is not only to predict user preferences accurately but also to increase user satisfaction, diversity, and the novelty of recommendations. Balancing accuracy with these other factors is key to building successful recommendation systems. While Python and its vast selection of libraries significantly simplify the development process, continuous improvement and fine-tuning of your model based on feedback and performance will go a long way in improving the reliability of your recommendations and the satisfaction of your users.

In conclusion, recommendation engines are powerful tools in today's digital landscape, assisting in providing a personalized user experience and elevating customer engagement. Python, with its easy-to-use syntax and rich collection of libraries and tools, is an excellent language for building these systems.

This comprehensive guide explores the fundamental concepts of recommendation systems, highlights the different machine learning algorithms used, and walks you through the process of building and refining a basic recommendation engine using Python. We've also discussed several strategies to enhance the performance of your recommendation systems, including expanding your dataset, experimenting with different algorithms, addressing the cold start problem, adding a popularity filter, considering user context, and continuously evaluating and updating your model.

The journey doesn't stop here. The field of recommendation engines is continuously evolving, and as a Python developer, data scientist, or researcher, there are always new techniques, tools, and methods to explore and implement. Continue to learn, experiment, and evolve your knowledge and skills to build robust, efficient, and effective recommendation engines that truly enhance user experiences.

How to Build a Movie Recommendation System Based on Collaborative Filtering

In today’s world of technology, we get more recommendations from Artificial Intelligence models than from our friends.

Surprised? Think of the content you see and the apps you use daily. We get product recommendations on Amazon, clothing recommendations on Myntra, and movie suggestions on Netflix based on our past preferences, purchases, and so on.

Have you ever wondered what’s under the hood? The answer is machine learning-powered Recommender systems. Recommender systems are machine learning algorithms developed using historical data and social media information to find products personalized to our preferences.

In this article, I’ll walk you through the different types of ML methods for building a recommendation system and focus on the collaborative filtering method. We will obtain a sample dataset and create a collaborative filtering recommender system step by step.

Make sure to grab a cup of cappuccino (or whatever is your beverage of choice) and get ready!


Before we embark on this journey, you should have a basic understanding of machine learning concepts and familiarity with Python programming. Knowledge of data processing and experience with libraries like Pandas, NumPy, and Scikit-learn will also be beneficial.

If you're new to these topics, you can check out the Introduction to Data Science course on Hyperskill, where I contribute as an expert.

Different Types of Recommendation Systems

You'll probably agree that there is more than one way to decide what to suggest or recommend when a friend asks our opinion. This applies to AI, too!

In machine learning, two primary methods of building recommendation engines are Content-based and Collaborative filtering methods.

When using the content-based filter method, the suggested products or items are based on what you liked or purchased. This method feeds the machine learning model with historical data such as customer search history, purchase records, and items in their wishlists. The model finds other products that share features similar to your past preferences.

Let’s understand this better with an example of a movie recommendation. Let’s say you saw Inception and gave it a five-star rating. Finding movies of similar themes and genres, like Interstellar and The Matrix, and recommending them is called content-based filtering.

Imagine if all the recommendation systems just suggested things based on what you have seen. How would you discover new genres and movies? That’s where the Collaborative filtering method comes in. So what is it? Rather than finding similar content, the Collaborative filtering method finds other users and customers similar to you and recommends their choices. The algorithm doesn’t consider the product features as in the case of content-based filtering.

To understand how it works, let’s go back to our example of movie recommendations. The system looks at the movies you've enjoyed and finds other users who liked the same movies. Then, it sees what else these similar users enjoyed and suggests those movies to you.

For example, if you and a friend both love The Shawshank Redemption, and your friend also loves Forrest Gump, the system will recommend Forrest Gump to you, thinking you might share your friend's taste.

In the upcoming sections, I’ll show you how to build a movie recommendation engine using Python based on collaborative filtering.


How to Prepare and Process the Movies Dataset

The first step of any machine learning project is collecting and preparing the data. As our goal is to build a movie recommendation engine, I have chosen a movie rating dataset. The dataset is publicly available for free on Kaggle.

The dataset has two main files in CSV format:

  • Ratings.csv: Contains the rating given by each user to each movie they watched
  • Movies_metadata.csv: Contains information on genre, budget, release date, revenue, and so on for all the movies in the dataset.

Let’s first import the Python packages needed to read the CSV files.

Next, read the Ratings file into Pandas dataframes and look at the columns.


The userId column has the unique ID for every customer, and movieId has the unique identification number for every movie. The rating column contains the rating, out of 5, given by the particular user to the movie. The timestamp column can be dropped, as we won’t need it for our analysis.

Next, let’s read the movie metadata information into a dataframe. Let’s keep only the relevant columns of movie title and genre for each movieId.


Next, combine these dataframes on the common column movieId.
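The preparation steps above can be sketched with pandas. To keep the example self-contained, it builds two small in-memory dataframes rather than reading the Kaggle CSV files, so the rows and the exact column names (`userId`, `movieId`, `rating`, `timestamp`, `title`, `genres`) are assumptions; with the real files you would start from `pd.read_csv`.

```python
import pandas as pd

# Hypothetical stand-in for the ratings file.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating": [5.0, 3.0, 4.0, 2.0, 4.5],
    "timestamp": [0, 0, 0, 0, 0],
})

# Hypothetical stand-in for the metadata file, already cut down to the
# columns we care about.
movies = pd.DataFrame({
    "movieId": [10, 20, 30],
    "title": ["Inception", "Interstellar", "The Matrix"],
    "genres": ["Sci-Fi", "Sci-Fi", "Action"],
})

# Drop the column we do not need, then join on the shared movie ID.
ratings = ratings.drop(columns=["timestamp"])
df = ratings.merge(movies, on="movieId", how="inner")

print(df.head())
```

An inner join keeps only the ratings whose movie ID also appears in the metadata, which silently discards unmatched rows; checking the row count before and after the merge is a cheap sanity test.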

This dataset can be used for Exploratory Data Analysis. You can find the movie with the top number of ratings, the best rating, and so on. Try it out to better grasp the data you are dealing with.

How to Build the User-Item Matrix

Now that our dataset is ready, let's focus on how collaborative filtering works. The machine learning algorithm aims to discover patterns in user preferences, which are then used to make recommendations.

One common approach is to use a user-item matrix. Think of it as a large spreadsheet where users are listed on one side and movies on the other. Each cell in the spreadsheet holds the rating a user gave to a particular movie. The system then uses various algorithms to analyze this matrix, find patterns, and generate recommendations.

This matrix leads us to one of the advantages of collaborative filtering: it's excellent at discovering new and unexpected recommendations. Since it's based on user behavior, it can suggest a movie you might never have considered but will probably like.

Let’s create a user-movie rating matrix for our dataset. You can do this using the built-in pivot function of a Pandas dataframe, as shown below. We also use the fillna() method to impute missing or null values with 0.
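A minimal version of that pivot might look like the following. The dataframe is a hypothetical stand-in for the merged ratings data, and placing users on the rows and movie titles on the columns is an assumption; the transpose works equally well if your model expects movies as rows.

```python
import pandas as pd

# Hypothetical merged ratings: one row per (user, movie) rating.
df = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "title": ["Inception", "Interstellar", "Inception", "The Matrix", "Interstellar"],
    "rating": [5.0, 3.0, 4.0, 2.0, 4.5],
})

# One row per user, one column per movie; unrated cells become 0.
user_item = df.pivot(index="userId", columns="title", values="rating").fillna(0)

print(user_item)
```

Note that `pivot` raises an error if the same user rated the same movie more than once, so duplicates need to be aggregated or dropped first. Filling with 0 also treats "not rated" as a numeric value, which is a simplification worth keeping in mind when interpreting similarities.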

Here’s our output matrix:


Often, the matrix is sparse, meaning that most of its cells are empty because each user has rated only a small fraction of the movies. Storing and processing all those empty cells can significantly increase the computational resources needed. Compressing sparse matrices using the scipy Python package is recommended when working with a large dataset.
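As a small illustration of that advice, the sketch below measures how sparse a toy rating matrix is and compresses it with scipy's `csr_matrix`. The matrix values are invented for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical rating matrix in which 0 means "not rated".
dense = np.array([
    [5, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 4, 0, 0],
], dtype=float)

# Fraction of cells that hold no rating.
sparsity = 1.0 - np.count_nonzero(dense) / dense.size

# Compressed Sparse Row format stores only the non-zero entries.
compressed = csr_matrix(dense)

print(f"sparsity: {sparsity:.2f}")
print(f"stored values: {compressed.nnz} of {dense.size}")
```

Scikit-learn estimators such as `NearestNeighbors` accept CSR matrices directly, so the compressed form can usually be passed to the model without converting back.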

How to Define and Train the Model

You can use multiple machine learning algorithms for collaborative filtering, like K-nearest neighbors (KNN) and SVD. I’ll be using a KNN model here.

KNN is super straightforward. Picture a giant, colorful board with dots representing different items (like movies). Each dot is close to others that are similar. When you ask KNN for recommendations, it finds the spot of your favorite item on this board and then looks around to see the nearest dots—these are your recommendations.

Now, the metric parameter in KNN is crucial. It's like the ruler the system uses to measure the distance between these dots. The metric used here is Cosine similarity.

What is cosine similarity?

It is a metric that measures how similar two entities are (like documents or vectors in a multi-dimensional space), irrespective of size. Cosine similarity is widely used in NLP to find similar context words. Follow the snippet below to define a KNN model, the metric, and other parameters. The model is fit on the user-item matrix created in the previous section.
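For two vectors a and b, cosine similarity is their dot product divided by the product of their lengths. The sketch below computes it by hand and then defines a KNN model with scikit-learn's `NearestNeighbors` using `metric="cosine"`, which works with cosine distance, that is, one minus the similarity. The small matrix, with movies as rows and users as columns, is a made-up example, and the parameter choices are assumptions rather than the article's original settings.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical item-by-user rating matrix (rows are movies, columns are users).
X = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 0.0],
    [0.0, 1.0, 5.0],
])

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Brute-force search is a common pairing with the cosine metric.
knn = NearestNeighbors(metric="cosine", algorithm="brute", n_neighbors=2)
knn.fit(X)

# Nearest neighbors of the first movie; the movie itself comes back first.
distances, indices = knn.kneighbors(X[[0]])

print("neighbors of row 0:", indices[0].tolist())
print("cosine distance to nearest other row:", distances[0][1])
print("1 - cosine similarity, by hand:", 1 - cosine_similarity(X[0], X[1]))
```

Because the metric ignores vector length, a user who rates everything one star lower than another but with the same pattern still counts as similar.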

Next, let's define a function to provide the desired number of movie recommendations, given a movie title as input. The code below uses the KNN algorithm to find the data points closest to the input movie. The input parameters for the function are:

  • n_recs : Controls the number of final recommendations that we would get as output
  • Movie_name : Input movie name, based on which we find new recommendations
  • Matrix : The User-Movie Rating matrix
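A function with those three inputs could look like the sketch below. It assumes the matrix has one row per movie title and one column per user (the transpose of a user-by-movie pivot), fits the KNN model inside the function for brevity, and uses invented titles and ratings; none of this is taken from the article's original code.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

def recommend(movie_name, matrix, n_recs=10):
    """Return up to n_recs titles closest to movie_name by cosine distance.

    `matrix` is assumed to have one row per movie title and one column per user.
    """
    if movie_name not in matrix.index:
        return []
    model = NearestNeighbors(metric="cosine", algorithm="brute")
    model.fit(matrix.values)
    # Ask for one extra neighbor because the query movie is its own nearest neighbor.
    k = min(n_recs + 1, len(matrix))
    query = matrix.loc[[movie_name]].values
    _, indices = model.kneighbors(query, n_neighbors=k)
    titles = [matrix.index[i] for i in indices[0]]
    return [t for t in titles if t != movie_name][:n_recs]

# Hypothetical movie-by-user rating matrix.
matrix = pd.DataFrame(
    [[5, 4, 0, 0], [4, 5, 0, 1], [0, 0, 5, 4], [0, 1, 4, 5]],
    index=["Batman Begins", "The Dark Knight", "Notting Hill", "Love Actually"],
    columns=["u1", "u2", "u3", "u4"],
)

print(recommend("Batman Begins", matrix, n_recs=2))
```

In a real project you would fit the model once and reuse it across calls, and you might match titles loosely so that a query like "Batman" finds the right row.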

How to Get Recommendations from the Model

Let's call our defined function to get movie recommendations. For instance, we can obtain a list of the top 10 recommended movies for someone who is a fan of Batman.


Hurray! We have got the result we needed.

Advantages and Limitations of Collaborative Filtering

The advantages of this method include:

  • Personalized Recommendations: Offers tailored suggestions based on user behavior, leading to highly customized experiences.
  • Diverse Content Discovery: Capable of recommending a wide range of items, helping users discover content they might not find on their own. This gives collaborative filtering the edge over content-based filtering when it comes to discovering new content.
  • Community Wisdom: Leverages the collective preferences of users, often leading to more accurate recommendations than individual or content-based analysis alone.
  • Dynamic Adaptation: The model continuously gets updated with user interactions, keeping the recommendations relevant and up-to-date.

It’s not all sunshine, though. One big challenge is the cold start problem. For example, this happens when new movies or users are added to the system. The system struggles to make accurate recommendations since there's not enough data on these new entries.

Another issue is popularity bias. Popular movies get recommended a lot, overshadowing lesser-known gems. There are also scalability issues that come with managing such a large dataset.

While developing collaborative filtering-based engines, computational expenses and data sparsity must be kept in mind for an efficient process. It’s also recommended to take action to ensure data privacy and security.

Using Collaborative Filtering to build a movie recommendation system significantly advances digital content personalization. This system reflects our preferences and exposes us to a broader range of choices based on similar users' tastes.

Despite its challenges, such as the cold start problem and popularity bias, the benefits of personalized recommendations make it a powerful tool in the machine learning industry. As technology advances, these systems will become even more sophisticated, offering refined and enjoyable user experiences in the digital world.

Thank you for reading! I'm Jess, and I'm an expert at Hyperskill. You can check out an Introduction to Data Science course on the platform.


Twitter's Recommendation Algorithm

Twitter aims to deliver you the best of what’s happening in the world right now. This requires a recommendation algorithm to distill the roughly 500 million Tweets posted daily down to a handful of top Tweets that ultimately show up on your device’s For You timeline. This blog is an introduction to how the algorithm selects Tweets for your timeline.

Our recommendation system is composed of many interconnected services and jobs, which we will detail in this post. While there are many areas of the app where Tweets are recommended—Search, Explore, Ads—this post will focus on the home timeline’s For You feed.

How do we choose Tweets?

The foundation of Twitter’s recommendations is a set of core models and features that extract latent information from Tweet, user, and engagement data. These models aim to answer important questions about the Twitter network, such as, “What is the probability you will interact with another user in the future?” or, “What are the communities on Twitter and what are trending Tweets within them?” Answering these questions accurately enables Twitter to deliver more relevant recommendations.

The recommendation pipeline is made up of three main stages that consume these features: 

  • Fetch the best Tweets from different recommendation sources in a process called candidate sourcing.
  • Rank each Tweet using a machine learning model.
  • Apply heuristics and filters, such as filtering out Tweets from users you’ve blocked, NSFW content, and Tweets you’ve already seen.

The service that is responsible for constructing and serving the For You timeline is called Home Mixer. Home Mixer is built on Product Mixer, our custom Scala framework that facilitates building feeds of content. This service acts as the software backbone that connects different candidate sources, scoring functions, heuristics, and filters.
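The three stages can be sketched as a small function. This is an illustrative outline only, not Home Mixer's actual code (which is written in Scala); the `Tweet` class, `build_timeline`, and the shape of the sources, scoring function, and filters are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Tweet:
    tweet_id: int
    author: str
    score: float = 0.0

def build_timeline(sources, score_fn, filters, limit=10):
    """sources: callables returning lists of Tweet; score_fn: Tweet -> float;
    filters: callables Tweet -> bool (True means keep)."""
    # 1. Candidate sourcing: pool candidates from every source, dropping duplicates.
    candidates = {}
    for source in sources:
        for tweet in source():
            candidates.setdefault(tweet.tweet_id, tweet)
    # 2. Ranking: score every candidate, then sort best-first.
    for tweet in candidates.values():
        tweet.score = score_fn(tweet)
    ranked = sorted(candidates.values(), key=lambda t: t.score, reverse=True)
    # 3. Heuristics and filters: keep only Tweets that pass every filter.
    kept = [t for t in ranked if all(f(t) for f in filters)]
    return kept[:limit]
```

For instance, a filter such as `lambda t: t.author not in blocked_authors` would drop Tweets from blocked accounts after ranking.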

The diagram below illustrates the major components used to construct a timeline:

[Diagram: major components used to construct a timeline]

Let’s explore the key parts of this system, roughly in the order they’d be called during a single timeline request, starting with retrieving candidates from Candidate Sources.

Candidate Sources

Twitter has several Candidate Sources that we use to retrieve recent and relevant Tweets for a user. For each request, we attempt to extract the best 1500 Tweets from a pool of hundreds of millions through these sources. We find candidates from people you follow (In-Network) and from people you don’t follow (Out-of-Network). Today, the For You timeline consists of 50% In-Network Tweets and 50% Out-of-Network Tweets on average, though this may vary from user to user.

In-Network Source

The In-Network source is the largest candidate source and aims to deliver the most relevant, recent Tweets from users you follow. It efficiently ranks Tweets of those you follow based on their relevance using a logistic regression model. The top Tweets are then sent to the next stage.

The most important component in ranking In-Network Tweets is Real Graph. Real Graph is a model that predicts the likelihood of engagement between two users. The higher the Real Graph score between you and the author of the Tweet, the more of their Tweets we'll include.

The In-Network source has been the subject of recent work at Twitter. We recently stopped using Fanout Service, a 12-year-old service that was previously used to provide In-Network Tweets from a cache of Tweets for each user. We’re also in the process of redesigning the logistic regression ranking model, which was last updated and trained several years ago!

Out-of-Network Sources

Finding relevant Tweets outside of a user’s network is a trickier problem: How can we tell if a certain Tweet will be relevant to you if you don’t follow the author? Twitter takes two approaches to addressing this.

Social Graph

Our first approach is to estimate what you would find relevant by analyzing the engagements of people you follow or those with similar interests.

We traverse the graph of engagements and follows to answer the following questions:

  • What Tweets did the people I follow recently engage with?
  • Who likes similar Tweets to me, and what else have they recently liked?

We generate candidate Tweets based on the answers to these questions and rank the resulting Tweets using a logistic regression model. Graph traversals of this type are essential to our Out-of-Network recommendations; we developed GraphJet, a graph processing engine that maintains a real-time interaction graph between users and Tweets, to execute these traversals. While such heuristics for searching the Twitter engagement and follow network have proven useful (these currently serve about 15% of Home Timeline Tweets), embedding space approaches have become the larger source of Out-of-Network Tweets.
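The first question above ("What Tweets did the people I follow recently engage with?") amounts to a two-hop traversal. The following is a minimal in-memory sketch, not GraphJet; the function name and the dictionary-based graph representation are assumptions chosen for brevity.

```python
from collections import Counter

def candidates_from_follows(user, follows, engagements, n=3):
    """follows: user -> set of followed users.
    engagements: user -> set of tweet ids that user engaged with."""
    already_seen = engagements.get(user, set())
    counts = Counter()
    # Hop 1: user -> followees. Hop 2: followee -> engaged Tweets.
    for followee in follows.get(user, set()):
        for tweet_id in engagements.get(followee, set()):
            if tweet_id not in already_seen:
                counts[tweet_id] += 1
    # Tweets engaged with by more followees come first.
    return [tweet_id for tweet_id, _ in counts.most_common(n)]
```

Here the count of engaging followees serves as a crude ranking signal; the post describes a logistic regression model for that step instead.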

Embedding Spaces

Embedding space approaches aim to answer a more general question about content similarity: What Tweets and Users are similar to my interests?

Embeddings work by generating numerical representations of users’ interests and Tweets’ content. We can then calculate the similarity between any two users, Tweets or user-Tweet pairs in this embedding space. Provided we generate accurate embeddings, we can use this similarity as a stand-in for relevance.
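Cosine similarity is one common way to compare two embedding vectors; the post does not say which measure Twitter uses, so treat the choice here as an assumption. A minimal sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention chosen here: a zero vector is similar to nothing.
    return dot / (norm_a * norm_b)
```

A user vector and a Tweet vector pointing in nearly the same direction score close to 1, while unrelated (orthogonal) vectors score 0.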

One of Twitter’s most useful embedding spaces is SimClusters. SimClusters discovers communities anchored by a cluster of influential users using a custom matrix factorization algorithm. There are 145k communities, which are updated every three weeks. Users and Tweets are represented in the space of communities, and can belong to multiple communities. Communities range in size from a few thousand users for individual friend groups, to hundreds of millions of users for news or pop culture. These are some of the biggest communities:

[Figure: some of the biggest SimClusters communities]

We can embed Tweets into these communities by looking at the current popularity of a Tweet in each community. The more that users from a community like a Tweet, the more that Tweet will be associated with that community.
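The idea that likes from a community pull a Tweet toward that community can be sketched as a simple sum. This is a deliberate simplification, not the SimClusters algorithm itself; `embed_tweet` and the membership weights are hypothetical.

```python
from collections import defaultdict

def embed_tweet(liker_ids, user_communities):
    """liker_ids: users who liked the Tweet.
    user_communities: user -> {community: membership weight}."""
    embedding = defaultdict(float)
    for user_id in liker_ids:
        # Each like adds the liker's community memberships to the Tweet.
        for community, weight in user_communities.get(user_id, {}).items():
            embedding[community] += weight
    return dict(embedding)
```

Because the vector is built from current likes, it shifts as a Tweet spreads from one community to another.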

Ranking

The goal of the For You timeline is to serve you relevant Tweets. At this point in the pipeline, we have ~1500 candidates that may be relevant. Scoring directly predicts the relevance of each candidate Tweet and is the primary signal for ranking Tweets on your timeline. At this stage, all candidates are treated equally, regardless of which candidate source they originated from.

Ranking is achieved with a ~48M parameter neural network that is continuously trained on Tweet interactions to optimize for positive engagement (e.g. Likes, Retweets, and Replies). This ranking mechanism takes into account thousands of features and outputs ten labels to give each Tweet a score, where each label represents the probability of an engagement. We rank the Tweets from these scores. 
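The post does not spell out how the per-label probabilities become a single score. One plausible approach, assumed here purely for illustration, is a weighted sum; the label names and weights below are made up.

```python
def combine_labels(probabilities, weights):
    """probabilities: label -> predicted engagement probability.
    weights: label -> importance (hypothetical values); missing labels count as 0."""
    return sum(weights.get(label, 0.0) * p for label, p in probabilities.items())

# Example with invented numbers: a likely Like plus a less likely Reply.
example_score = combine_labels(
    {"like": 0.5, "reply": 0.1},
    {"like": 1.0, "reply": 10.0},
)
```

Weighting rarer engagements more heavily lets a Tweet with a small chance of a Reply compete with one that is merely likely to be Liked.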

Heuristics, Filters, and Product Features

After the Ranking stage, we apply heuristics and filters to implement various product features. These features work together to create a balanced and diverse feed. Some examples include:

  • Visibility Filtering: Filter out Tweets based on their content and your preferences. For instance, remove Tweets from accounts you block or mute.
  • Author Diversity: Avoid too many consecutive Tweets from a single author.
  • Content Balance: Ensure we are delivering a fair balance of In-Network and Out-of-Network Tweets.
  • Feedback-based Fatigue: Lower the score of certain Tweets if the viewer has provided negative feedback about them.
  • Social Proof: Exclude Out-of-Network Tweets without a second degree connection to the Tweet as a quality safeguard. In other words, ensure someone you follow engaged with the Tweet or follows the Tweet’s author.
  • Conversations: Provide more context to a Reply by threading it together with the original Tweet.
  • Edited Tweets: Determine if the Tweets currently on a device are stale, and send instructions to replace them with the edited versions.
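As an illustration of one heuristic from the list above, Author Diversity could be approximated by discounting each additional Tweet from the same author and re-sorting. The decay factor and the tuple layout are assumptions for this sketch, not values from the post.

```python
def apply_author_diversity(ranked, decay=0.5):
    """ranked: list of (tweet_id, author, score), sorted by score descending."""
    seen_count = {}
    rescored = []
    for tweet_id, author, score in ranked:
        k = seen_count.get(author, 0)
        # The k-th extra Tweet from the same author is scaled by decay**k.
        rescored.append((tweet_id, author, score * decay ** k))
        seen_count[author] = k + 1
    return sorted(rescored, key=lambda t: t[2], reverse=True)
```

With `decay=0.5`, an author's second-best Tweet needs roughly twice the score of another author's Tweet to stay ahead of it.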

Mixing and Serving

At this point, Home Mixer has a set of Tweets ready to send to your device. As the last step in the process, the system blends together Tweets with other non-Tweet content like Ads, Follow Recommendations, and Onboarding prompts, which are returned to your device to display. 

The pipeline above runs approximately 5 billion times per day and completes in under 1.5 seconds on average. A single pipeline execution requires 220 seconds of CPU time, nearly 150x the latency you perceive on the app.


The goal of our open source endeavor is to provide full transparency to you, our users, about how our systems work. We’ve released the code powering our recommendations that you can view here (and here) to understand our algorithm in greater detail, and we are also working on several features to provide you greater transparency within our app. Some of the new developments we have planned include:

  • A better Twitter analytics platform for creators with more information on reach and engagement
  • Greater transparency into any safety labels applied to your Tweets or accounts
  • Greater visibility into why Tweets appear on your timeline

What’s Next?

Twitter is the center of conversations around the world. Every day, we serve over 150 billion Tweets to people’s devices. Ensuring that we’re delivering the best content possible to our users is both a challenging and an exciting problem. We’re working on new opportunities to expand our recommendation systems (new real-time features, embeddings, and user representations), and we have one of the most interesting datasets and user bases in the world to do it with. We are building the town square of the future. If this interests you, please consider joining us.

Written by the Twitter Team.


Google Maps Launches AI-Powered Local Business Search

Google introduces AI in Maps to suggest personalized local business recommendations.

  • Google Maps launched an AI feature for personalized local business recommendations.
  • It uses large language models to analyze Maps data and suggest places based on user interests.
  • The technology aims to enhance local discovery and exploration in Google Maps.

Google is introducing an experimental feature in Google Maps that uses AI to help users discover local businesses that meet specific needs.

AI-Powered Discovery

The new feature utilizes large language models to analyze Google Maps’ database of over 250 million places, photos, ratings, reviews, and more.

After entering a conversational search query, Google Maps will suggest personalized recommendations for businesses, events, restaurants, and activities in the area.

For example, you can ask Maps to recommend “places with a vintage vibe in San Francisco,” and it will return suggestions like clothing boutiques, record stores, and flea markets.

The results are categorized with photos and review highlights to explain why they meet your criteria.

You can refine your search by asking follow-up questions like “How about lunch?,” which will return recommendations for eateries with a vintage ambiance. Suggested places can also be saved into lists for future reference.

According to Google, the technology is helpful for managing spontaneous or changing itineraries. You can ask for “activities for a rainy day” and immediately get indoor options tailored to the current weather and location.

The feature also takes group dynamics into account. Families can request “options for kids” to see curated suggestions for child-friendly places like children’s museums, arcades, and indoor playgrounds.

Early Access Experiment With Local Guides

For this early preview, Google is soliciting feedback from a select group of Local Guides. Their input will help shape the AI technology before a wider rollout.

The launch represents Google’s latest effort to integrate generative AI into Maps and transform how users find and explore local businesses. By combining large language models with Maps’ expansive database, Google aims to provide ultra-personalized recommendations to match any need or interest.

Implications For Local Search

The implications for local search and customer discovery could be significant, potentially driving more qualified traffic to niche businesses or lesser-known attractions and events.

As Google continues honing its AI capabilities, businesses may need to optimize online information in new ways to rank for conversational searches and take advantage of the technology.

Featured Image: Screenshot from, February 2024. 

Matt G. Southern, Senior News Writer, has been with Search Engine Journal since 2013. With a bachelor’s degree in communications, ...


How Machine Learning Will Transform Supply Chain Management

  • Narendra Agrawal,
  • Morris A. Cohen,
  • Rohan Deshpande,
  • Vinayak Deshpande


Businesses need better planning to make their supply chains more agile and resilient. After explaining the shortcomings of traditional planning systems, the authors describe their new approach, optimal machine learning (OML), which has proved effective in a range of industries. A central feature is its decision-support engine that can process a vast amount of historical and current supply-and-demand data, take into account a company’s priorities, and rapidly produce recommendations for ideal production quantities, shipping arrangements, and so on. The authors explain the underpinnings of OML and provide concrete examples of how two large companies implemented it and improved their supply chains’ performance.

It does a better job of using data and forecasts to make decisions.

Idea in Brief

The Problem

Flawed planning methods make it extremely difficult for companies to protect themselves against supply chain disruptions.

A Remedy

A new approach, called optimal machine learning (OML), can enable better decisions, without the mystery surrounding the planning recommendations produced by current machine-learning models.

The Elements

OML relies on a decision-support engine that connects input data directly to supply chain decisions and takes into account a firm’s performance priorities. Other features are a “digital twin” representation of the entire supply chain and a data storage system that integrates information throughout the supply chain and allows for quick data access and updating.

The Covid-19 pandemic, the Russia-Ukraine conflict, trade wars, and other events in recent years have disrupted supply chains and highlighted the critical need for businesses to improve planning in order to be more agile and resilient. Yet companies struggle with this challenge. One major cause is flawed forecasting, which results in delivery delays, inventory levels that are woefully out of sync with demand, and disappointing financial performance. Those consequences are hardly surprising. After all, how can inventory and production decisions be made effectively when demand forecasts are wildly off?

  • Narendra Agrawal is the Benjamin and Mae Swig Professor of Information Systems and Analytics at Santa Clara University’s Leavey School of Business.
  • Morris A. Cohen is the Panasonic Professor Emeritus of Manufacturing & Logistics at the University of Pennsylvania’s Wharton School. He is also the founder of AD3 Analytics, a start-up that developed the OML methodology for supply chain management.
  • Rohan Deshpande is a machine learning scientist at Cerebras Systems and a former chief technology officer at AD3 Analytics.
  • Vinayak Deshpande is the Mann Family Distinguished Professor of Operations at the University of North Carolina’s Kenan-Flagler Business School.
