The TikTok Algorithm

algorithmically deriving your personality

Nov 09, 2021

TikTok

Since its initial release in 2016, TikTok has grown to over 900 million daily active users as of 2021. To put that into perspective, there are ~4.8 billion internet users around the globe today, and roughly 1 in 5 of them use TikTok every single day. The average time spent on TikTok is 52 minutes per day worldwide, and 47.4% of those active users are aged between 10 and 29. It has the highest engagement rate of any social media app, beating out Instagram and Twitter.

Supply and Demand

On average, ~1% of TikTok users create content, 5% directly engage with content (commenting on other content, liking content, etc.), and 95% of users are passive consumers. The average length of the top 100 TikToks is only 15.6 seconds. At 52 minutes per day, that works out to 52 × 60 ÷ 15.6 = 200, implying that a typical user watches ~200 TikToks per day.

Inside TikTok's Algorithm

Facing regulatory pressure in the US and export restrictions on its software in China, TikTok recently released some fascinating details about what goes on "under the hood" of its algorithm.

What does the algorithm do?

There are 3 goals that TikTok's algorithm is optimized for:

It calculates how similar one TikTok is to another
It predicts a user's affinity or aversion to a future TikTok
It recognizes similarities and patterns across users

While each of these 3 tasks can be broken into several specialized algorithms, the main algorithm that TikTok employs is a collaborative-filtering recommender. The idea is simple. There is a huge database of TikToks, and a huge database of users. Every time a user interacts with a TikTok, the company quantifies how engaged that user was with the content. Based on that engagement, it places the user in a cluster or group. Since users in similar groups tend to like similar pieces of content, it can make recommendations once it can label a user's interests. However, users are complex and the algorithm knows that. With millions of users and millions of pieces of content, TikTok can figure out highly "niche" user interests and explore those interests along a continuum (over days and weeks) to validate whether they are "true" interests.
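To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is not TikTok's implementation: instead of explicit clusters, it weights every other user by cosine similarity, which is a simpler stand-in for "similar groups like similar content." The user names, video IDs, and engagement scores are all made up for illustration.

```python
from math import sqrt

# Hypothetical engagement scores in [0, 1]: user -> {video_id: score}.
# In a real system these would be derived from watch time, likes, etc.
engagement = {
    "alice": {"puppies_1": 0.9, "soccer_1": 0.8, "abs_1": 0.1},
    "bob": {"puppies_1": 0.8, "soccer_1": 0.9, "soccer_2": 0.95},
    "carol": {"abs_1": 0.9, "abs_2": 0.85, "puppies_1": 0.2},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {video_id: score} dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[v] * b[v] for v in shared)
    norm_a = sqrt(sum(s * s for s in a.values()))
    norm_b = sqrt(sum(s * s for s in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, engagement, top_n=3):
    """Rank videos the user has not seen by similarity-weighted engagement."""
    seen = engagement[user]
    scores = {}
    for other, other_scores in engagement.items():
        if other == user:
            continue
        sim = cosine_similarity(seen, other_scores)
        for video, score in other_scores.items():
            if video not in seen:
                # Videos loved by similar users accumulate higher scores.
                scores[video] = scores.get(video, 0.0) + sim * score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


recommendations = recommend("alice", engagement)
```

In this toy data, alice's history looks much more like bob's than carol's, so bob's unseen soccer video ranks above carol's unseen abs video.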

Source WSJ: A 3D graph of a TikTok user engaging with content. The dots represent individual TikToks. Dots that are closer together are more similar to each other. The lines represent how the user interacts with the content over time.

Over time, the algorithm elucidates users’ interests. The red box above shows how these lines start to form naturally as a user consistently engages with a certain category of content more than others. Taken together, these lines can be freakishly accurate proxies for personality and interests. That’s a lot of information on 1 billion people, and it looks like this:

User A:

likes watching videos of puppies playing with babies
likes watching exercise content, specifically abs this week
likes watching European Soccer League content consistently
likes watching liberal political commentary

If you want to learn more about the algorithm, check this video out.

Implementing the Algorithm

Recently, after reading more about the implementation of the algorithm, I thought it would be cool to apply the same algorithm to different domains. The algorithm extends quite well into fashion, and I’m sure someone will eventually popularize its use. The idea goes like this. Users swipe through images of clothing worn by models. Over time, the algorithm picks up on the user’s preferences and can create curated lists of brands and clothing items the user likes. While the TikTok algorithm is designed to maximize user engagement, this algorithm can maximize the likelihood that the user purchases one of the items in their “liked” list of clothing.
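As a rough sketch of the swipe idea, the snippet below ranks brands by how often a user liked them. The swipe log format, the brand names, and the smoothing choice (adding one like and one dislike per brand so a single swipe does not dominate) are my own assumptions for illustration, not part of any existing system.

```python
from collections import defaultdict

# Hypothetical swipe log: (user_id, item_id, brand, liked).
swipes = [
    ("u1", "jacket_01", "BrandA", True),
    ("u1", "jacket_02", "BrandA", True),
    ("u1", "jeans_01", "BrandB", False),
    ("u1", "shirt_01", "BrandC", True),
    ("u1", "shirt_02", "BrandC", False),
    ("u2", "jeans_01", "BrandB", True),
]


def brand_preferences(swipes, user):
    """Return (brand, smoothed like rate) pairs for one user, best first."""
    likes = defaultdict(int)
    total = defaultdict(int)
    for user_id, _item_id, brand, liked in swipes:
        if user_id != user:
            continue
        total[brand] += 1
        likes[brand] += int(liked)
    # Smoothed rate: (likes + 1) / (swipes + 2) keeps small samples modest.
    rates = {brand: (likes[brand] + 1) / (total[brand] + 2) for brand in total}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


ranking = brand_preferences(swipes, "u1")
```

A ranking like this could seed the curated "liked" list, with the collaborative-filtering step filling in items the user has not swiped on yet.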

Making a POC

To make a proof of concept implementation of TikTok's collaborative-filtering algorithm, we’ll need:

A large dataset (Kaggle)
Storage for the dataset (an AWS S3 bucket)
A database to keep track of users and their preferences (PostgreSQL)
A collaborative-filtering algorithm implementation (Python)
A simple backend API to query the algorithm (Python)
A front-end interface that users can interact with (React)
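For the database piece, a first pass at the schema might look like the sketch below. To keep it runnable without a server, it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table and column names (users, items, swipes, image_key) are hypothetical, and image_key is assumed to point at an object in the storage bucket.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical tables for users, clothing items, and swipes."""
    conn.executescript(
        """
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE items (
            id INTEGER PRIMARY KEY,
            brand TEXT NOT NULL,
            image_key TEXT NOT NULL  -- assumed key of the image in storage
        );
        CREATE TABLE swipes (
            user_id INTEGER NOT NULL REFERENCES users(id),
            item_id INTEGER NOT NULL REFERENCES items(id),
            liked INTEGER NOT NULL CHECK (liked IN (0, 1)),
            PRIMARY KEY (user_id, item_id)
        );
        """
    )


def liked_items(conn, user_id):
    """Return (item_id, brand) rows for every item the user liked."""
    cursor = conn.execute(
        "SELECT items.id, items.brand FROM swipes "
        "JOIN items ON items.id = swipes.item_id "
        "WHERE swipes.user_id = ? AND swipes.liked = 1 "
        "ORDER BY items.id",
        (user_id,),
    )
    return cursor.fetchall()


# In-memory database with a few made-up rows.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'demo_user')")
conn.executemany(
    "INSERT INTO items (id, brand, image_key) VALUES (?, ?, ?)",
    [(1, "BrandA", "images/1.jpg"), (2, "BrandB", "images/2.jpg")],
)
conn.executemany(
    "INSERT INTO swipes (user_id, item_id, liked) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 0)],
)
```

The backend API would then read from the swipes table to build the engagement data that the recommender consumes.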

Thanks for reading! If you are interested in working on a POC for this algorithm, let me know! I’m working on it over here, and I’ll keep you posted about the progress - capping it at 20 hours for now.

Best,

Vihar Desu

References

Axios Article

Backlinko Article

Adi

Nov 9, 2021

Great article as always. Spotify uses a very similar algorithm, where once you have the cluster of users created, it will predict the next song you are most likely to listen to based on the previous song you have listened to.

Proof of concept is cool, you should send over the GitHub as you code this :)
