TikTok
Since its initial release in 2016, TikTok has grown to over 900 million daily active users as of 2021. To put that into perspective, there are ~4.8 billion internet users around the globe today, and roughly 1 in 5 of them use TikTok every single day. The average time spent on TikTok is 52 minutes per day worldwide, and 47.4% of those active users are aged between 10 and 29. It has the highest engagement rate of any social media app, beating out Instagram and Twitter.
Supply and Demand
On average, ~1% of TikTok users create content, ~5% directly engage with content (commenting on other content, liking content, etc.) and ~95% of users are passive consumers. The average length of the top 100 TikToks is only 15.6 seconds. At 52 minutes (3,120 seconds) of viewing per day, that works out to 3,120 / 15.6 = ~200 TikToks that a user watches per day, assuming typical videos run about as long as the top 100.
Inside TikTok's Algorithm
Under regulatory pressure in the US and export restrictions on its software (in China), TikTok recently released some fascinating details about what goes on "under the hood" of its algorithm.
What does the algorithm do?
There are 3 goals that TikTok's algorithm is optimized for:
It calculates how similar one TikTok is to another
It predicts a user's affinity or aversion to a future TikTok
It recognizes similarities and patterns across users
While each of these 3 tasks can be broken into several specialized algorithms, the main algorithm that TikTok employs is a collaborative-filtering recommender. The idea is simple. There is a huge database of TikToks, and a huge database of users. Every time a user interacts with a TikTok, the company quantifies how engaged that user was with the content. Based on how the user engaged with that particular content, it places the user in a cluster or group. Since users in the same group tend to like similar pieces of content, it can make recommendations once it can label a user's interests. However, users are complex and the algorithm knows that. When there are millions of users and millions of pieces of content, TikTok can figure out highly "niche" user interests and explore those interests along a continuum (over days and weeks) to validate whether they are “true” interests.
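To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is not TikTok's actual implementation. The user names, video IDs, and engagement scores are made up, and cosine similarity stands in for whatever similarity measure and clustering a production system would use.

```python
from math import sqrt

# Hypothetical engagement scores (0 to 1) per user per video.
# All names and numbers are invented for illustration.
engagement = {
    "ana":  {"puppies_1": 0.9, "soccer_1": 0.8, "abs_1": 0.1},
    "ben":  {"puppies_1": 0.8, "soccer_1": 0.9, "soccer_2": 0.95},
    "cara": {"abs_1": 0.9, "abs_2": 0.85, "puppies_1": 0.2},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {video: score} dicts."""
    shared = set(a) & set(b)
    dot = sum(a[v] * b[v] for v in shared)
    norm_a = sqrt(sum(s * s for s in a.values()))
    norm_b = sqrt(sum(s * s for s in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, scores, top_n=3):
    """Rank videos the user has not seen by similarity-weighted engagement."""
    seen = scores[user]
    totals = {}
    for other, other_scores in scores.items():
        if other == user:
            continue
        sim = cosine_similarity(seen, other_scores)
        if sim <= 0:
            continue  # users with nothing in common contribute nothing
        for video, score in other_scores.items():
            if video not in seen:
                totals[video] = totals.get(video, 0.0) + sim * score
    return sorted(totals, key=totals.get, reverse=True)[:top_n]


print(recommend("ana", engagement))
```

In this toy data, ana's history looks much more like ben's than cara's, so ben's unseen soccer video outranks cara's unseen abs video. A real system would work with millions of users and would likely use matrix factorization or learned embeddings rather than pairwise comparisons.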
Over time, the algorithm elucidates users’ interests. The red box above shows how user behavior may start to naturally lead to these types of lines starting to form, representing that a user is consistently engaging with a certain category of content more than others. The sum of these lines can be pretty freakishly accurate proxies for personality and interests. That’s a lot of information on 1 billion people that looks like this:
User A:
likes watching videos of puppies playing with babies
likes watching exercise content, specifically abs this week
likes watching European Soccer League content consistently
likes watching liberal political commentary
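A profile like User A's can be sketched as a simple tally over a viewing log. The snippet below is an illustration under assumptions, not how TikTok does it: the log format, the categories, and the `engaged_at` and `min_days` thresholds are all hypothetical. It treats a category as a validated interest only if the user engaged with it on several distinct days, which mirrors the idea of checking interests over days and weeks.

```python
from collections import defaultdict

# Hypothetical viewing log: (day, category, fraction of the video watched).
events = [
    (1, "soccer", 0.9), (1, "puppies", 0.8), (1, "cooking", 0.2),
    (2, "soccer", 1.0), (2, "abs", 0.7),
    (3, "soccer", 0.95), (3, "puppies", 0.9), (3, "abs", 0.1),
]


def interest_profile(events, engaged_at=0.6, min_days=2):
    """Return categories the user engaged with on at least `min_days` days."""
    days_engaged = defaultdict(set)
    for day, category, watched in events:
        if watched >= engaged_at:  # count only views that held attention
            days_engaged[category].add(day)
    return sorted(c for c, days in days_engaged.items() if len(days) >= min_days)


print(interest_profile(events))
```

With this log, soccer and puppies come out as validated interests, while abs (engaged on only one day) and cooking (never engaged) do not. Raising `min_days` to 3 leaves only soccer.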
If you want to learn more about the algorithm, check this video out.
Implementing the Algorithm
Recently, after reading more about the implementation of the algorithm, I thought it would be cool to apply the same algorithm to different domains. The algorithm extends quite well into fashion, and I’m sure someone will eventually popularize its use. The idea goes like this. Users swipe through images of clothing worn by models. Over time, the algorithm picks up on the user’s preferences and can create curated lists of brands and clothing items the user likes. While the TikTok algorithm is designed to maximize user engagement, this algorithm can maximize the likelihood that the user purchases one of the items in their “liked” list of clothing.
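As a rough sketch of the swiping idea, the snippet below turns a swipe log into a ranked list of brands plus the user's "liked" list. Everything here is an assumption for illustration: the item IDs, the brand names, the log format, and the `min_swipes` cutoff. It only counts likes per brand, so it is a starting point rather than the collaborative-filtering recommender itself.

```python
from collections import Counter

# Hypothetical swipe log: (item_id, brand, liked). Brands are invented.
swipes = [
    ("jacket_01", "Acme", True),
    ("jeans_07", "Acme", True),
    ("tee_12", "Borealis", False),
    ("coat_03", "Borealis", True),
    ("tee_15", "Borealis", False),
    ("boots_02", "Cobalt", False),
]


def curate(swipes, min_swipes=2):
    """Return (brands ranked by like rate, list of liked item IDs)."""
    shown, liked = Counter(), Counter()
    liked_items = []
    for item, brand, did_like in swipes:
        shown[brand] += 1
        if did_like:
            liked[brand] += 1
            liked_items.append(item)
    # Ignore brands with too few swipes to say anything about them yet.
    like_rate = {b: liked[b] / shown[b] for b in shown if shown[b] >= min_swipes}
    ranked_brands = sorted(like_rate, key=like_rate.get, reverse=True)
    return ranked_brands, liked_items


brands, liked_list = curate(swipes)
print(brands, liked_list)
```

The `min_swipes` cutoff keeps a brand the user has seen only once from topping or bottoming the ranking on a single swipe.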
Making a POC
To make a proof of concept implementation of TikTok's collaborative-filtering algorithm, we’ll need:
A large dataset (Kaggle)
Storage for the dataset (Amazon S3 bucket)
A database to keep track of users and their preferences (PostgreSQL)
A collaborative-filtering algorithm implementation (Python)
A simple backend API to query the algorithm (Python)
A front-end interface that users can interact with (React)
Thanks for reading! If you are interested in working on a POC for this algorithm, let me know! I’m working on it over here, and I’ll keep you posted about the progress - capping it at 20 hours for now.
Best,
Vihar Desu
Great article as always. Spotify uses a very similar algorithm, where once you have the cluster of users created, it will predict the next song you are most likely to listen to based on the previous song you have listened to.
Proof of concept is cool, you should send over the GitHub as you code this :)