Introduction
Every day, thousands of streamers go live on Twitch and upload terabytes of content to the platform. Can we develop an algorithm that automatically finds clip-worthy moments in streams, clips them, adds subtitles, and uploads them to YouTube and TikTok?
Methodology
Let’s begin with the Twitch API. After registering for a Twitch API key and obtaining access credentials, we are ready to start sending requests! The Twitch API has a plethora of endpoints, but we will focus on streams. To start, let us retrieve the five most popular live English-language streams.
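As a rough illustration of that first request, here is a minimal sketch using only Python's standard library. It assumes the Helix "Get Streams" endpoint, which returns live streams sorted by viewer count; the function names and the client_id and access_token arguments are placeholders of mine, not code from the original project.

```python
import json
import urllib.parse
import urllib.request

HELIX_STREAMS_URL = "https://api.twitch.tv/helix/streams"


def build_streams_request(client_id, access_token, language="en", first=5):
    """Build a GET request for the most-viewed live streams in one language."""
    query = urllib.parse.urlencode(
        {"language": language, "first": first, "type": "live"}
    )
    return urllib.request.Request(
        f"{HELIX_STREAMS_URL}?{query}",
        headers={
            "Client-Id": client_id,
            "Authorization": f"Bearer {access_token}",
        },
    )


def fetch_top_streams(client_id, access_token, language="en", first=5):
    """Send the request and return the list of stream dicts under "data"."""
    request = build_streams_request(client_id, access_token, language, first)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    # Helix returns live streams ordered by viewer count, highest first.
    return payload["data"]
```

Each returned stream dict carries a user_id field, so the ids needed for the next step can be collected with a list comprehension such as [s["user_id"] for s in fetch_top_streams(client_id, access_token)].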
Image: Twitch Stream Data
Now that we have the user_ids of the top streamers, we can retrieve their VODs (recent broadcasts). We will download these VODs and their corresponding chats with the Twitch-dl and Chat-dl Python packages, and automate the process with Python's subprocess module. VODs vary in length, but the average download is ~13 GB (quite big)! I currently run this process entirely locally but will look to leverage AWS EC2 and S3 in the future.
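The download loop might be sketched as follows. It assumes the twitch-dl command-line tool is installed and on the PATH, and uses its download subcommand with the --quality option; the helper names are hypothetical. The chat download would follow the same subprocess pattern, but its exact command is left out because the post does not show it.

```python
import subprocess


def build_vod_download_command(video_id, quality="source"):
    """Return the twitch-dl command line for one VOD as an argument list."""
    return ["twitch-dl", "download", "--quality", quality, str(video_id)]


def download_vods(video_ids, quality="source"):
    """Download each VOD in turn and return the ids whose download failed."""
    failed = []
    for video_id in video_ids:
        command = build_vod_download_command(video_id, quality)
        # check=False so that one failed download does not stop the batch.
        # A missing twitch-dl binary still raises FileNotFoundError.
        result = subprocess.run(command, check=False)
        if result.returncode != 0:
            failed.append(video_id)
    return failed
```

Passing the command as a list rather than a shell string avoids quoting problems if an id or option ever contains unexpected characters.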
Image: Example Twitch VODs
Now that we have Twitch VODs and their corresponding chats downloaded, let’s clip the funny moments! How do we automate this process without ever actually looking at the video? The answer is Twitch Chat! As it turns out, both chat frequency and message content are good indicators of funny/viral stream moments.
Image: Twitch Chat Data
Language Model
How do we know what to look for in Twitch Chat? We procure a sample data set of viral clips from the top 20 streamers, together with their corresponding chats, and analyze the most frequently typed messages. Below you can see the most frequent words in the corpus. Notice they are all emotes!
One aspect of our language model that we want to maintain is the notion that not all messages are created equal. The most common message in the viral clips data set is “KEKW” (a laughing emote), so this message carries more signal in determining when a viral clip is occurring. Therefore, in our scoring algorithm, we weight messages with higher frequency more heavily. Note that this is not normalized against the overall Twitch chat population (“KEKW” may simply be the most typed message in non-viral clips as well), but we argue that a high volume of these messages in rapid succession will still produce a higher score and correctly indicate where the most viral moments are. In essence, we created a simple BoW (bag of words) model.
Clip worthiness score = ∑(message * frequency value)
Note: Scores are calculated in 30-second windows, and spam messages from the same user are filtered out.
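To make the scoring concrete, here is a minimal sketch under a few assumptions of mine: each chat message is treated as a single token, its weight is its relative frequency in the viral-clip corpus, and "spam" means the same user repeating the same message within one window. The function names and the tuple layout of the chat log are hypothetical.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 30


def build_weights(viral_messages):
    """Weight each message by its relative frequency in the viral-clip corpus."""
    counts = Counter(viral_messages)
    total = sum(counts.values())
    return {message: count / total for message, count in counts.items()}


def score_windows(chat, weights, window_seconds=WINDOW_SECONDS):
    """Sum message weights per window, counting each (user, message) once.

    `chat` is an iterable of (timestamp_seconds, user, message) tuples.
    Returns a dict mapping window start (in seconds) to its score.
    """
    seen = defaultdict(set)  # window start -> (user, message) pairs counted
    scores = defaultdict(float)
    for timestamp, user, message in chat:
        window_start = int(timestamp // window_seconds) * window_seconds
        if (user, message) in seen[window_start]:
            continue  # same user repeating the same message: skip as spam
        seen[window_start].add((user, message))
        # Messages absent from the viral corpus contribute nothing.
        scores[window_start] += weights.get(message, 0.0)
    return dict(scores)
```

With this weighting, a burst of many different users typing a high-frequency emote drives a window's score up quickly, while one user repeating it does not.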
After calculating clip worthiness scores for every 30-second window, we take the scores in the top 0.05%, along with their corresponding time windows, and generate clips from them! Here is an example clip:
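The selection and cutting step described above could be sketched like this. The ranking logic is plain Python; the cutting helper only builds an FFmpeg command using the standard -ss, -t, and -c copy options, and all function names and file paths are placeholders rather than the project's actual code.

```python
import math


def top_windows(scores, top_fraction=0.0005):
    """Return (window_start, score) pairs for the highest-scoring windows.

    top_fraction=0.0005 corresponds to the top 0.05% of windows. At least
    one window is returned whenever `scores` is non-empty.
    """
    if not scores:
        return []
    keep = max(1, math.ceil(len(scores) * top_fraction))
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:keep]


def build_clip_command(vod_path, start_seconds, duration_seconds, out_path):
    """Return an FFmpeg command that cuts one window out of the VOD."""
    return [
        "ffmpeg",
        "-ss", str(start_seconds),    # seek to the start of the window
        "-i", vod_path,
        "-t", str(duration_seconds),  # keep this many seconds
        "-c", "copy",                 # copy streams without re-encoding
        out_path,
    ]
```

Stream copying is fast but cuts on keyframes, so clip boundaries may land slightly off the requested timestamps; dropping "-c copy" trades speed for frame-accurate cuts.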
This is a good start, but we still have more work to do to get these videos TikTok ready. Let’s auto-generate subtitles for these clips using Python. By leveraging AssemblyAI’s transcriber object with our custom config settings, we obtain the .srt (subtitle) file associated with the clip. We then add this file to the clip using FFmpeg (a multi-purpose tool for video manipulation). Here’s an example of a subtitled Twitch clip (using a Crunchyroll-style font):
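A minimal sketch of the subtitle step, assuming the assemblyai Python SDK and an FFmpeg build with the subtitles filter. The custom config settings mentioned above are not shown in the post, so this uses the transcriber's defaults; the function names and the font_name default are placeholders, not the font actually used.

```python
def transcribe_to_srt(clip_path, srt_path, api_key):
    """Transcribe a clip with AssemblyAI and write the subtitles as .srt."""
    import assemblyai as aai  # third-party: pip install assemblyai

    aai.settings.api_key = api_key
    transcript = aai.Transcriber().transcribe(clip_path)
    with open(srt_path, "w", encoding="utf-8") as handle:
        handle.write(transcript.export_subtitles_srt())


def build_burn_in_command(clip_path, srt_path, out_path, font_name="Arial"):
    """Return an FFmpeg command that renders the .srt onto the video frames."""
    # force_style overrides the subtitle styling, here only the font name.
    subtitle_filter = f"subtitles={srt_path}:force_style='FontName={font_name}'"
    return [
        "ffmpeg",
        "-i", clip_path,
        "-vf", subtitle_filter,
        "-c:a", "copy",  # audio is untouched, so copy it as-is
        out_path,
    ]
```

The returned list can be passed to subprocess.run. Subtitle paths containing colons, quotes, or backslashes would need escaping inside the filter string, which this sketch does not handle.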
It is common to overlay the video with another video to maximize user retention. These videos can be Minecraft parkour, YouTube compilations of satisfying things, or cars from the video game GTA rolling down a hill. We will go with GTA cars for now. Let’s see an example clip:
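One way to combine the two videos is to stack them vertically with FFmpeg's vstack filter, sketched below. This is an assumption about the layout: the post does not say how the second video is placed, and the helper name, the 1080x960 panel size, and the file paths are placeholders.

```python
def build_stack_command(clip_path, background_path, out_path, width=1080, height=960):
    """Return an FFmpeg command that stacks the clip above a background video."""
    # Both inputs are scaled to the same size because vstack needs equal widths.
    filter_graph = (
        f"[0:v]scale={width}:{height}[top];"
        f"[1:v]scale={width}:{height}[bottom];"
        "[top][bottom]vstack=inputs=2[v]"
    )
    return [
        "ffmpeg",
        "-i", clip_path,
        "-i", background_path,
        "-filter_complex", filter_graph,
        "-map", "[v]",
        "-map", "0:a?",  # keep the clip's audio if it has any
        "-shortest",     # stop when the shorter input ends
        out_path,
    ]
```

Two 1080x960 panels give a 1080x1920 vertical frame. Plain scaling ignores the original aspect ratio, so a real pipeline would likely crop or pad each input first.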
AutoUploading Clips
I used Selenium to automate the entire upload: open the TikTok landing page, log in, go to the upload page, choose the file, edit the caption and visibility, and post, all without a click of the mouse!
Video: Autouploading to TikTok using Selenium
Conclusion
Throughout this post, I showcased an end-to-end pipeline for automatically generating Twitch clips. By leveraging Twitch’s API, a BoW language model, and automatic subtitling, we are able to systematically generate potentially viral Twitch clips that can be posted to TikTok and YouTube! The article’s scope is limited to popular, live, English streams, but with some additional work this can be scaled to include different languages and medium-sized streamers as well. Future work includes researching how to maximize user retention and implementing a more dynamic and stable process for finding the best stream moments. Feel free to reach out and let me know if you have any questions!
Automated Twitch Clip Generator
Creating bite-sized laughs one stream at a time
By: Peter Larcheveque