The calculations and results here may change in the future; the experiments were done in March 2019.
There is no clear answer to this, but many people speculate about it, so I wanted to find some kind of indicator and get more accurate information by myself.
For this I could use either machine learning or simple mathematics. I like solving silly problems such as "viewer fraud detection" and I have been playing with machine learning a lot, but it felt like overkill here, so I took the simpler path and just used math: the normal distribution, not machine learning. :)
Problem Analysis
So I did some analysis on my own and noticed an interesting pattern in viewbots. As of March 2019 (this might vary in the future), the companies that create viewbots do not use the same accounts as viewers and as followers: the followers are taken randomly from a huge pool of fake users. So the viewbots normally don't follow the channel, and the accounts used as "followers" are not in the chat. I noticed this after looking at several channels that were noticeably being viewbotted, and I also created a dummy account, had it viewbotted, and confirmed the pattern.
With this assumption I started to use simple mathematics. I wanted to find out the ratio of followers to non-followers in the chat, but for this I needed to poll Twitch chat across a few hundred channels, so I wrote a small crawler that monitored chats for a week, from random streams in different categories.
Followers/non-followers ratio:
Let's call it "FnF" for short from now on: the percentage of the people shown in the chat list who are followers of the channel. For instance:
A streamer has 100 people in the chat (excluding known bots), and 75 of those 100 users follow the channel. That streamer has an FnF of 75%.
At this point we don't know what an acceptable ratio is, so don't make assumptions; this is only a demonstration of what we want to find out.
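As a minimal sketch of the definition above, the FnF ratio can be computed from two lists of user names. The function name `fnf_ratio` and the sample names are hypothetical; how the chat list and follower list are fetched is left out on purpose.

```python
def fnf_ratio(chatters, followers, known_bots=frozenset()):
    """Percentage of chatters (known bots excluded) who follow the channel."""
    bots = {name.lower() for name in known_bots}
    humans = {name.lower() for name in chatters} - bots
    if not humans:
        return None  # nobody left in chat to measure
    following = humans & {name.lower() for name in followers}
    return 100.0 * len(following) / len(humans)


# Hypothetical chat: 100 users plus one known bot; 75 of the users follow.
chatters = [f"user{i}" for i in range(100)] + ["nightbot"]
followers = [f"user{i}" for i in range(75)]
ratio = fnf_ratio(chatters, followers, known_bots={"nightbot"})
print(f"FnF: {ratio:.1f}%")
```

Names are compared in lowercase here only as a precaution against mixed-case input.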
Example of a chat with an irregular composition of users.
What was my intention? You may know the normal distribution, as it is the most popular tool for detecting deviations in numeric behavior: it is the bell-shaped curve you saw in probability class back in engineering school. In a normal distribution, about 95% of the values fall within two standard deviations of the mean.
Standard normal distribution
Knowing this, we want to find how many channels we need to poll for a fair amount of confidence. The TechCrunch article titled "Twitch now has 27K+ Partners and 150K+ Affiliates making money ..." mentions that Twitch has about 2.2 million unique broadcasters monthly (as of the beginning of 2019; this may change in the future). So let's take that as the population: 2.2 million streamers. For a confidence level of 99% and a margin of error of 5%, we would need a sample size of 665 streamers.
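The sample size can be roughly reproduced with Cochran's formula and a finite population correction. This is only a sketch: it assumes the most conservative proportion (p = 0.5), and the function name `sample_size` is made up for illustration. It lands within a streamer or two of the 665 quoted above; the small difference likely comes from how the z-score is rounded.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.99, margin=0.05, p=0.5):
    """Cochran's sample size with finite population correction."""
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)      # finite population correction
    return math.ceil(n)


required = sample_size(2_200_000)
print(required)
```

With a population this large the correction barely changes the result, which is why the number depends almost entirely on the confidence level and the margin of error.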
I also had to consider a few things:
- There are known bots (bananen, commanderroot, nightbot, etc.) that can inflate the stats of small streamers. In some cases the "known bots" represented up to 25% of the users in chat, so I also had to filter out these bots that were everywhere.
- I cannot poll channels that are too big (more than 5,000 viewers), since I did not want my crawler to stress the servers or get blocked by Twitch or by any API. So I decided to collect information only from streamers with more than 15 and fewer than 5,000 viewers. This narrows the population, so the results should be a bit more precise than the 5% margin of error, but they only apply to channels with 15 to 5,000 viewers.
- This theory could fail if the streamer was recently raided by a bigger streamer, since most of the viewers would then not be followers. To handle this we took multiple "snapshots" of the chat at different moments and picked the one with the best FnF ratio. By a snapshot we mean the state of the chat at a given moment: we took one, then another 30 minutes later, then another 30 minutes after that, and so on, and picked the one with the highest share of chatters who follow.
Sample collection and selection
To get better and cleaner samples I followed these steps:
- Pick about 1,000 streamers with more than 15 and fewer than 5,000 viewers. The number of streamers was chosen to leave room above the required sample size of 665.
- Detect when they went online, and take multiple snapshots starting 1 hour after they went online, with 30 minutes between snapshots.
- Keep the snapshot with the best FnF ratio in the chat, and only for streamers for whom the software managed to get more than 10 snapshots (about 6 hours of streaming). This reduced the channels from 1,000 to 713, which was still more than the 665 needed for 99% confidence.
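The selection steps above can be sketched as follows. The `Snapshot` class, its field names and the sample data are hypothetical; the actual crawler is not shown in this post.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    channel: str
    minutes_online: int   # minutes since the stream went live
    fnf: float            # FnF ratio of the chat at that moment, in percent


def best_snapshot_per_channel(snapshots, min_snapshots=10):
    """Keep channels with more than `min_snapshots` snapshots and
    return the snapshot with the highest FnF ratio for each of them."""
    by_channel = {}
    for snap in snapshots:
        by_channel.setdefault(snap.channel, []).append(snap)
    return {
        channel: max(snaps, key=lambda s: s.fnf)
        for channel, snaps in by_channel.items()
        if len(snaps) > min_snapshots
    }


# Channel "a": 11 regular snapshots plus one taken right after a raid.
snaps = [Snapshot("a", 60 + 30 * i, 70.0 + i) for i in range(11)]
snaps.append(Snapshot("a", 390, 20.0))
# Channel "b": too few snapshots, so it is dropped.
snaps += [Snapshot("b", 60 + 30 * i, 90.0) for i in range(3)]

best = best_snapshot_per_channel(snaps)
print(best)
```

Taking the maximum rather than the average is what keeps a single post-raid snapshot from dragging a channel into the low end of the distribution.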
Data Analysis
Fig 3. List of known bots
During the first days of running, we found users that were present in almost all the channels, so we classified them as "known bots". Among them: commanderroot, bananenanen, p0sitivitybot, etc. There is a chance these are actual viewers that are everywhere.
Known Bots
I did not know how to define a "known bot", so I added a condition for the algorithm: a known bot is a viewer that appeared in more than 300 snapshots. In practice that means a viewer that has been seen watching more than 30 channels.
The picture at the right shows a sample list of the "known bots".
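A minimal sketch of that rule, assuming each snapshot is simply a collection of chatter names (the function name `find_known_bots` and the sample data are hypothetical):

```python
from collections import Counter


def find_known_bots(snapshots, threshold=300):
    """Return the names that appear in more than `threshold` snapshots."""
    seen = Counter()
    for chatters in snapshots:
        # Count each name once per snapshot, even if it is listed twice.
        seen.update({name.lower() for name in chatters})
    return {name for name, count in seen.items() if count > threshold}


# 301 hypothetical snapshots: one account is in all of them,
# while every other viewer shows up only once.
snapshots = [["commanderroot", f"viewer{i}"] for i in range(301)]
bots = find_known_bots(snapshots)
print(bots)
```

The threshold is the only tuning knob, so a viewer who really does watch a lot of channels could end up flagged as well.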
After filtering the bots, we took multiple snapshots, and the application started to group them by percentage to generate the normal distribution bell and to do additional calculations.
After a few hundred channels had been analyzed, you could start to see some outstanding channels with an amazingly high FnF ratio (close to 100%). We also got our first average of the share of chatters that are followers vs non-followers.
The next figure shows the distribution of the channels' chat FnF ratio. Every channel had at least some followers in chat, and in a few of them 100% of the chatters were followers.
The software also showed the following results:
Average: 75.210
Standard Deviation: 16.77
You could also start to see the "bell-shaped" chart taking shape, but this was not even halfway through the data analysis. We still had more than 700 channels pending analysis.
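The grouping and the summary numbers can be sketched like this. The ratios below are made-up values for illustration, not the collected data, and the 5-point bin width is an assumption.

```python
from collections import Counter
from statistics import mean, stdev


def summarize(ratios, bin_width=5):
    """Average, sample standard deviation and a histogram of FnF ratios."""
    avg = mean(ratios)
    sd = stdev(ratios)
    # Put each ratio in a bin such as 70-75; a ratio of exactly 100 goes in the top bin.
    bins = Counter(
        min(int(r // bin_width) * bin_width, 100 - bin_width) for r in ratios
    )
    return avg, sd, dict(sorted(bins.items()))


# Hypothetical best-snapshot ratios for ten channels.
ratios = [100.0, 92.5, 88.0, 81.0, 76.0, 74.5, 70.0, 63.0, 55.0, 33.0]
avg, sd, histogram = summarize(ratios)
print(f"Average: {avg:.2f}  Standard deviation: {sd:.2f}")
for start, count in histogram.items():
    print(f"{start:3d}-{start + bin_width if False else start + 5:3d}%: {'#' * count}")
```

Plotting the histogram counts is what produces the bell-shaped chart described in the text.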
The following screenshot shows one of the streamers that had an excellent score, with a good followers-to-viewers ratio most of the time.
After manually visiting the channel, something was definitely wrong: only two chatters ever talked, so you could tell something was odd. All the snapshots were very similar; the one shown was the best scored.
On one occasion, for a different streamer, the snapshot was taken right after another streamer's raid. She went from 35 viewers to 150, so she fell into the bad stats. That is why I took multiple snapshots at different times, so that a small sample would not be corrupted.
After 90% of the channels had been analyzed, this is how our "bell-shaped" chart looked:
Normal distribution chart of the samples collected. Most of the streamers have an FnF ratio between 70% and 95%.
After about 95% of the channels had been processed, the curve looked like this:
And we know that:
- The maximum FnF ratio is 100%.
- 93% of the channels have an FnF ratio above 50% (people in chat that are followers).
- Only 7% of streamers have an FnF ratio below 50%.
I started to study several individuals that showed strange numbers on multiple occasions. Here is an individual with a good number of followers and approximately 25 bots.
He constantly had a ratio of 31% to 35% (for a Twitch partner these numbers are actually very low). I decided to give him multiple opportunities and took additional snapshots, but the result always came back between 30% and 35%. The probability of someone falling under 35% is 0.08% (less than 1 percent). This offered a good amount of confidence that the streamer is being viewbotted (not a direct accusation, but something is odd with the channel).
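As a rough sketch of where a tail probability like this comes from, one can model the FnF ratio as a normal distribution and read the share of channels below a cutoff from its CDF. The snippet assumes the interim average (75.21) and standard deviation (16.77) reported earlier; the final sample may have had slightly different values, and the real distribution is capped at 100%, so this is only an approximation.

```python
from statistics import NormalDist

# Interim figures reported by the software earlier in this post.
model = NormalDist(mu=75.21, sigma=16.77)


def share_below(cutoff):
    """Estimated fraction of channels with an FnF ratio under `cutoff`."""
    return model.cdf(cutoff)


p_below_50 = share_below(50)
p_below_35 = share_below(35)
print(f"Below 50%: {p_below_50:.2%}")
print(f"Below 35%: {p_below_35:.2%}")
```

Under these assumptions the share below 50% comes out close to the 7% observed, and the share below 35% is a bit under 1 percent.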
Here is another individual where you can definitely tell:
About 20 snapshots were taken (more than 10 hours, on different days), but the streamer never went over 5%.
Conclusions
- Viewbotted streamers have a very irregular ratio of followers in chat. An irregular ratio does not prove that a streamer is viewbotting, but it definitely places them in an irregular spot. Just don't use this to make an accusation; it is better to look at the stream multiple times on different days.
- Streamers that have just gone live will have a very good ratio, since the first people to arrive are followers. The ratio then decreases as other viewers arrive over the following hours.
- Streamers that were raided by other streamers are very likely to have a very low FnF ratio, so multiple observations are required.
- Creating a machine learning algorithm for fast detection would be very simple based on all this sample data, and I may create one in the coming weeks.
- I also observed that teams on Twitch are being exploited as an artificial way of inflating stats. People in the same team watch each other: a person leaves a browser tab open for each of several teammates. It sounds simple, but when a team has about 100 members watching each other, they start to move the needle significantly. I found this pattern in many "streaming teams".
For now, in March 2019, you can tell if a channel is in an irregular spot by looking at the chat and counting how many of the chatters follow the channel and how many do not. If the FnF ratio stays below 30% over multiple streams and multiple hours, something is definitely irregular; if the ratio is above 75%, that streamer is definitely in a very good spot.
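This rule of thumb can be written down as a small helper. It is a sketch, not a detector: the function name, the labels and the minimum of two observations are assumptions, while the 30% and 75% thresholds come from the paragraph above.

```python
def classify_channel(ratios, low=30.0, high=75.0, min_observations=2):
    """Label a channel from several FnF observations (in percent)."""
    if len(ratios) < min_observations:
        return "not enough data"   # a single snapshot may follow a raid
    if all(r < low for r in ratios):
        return "irregular"         # consistently below the low threshold
    if all(r > high for r in ratios):
        return "good spot"         # consistently above the high threshold
    return "inconclusive"


# Hypothetical observations taken over several streams.
print(classify_channel([4.0, 5.0, 3.5]))     # irregular
print(classify_channel([80.0, 92.0, 78.0]))  # good spot
print(classify_channel([20.0, 85.0]))        # inconclusive
print(classify_channel([10.0]))              # not enough data
```

Requiring every observation to be on the same side of a threshold keeps one odd snapshot from deciding the label on its own.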