If you look at your Google Analytics reports regularly, you will probably have noticed a steady rise in referral traffic in recent months, usually to your home page, with occasional peaks of uncommonly high activity. At first glance this looks very positive – maybe the marketing and SEO are finally paying off! But further investigation reveals that the referrals come from websites that do not actually promote or mention your website at all, and that the additional visitors are certainly not real customers. So what’s happening here?
Rise of the bots
The problem is robots, also referred to as web spiders, crawlers, scrapers and search bots. Some robots are good – Googlebot is Google’s search crawler, which scans your website for new content and helps you rank in Google search. Unfortunately, many other robots out there are far less benevolent than Google’s.
Bots have been around on the web for about two decades, so they are nothing new. However, “bad bots” have grown exponentially in the last few years. Earlier this year, Time reported that bots now outnumber people on the internet. According to a 2014 study by Distil Networks, robots accounted for almost 60% of all web traffic, and almost a quarter of all bots are run for criminal purposes, including fraud and hacking.
What are bots?
OK, let’s take a step back and look at what bots are and why they are needed. Bots are programs, running on machines (computer servers), that are designed to find other machines. In the case of Googlebot, the program follows web links to discover new web pages and documents on the internet.
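To make the link-following idea concrete, here is a minimal sketch of how a crawler works. It is purely illustrative: it assumes the third-party requests and beautifulsoup4 packages, and the start URL is just a placeholder, not how Googlebot is actually implemented.

```python
# Minimal link-following crawler sketch (illustrative only).
# Assumes the "requests" and "beautifulsoup4" packages are installed.
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def crawl(start_url, max_pages=10):
    """Follow links breadth-first from start_url, visiting up to max_pages pages."""
    seen = set()
    queue = deque([start_url])
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            response = requests.get(url, timeout=5)
        except requests.RequestException:
            continue  # skip pages that fail to load
        soup = BeautifulSoup(response.text, "html.parser")
        for link in soup.find_all("a", href=True):
            queue.append(urljoin(url, link["href"]))  # resolve relative links
    return seen


# Example with a hypothetical starting point:
# pages = crawl("https://example.com")
```

A well-behaved crawler would also respect robots.txt and throttle its requests; bad bots typically do neither.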
However, other bots are designed to seek out vulnerabilities in websites that can later be exploited by hackers, to automatically fill in comment forms, or to create fake forum profiles. Others gather data on a website to produce SEO reports.
For example, popular web analysis tools such as Majestic and SEMrush run their own bots, which examine a website and all of its inbound and outbound links to estimate how much organic search traffic it receives.
Hackers and spammers
The most nefarious bots are those used by hackers and spammers. The majority of websites today are built on content management systems such as WordPress, Drupal and Joomla. Because these sites use a combination of PHP code and MySQL databases to control content dynamically, vulnerabilities can creep into the code. Most CMSs are robust in themselves, but plugins, themes (web templates) and insecure servers can all create security issues.
The moment a new vulnerability is identified, hackers share the information on forums and in private groups, and update their bots to “sniff out” websites that have not been patched. When found, the vulnerable websites can be hacked. There are too many reasons why websites get hacked to discuss in full here, but one of the most common today is to insert spammy content that sends visitors to other websites. Some hacked sites are loaded with Trojans and viruses that infect visitors’ PCs and steal private information. Google has also reported a growth this year in hacking for the purpose of inserting links to improve SEO.
Some bots use a far simpler method to hack a website, though: brute force attacks are common. This is when a program repeatedly attempts to log in to a website using common usernames and passwords. It is a sad fact that “Admin” is still the most common WordPress username – it used to be the default username when WordPress was installed, too. To make matters worse, that account is also the site’s main administrator, so cracking its password gives a hacker full control of the website: they can create new user accounts, delete old ones, change passwords and edit any page.
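The defence against this kind of bot is conceptually simple: count failed login attempts and lock an address out after a handful of failures in a short window. The sketch below shows the idea in a framework-agnostic way; the check_credentials function, thresholds and lockout period are assumptions for illustration, not WordPress’s actual mechanism (in practice a login-limiting plugin or your host’s firewall does this for you).

```python
# Minimal sketch of login throttling to blunt brute-force bots (illustrative only).
# check_credentials() is a hypothetical stand-in for your real authentication check.
import time
from collections import defaultdict

MAX_ATTEMPTS = 5            # failures allowed before lockout
LOCKOUT_SECONDS = 15 * 60   # how long an address stays locked out

failed_attempts = defaultdict(list)  # ip -> list of failure timestamps


def check_credentials(username, password):
    # Placeholder: replace with a real user lookup and password-hash check.
    return False


def attempt_login(ip, username, password):
    now = time.time()
    # Keep only recent failures for this address.
    recent = [t for t in failed_attempts[ip] if now - t < LOCKOUT_SECONDS]
    failed_attempts[ip] = recent
    if len(recent) >= MAX_ATTEMPTS:
        return "locked_out"
    if check_credentials(username, password):
        failed_attempts[ip] = []  # reset the counter on success
        return "ok"
    failed_attempts[ip].append(now)
    return "failed"
```

Even this simple throttle makes guessing thousands of passwords impractically slow for a bot, which is why renaming the “Admin” account and limiting login attempts are the first two fixes usually recommended.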
Amazon Cloud bots
Stopping bots is not easy. They are often run from otherwise legitimate services; Amazon’s cloud servers, for example, are commonly used to run advanced bots. Bots consume a lot of bandwidth, and most web hosts will ban accounts that run them, but some organisations turn a blind eye. Read Jack Marshall’s report on CNN to learn more about this problem.
The business problem
The problem for us is that these bad bots mess up our Analytics reports. If the screenshot here looks familiar, you will already know how difficult it is to monitor the health of a website accurately when the data is polluted by so many bad referrals from ‘sites’ such as those listed here (with the exception of Facebook, of course!).
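In the meantime, if you export your report data, a short script can at least strip out the best-known offenders before you analyse it. This is only an illustrative sketch, not the fix described below: the blocked domains and the CSV column name are examples, and any real blocklist needs to be maintained as new spam referrers appear.

```python
# Minimal sketch: strip known referral-spam rows from an exported Analytics CSV.
# The blocklist and column name are illustrative examples, not a complete list.
import csv

SPAM_REFERRERS = {
    "semalt.com",               # example spam domain
    "buttons-for-website.com",  # example spam domain
}


def clean_report(in_path, out_path, source_column="Source"):
    with open(in_path, newline="") as infile, open(out_path, "w", newline="") as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            referrer = row.get(source_column, "").lower()
            # Keep the row unless its source matches a known spam domain.
            if not any(domain in referrer for domain in SPAM_REFERRERS):
                writer.writerow(row)


# Example usage with hypothetical file names:
# clean_report("analytics_export.csv", "analytics_clean.csv")
```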
We are currently working on a solution to Analytics referral spam that we hope to implement on our client websites soon. If you are experiencing increasing bad bot activity, subscribe to our blog to learn how to fix this problem.