Protecting websites from bots is critical today. Interactive websites with features like user registration and online forms are especially frequent targets of automated software. Detecting and reducing bot traffic is essential to keep a website operational and accessible to real users.
What is a bot?
A bot is a piece of automated software built to complete a series of tasks over and over without human involvement. Bots can be used to sign up for thousands of user accounts, collect information from a website, post spam, or simply increase the website’s traffic to the point where it can no longer serve all requests. While a single bot performing many tasks in a short time frame can be easy to detect, it’s common for bots to spread traffic across many devices and networks to reduce the risk of detection.
Since bots are getting more and more advanced, it’s important to stay one step ahead.
Rate Limiting
Rate limiting is the most basic mechanism for protecting a website from bots. Based on a unique identifier for each user, the website limits the traffic allowed from that user. For example, a single user rarely makes more than a few requests per second even at burst and stays well below that threshold in most situations. It’s common to keep rate limits as strict as possible without degrading the user experience.
The difficult part of rate limiting is choosing or generating the identifier for the user. Using the user’s IP address is a good start and may work for some applications, but it is not the best approach in the long run. Multiple users often share the same IP address; mobile users in particular can share the same public IPv4 address with hundreds or even thousands of other users. Bots can also easily hide their real IP address by using proxies. To address this limitation, rate limiting is commonly combined with fingerprinting.
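As a sketch, a sliding-window rate limiter keyed by such an identifier can be implemented in a few lines. The class name and limit values below are illustrative and not tied to any particular framework:

```python
import time
from collections import defaultdict, deque
from typing import Optional

class RateLimiter:
    """Sliding-window rate limiter keyed by a per-user identifier
    (an IP address, a fingerprint, or a combination of both)."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # identifier -> request timestamps

    def allow(self, identifier: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[identifier]
        # Forget requests that have fallen out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # over the limit: reject or throttle
        q.append(now)
        return True
```

A web server would call `allow()` once per incoming request and respond with HTTP 429 whenever it returns `False`.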
Fingerprinting
Fingerprinting generates a unique identifier for a user or device that usually does not depend on the IP address. Websites can use these identifiers to track user behavior, detect suspicious users, or rate-limit them.
Generating a good identifier that recognizes the same user across IP addresses and browser sessions isn’t a trivial task.
Information commonly used to fingerprint a web user includes the browser type, version, and settings, the operating system, browser extensions, time zone, language, screen resolution, hardware, and similar features. Each of these pieces of data isn’t very useful on its own, but by combining them a website can generate a fingerprint with a relatively low chance of two users sharing the same fingerprint (called a collision). It’s rare for users to have the exact same combination of these parameters. Nevertheless, more advanced attackers can manipulate their fingerprints to impersonate any identity, so fingerprinting cannot be the sole defense against bots.
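The combining step can be sketched as hashing a canonical encoding of the collected attributes. The attribute names below are illustrative; real fingerprinting libraries use many more signals:

```python
import hashlib

def fingerprint(attributes: dict) -> str:
    """Derive a stable fingerprint hash from collected client attributes."""
    # Sort the keys so the same attributes always yield the same hash,
    # regardless of the order in which they were collected.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

fp = fingerprint({
    "user_agent": "Mozilla/5.0 ...",
    "timezone": "Europe/Berlin",
    "language": "de-DE",
    "screen": "1920x1080",
})
```

Adding more attributes lowers the collision chance, but a determined attacker can still spoof every input, which is why fingerprinting is only one layer of defense.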
Honey Pots
Many bots don’t target a specific website but instead automatically follow links to find sites where they can inject content and fill out forms. Honey pots are an option to protect your website from such attacks.
The easiest way to create a honey pot is to add an extra field to your form that is hidden from normal users with CSS. Some bots won’t notice that the field isn’t visible and will fill it out anyway. The website can then check whether the field has been filled and filter out suspicious entries. While this method can provide protection for a while, even naive bots eventually learn to recognize honeypot fields, and once detected they can bypass them without any further hurdle. Also, users of screen-reader software (used by blind or partially sighted users) may accidentally fill in this field too, making it impossible for them to use your website.
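The server-side check is trivial. The sketch below assumes a hypothetical form field named `website` that is hidden from humans via CSS (to reduce the screen-reader problem, the field should also carry `aria-hidden="true"` and a label telling users to leave it empty):

```python
# Hypothetical name of the hidden honeypot field; any plausible-looking
# name works, since human visitors never see it.
HONEYPOT_FIELD = "website"

def is_bot_submission(form_data: dict) -> bool:
    """Flag a submission as a bot if the hidden field was filled in."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())
```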
Image Recognition Tasks
Image challenges are a good way to distinguish between real users and bots. They require manual interaction from the user and aim to be hard for bots to solve. Such a puzzle could be retyping a set of distorted characters or selecting images that match a description.
While these types of tasks are usually hard to solve for bots, there are many services that offer automated solving for very little money. These services are usually powered by artificial intelligence or cheap labor. In addition, image and audio recognition tasks are not accessible to all users and hurt the user experience which can cause increased abandonment. This presents website operators with a particular challenge if they want to ensure that their website is accessible.
Managed Bot Protection Solutions
Integrating all these anti-bot measures can be a lot of work for an individual website. That’s why there are managed web security solutions that aim to protect websites against bots while being relatively easy to implement. One of these solutions is Friendly Captcha.
Friendly Captcha is the only sophisticated proof-of-work-based solution on the market. It uses a combination of advanced cryptography and fingerprinting with full privacy protection to defend websites and forms against attacks. Each user gets a unique cryptographic puzzle which the user’s browser can solve in the background. By combining an anonymized version of the user’s fingerprint with additional checking parameters, Friendly Captcha can detect when a user makes suspicious requests and scale the difficulty of its cryptographic puzzles accordingly. Compared to other managed bot protection solutions, Friendly Captcha is accessible to everyone, doesn’t require manual interaction from the user, is fully privacy compliant, and slows down suspicious traffic.
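To illustrate the general idea of a proof-of-work puzzle (a generic hashcash-style sketch, not Friendly Captcha’s actual puzzle format): the browser searches for a nonce whose hash meets a difficulty target, while the server verifies the result with a single hash. Raising the difficulty for suspicious users makes each of their requests proportionally more expensive:

```python
import hashlib
from itertools import count

def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Brute-force a nonce whose SHA-256 digest starts with
    `difficulty_bits` zero bits. This work runs in the client's browser."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server-side check: a single hash, cheap regardless of difficulty."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```

Each additional difficulty bit doubles the solver’s expected work, while verification cost stays constant, which is what allows the difficulty to be scaled up for suspicious traffic.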