We use many different methods to determine whether a user is valid. Some of these methods are algorithmic. Others are learned over time by detecting patterns in the data.
Some methods are conclusive about a user's validity. Other methods produce a likelihood of validity. We compile the information into a unified score with a value between 0 and 1000. The lower the score, the less likely it is that the user is valid. For example, someone using a proxy server to route traffic through a data center's IP address will receive a low score. If a specific country produces a lot of invalid traffic, legitimate users in that country will still receive high scores.
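As a rough illustration of how conclusive and likelihood-based methods could feed a single 0 to 1000 score, here is a minimal sketch. The `Signal` type, the `unified_score` function, and the rule of multiplying likelihoods are all assumptions made for this example; the actual scoring formula is not described here.

```python
from dataclasses import dataclass

MAX_SCORE = 1000  # upper bound of the unified score


@dataclass
class Signal:
    """One detection method's verdict (hypothetical structure)."""
    name: str
    validity: float           # estimated likelihood (0.0 to 1.0) that the user is valid
    conclusive: bool = False  # True when this method settles the question on its own


def unified_score(signals: list[Signal]) -> int:
    """Combine signals into one score between 0 and 1000 (assumed rule)."""
    # A conclusive signal decides the score by itself.
    for signal in signals:
        if signal.conclusive:
            return round(signal.validity * MAX_SCORE)
    # Otherwise multiply the likelihoods, so each suspicious signal lowers the score.
    combined = 1.0
    for signal in signals:
        combined *= min(max(signal.validity, 0.0), 1.0)
    return round(combined * MAX_SCORE)
```

In this sketch, a session with no suspicious signals keeps the maximum score, and any single conclusive finding overrides the likelihood-based ones.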
A large share of web traffic is generated by bots, spiders, extensions, headless browsers, toolbars, and other nonhuman sources (collectively referred to as “bots”). Bots have become increasingly sophisticated in the way that they disguise themselves. Therefore, fraud defense systems have to continuously evolve how they find malicious traffic.
Here are some of the indicators that we check when we assess a potential bot.
We check every IP address against our database of known infected machines. This tool detects machines that have been hijacked as spambots, and machines that are infected with viruses and generate lots of automated traffic or clicks. This database is maintained in real time to detect emerging sources of fraud and keep us up to date on the latest trends.
Data center origin
We maintain a list of data center-based IP address ranges because many bot networks use data centers to create or proxy traffic. For example, a session that originates from an Amazon Web Services data center address block is probably not valid, because these server rooms aren't typically accessible to human users.
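A range check of this kind can be sketched with Python's standard `ipaddress` module. The `DATA_CENTER_RANGES` list and the `is_data_center_ip` function are hypothetical, and the ranges shown are reserved documentation addresses, not real data center blocks.

```python
import ipaddress

# Hypothetical sample ranges taken from reserved documentation address space.
# A real list would hold the address blocks published by hosting providers.
DATA_CENTER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_data_center_ip(ip: str) -> bool:
    """Return True if the address falls inside any listed data center range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in DATA_CENTER_RANGES)


print(is_data_center_ip("192.0.2.10"))   # inside the first sample range
print(is_data_center_ip("203.0.113.5"))  # outside both sample ranges
```

A linear scan is enough for a short list. A production-sized list would more likely use a sorted or tree-based lookup.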
Public web proxies
Public web proxies hide a user’s location by presenting an IP address that appears to come from somewhere other than the user’s real location, for example by routing traffic through a data center. We maintain a real-time database of public web proxies to detect proxy-based sessions and score users accordingly.
The Onion Router (Tor) is free-to-use software that allows anonymous online communication. Tor does have legitimate uses. However, because it hides the origin of the user, Tor traffic is treated as suspicious: it can be used to generate random sessions.
Spoofed user agents
Bots often rotate their user agents so they appear as multiple devices to generate realistic-looking traffic. We developed technology to match the user agent to the browser’s capabilities and detect sessions that have altered their user agent.
Bots sometimes create fake referrer headers so that they appear to come from a search engine. In many cases, these headers differ from the structure of real search engine referrers.
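One way to picture a structural check on referrers is sketched below. The search engine host `www.search.example`, the `SEARCH_ENGINE_SCHEMES` table, and the `is_suspicious_search_referrer` function are placeholders invented for this example. The real rules for each search engine are not given here.

```python
from urllib.parse import urlparse

# Hypothetical rule table: exact hostname -> scheme a genuine referrer is assumed to use.
SEARCH_ENGINE_SCHEMES = {
    "www.search.example": "https",
}
# Substring that marks a referrer as claiming to come from the search engine.
SEARCH_ENGINE_KEYWORD = "search.example"


def is_suspicious_search_referrer(referrer: str) -> bool:
    """Flag referrers that mention the search engine but do not match its structure."""
    if SEARCH_ENGINE_KEYWORD not in referrer:
        return False  # does not claim to come from the search engine
    parsed = urlparse(referrer)
    expected_scheme = SEARCH_ENGINE_SCHEMES.get(parsed.hostname or "")
    if expected_scheme is None:
        return True  # mentions the engine, but the hostname is not an exact match
    return parsed.scheme != expected_scheme
```

With these placeholder rules, a lookalike host such as `www.search.example.evil.test` or a referrer with a missing scheme would be flagged, while an unrelated referrer is left alone.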
This method detects when a set of IP addresses coincides with a set of publisher sites, that is, when the same group of addresses appears across the same group of sites.
Other proprietary methods
Hidden users originate from sessions where no page is visible. Whether the session belongs to a bot or to a human user who never looks at the screen, a hidden session receives a score of 0 because no real person viewed page content.
A session is categorized as hidden for any of the following reasons.
Search engines preload pages in the background while a user enters a search query. The search engine tries to predict which links the user will click, and then opens those pages. This can improve web browsing performance. However, many preloaded pages are never visible and should not be counted as real site activity.
This occurs when a browser window is behind another window and a human user can't see the web content.
Background browser tabs
A browser tab can open in the background and load pages. These pages are never visible unless a user opens the tab.
The session is detected as a bot, not a real person. We track whether a session is ever viewed and update its visibility accordingly. For example, if a page is hidden during a preload, it's recorded as hidden and gets a score of zero. If a user selects the link to view the preloaded page, the session is updated with a new score.
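The visibility tracking described above can be sketched as a small state holder. The `Session` class and its fields are hypothetical. The sketch also assumes that the "new score" for a viewed session is the score from the other checks, which is not stated explicitly.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical session record that tracks whether the page was ever visible."""
    base_score: int             # score from the other checks (0 to 1000)
    ever_visible: bool = False  # becomes True the first time the page is shown

    @property
    def score(self) -> int:
        # A session that was never visible scores 0, whatever the other checks say.
        return self.base_score if self.ever_visible else 0

    def on_visibility_change(self, visible: bool) -> None:
        # Visibility is sticky: once a page has been viewed, the session stays "viewed".
        if visible:
            self.ever_visible = True


session = Session(base_score=850)   # page preloaded in the background
hidden_score = session.score        # 0 while the page has never been visible
session.on_visibility_change(True)  # the user selects the link and views the page
viewed_score = session.score        # now reflects the other checks (850)
```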