The challenges of using search engines for adverse media screening (AMS)
“He killed it out on the basketball court!”
Conscientious consumers, investors and employees, plus globalised supply chains and enhanced regulations, have made it vital to know who you are in business with. Equipping yourself with the right due diligence tools has never been more important.
Using search engines to screen adverse media is time-consuming, inefficient and inconsistent. Relying on these methods is a drain on analyst time and leaves your organisation vulnerable to missing risks.
The Wolfsberg Group’s recent FAQs on negative news screening emphasised that, though it’s a crucial part of anti-money laundering (AML) compliance, it’s often seen as a drain, requiring “excessive manual resources” and providing little value.
The traditional method of using a search engine such as Google for ‘string-searches’ (searching your person’s name with multiple risk words) often generates millions of results. These pages of Google results are a black hole into which the time and resources of compliance teams disappear. Analysts have to go through every result and figure out which are both about the right person and genuinely risky.
Technology has helped, but by flagging risk keywords, traditional solutions only tackle part of the problem, and again risk missing something crucial.
The issue at stake is clear: not only are financial institutions wasting analyst time and resources on manual AMS, there’s also the risk that an analyst can miss something crucial about a client. A mention of laundering could be buried on page 78 of the Google results. Missing this could mean financial penalties, reputational damage, and falling out of favour with the regulator.
However, throwing more analyst time and effort at the problem is not the solution.
New breakthroughs in Artificial Intelligence and Natural Language Processing mean that AMS can be done in a way that adds value without burdening manual resources. With Xapien’s developments in Natural Language Processing, we can go beyond keyword searches to actually read all of the open-source data about your person. Rather than having to trawl through dozens of articles, an analyst gets an easily digestible and comprehensive report, and can make an informed decision backed by a full audit trail.
In this guide, we’ll take a look at the challenges of using search engines for AMS. We’ll show how Xapien’s approach automates this process using human-like techniques, powered by AI and Natural Language Processing. And we’ll show how you can benefit from using AI for faster and more informed client onboarding.
- The challenges of using search engines for AMS
- The tech solutions
- How an automated system can benefit you
What are the problems with using search engines for AMS?
If this is how you typically do AMS, you will be aware of the following challenges:
- Being overwhelmed with information
- The risk of missed risks
- Inconsistent results and processes
It is impossible to control what Google shows you in the results of a search. You can’t tell Google that the only ‘Chris Smith’ you’re interested in lives in Denver and works as a lawyer. This is partly what causes these challenges, but let’s take a look at these problems in some more detail, and how to solve them.
Problem 1: Information overload
Running “Chris Smith” with just 10 risk words generated 2 million results, in less than a second. No analyst can go through all these results.
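To make the idea of a string search concrete, here is a minimal sketch of how an analyst might assemble one. The function name and the list of risk words are our own illustrative assumptions, not a recommended screening list, and the quoted-phrase and OR syntax is the kind most search engines accept.

```python
def build_string_search(name: str, risk_words: list[str]) -> str:
    """Combine an exact-match name with OR-joined risk words."""
    # Quote each term so multi-word phrases are searched as a unit.
    quoted_terms = " OR ".join(f'"{word}"' for word in risk_words)
    return f'"{name}" ({quoted_terms})'


# Hypothetical list of 10 risk words, for illustration only.
risk_words = [
    "fraud", "bribery", "corruption", "laundering", "court",
    "kill", "charged", "prosecuted", "crime", "tax evasion",
]

query = build_string_search("Chris Smith", risk_words)
print(query)
```

Nothing in a query like this tells the search engine which Chris Smith you mean, or whether a risk word is being used in a risky sense. That is where the overload comes from.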
False positives (articles about someone else with the same name) cloud most of these results, which buries the relevant articles.
Verifying whether the “Chris Smith” mentioned is your person within these results can take hours, even days, of work. Analysts need to verify every piece of information using dates of birth, locations, middle names, images, and other bits of context. As you are researching your ‘Chris Smith’, however, you will find new information about your person, such as an unexpected previous job role in another country. You might then have to go back, as an article you had previously discounted turns out to be about your “Chris Smith”.
There are also false risks, like the sentence we’ve quoted in our title. If you run a string search with the risk words “court” and “kill”, then an article with the sentence “He killed it out on the basketball court!” could be one of the top results. But it isn’t risky. There are some particularly frustrating cases of this. Perhaps the person you are searching for shares a name with a famous tennis player (see: “court” flagged 10,000 times). Or “Chris Smith” is a lawyer, and has worked on cases prosecuting criminals, tax evaders and money launderers (see: every risk flag imaginable). This is the costly problem of false positive risks.
Problem 2: The risk of missing risks
As there is too much information, analysts will typically have a cut-off point – perhaps page 20 of Google results. The hope is that the most relevant and important results will be prioritised. However, as users we have no control over what the search engine prioritises. If Google has downgraded the key articles about bribery to page 21, this leaves the organisation at risk of future reputational damage and financial penalties.
Our Chris Smith is a lawyer, so there are likely to be a lot of results mentioning ‘charged’, ‘prosecuted’ and ‘crime’. These won’t be risks that apply directly to your subject. There could, however, be one or two articles that accuse your subject of ‘corruption’ and accepting ‘bribes’. Google is likely to push these articles down to the later pages and show only the risk words that appear more often and yet aren’t relevant to your investigation.
You also risk missing risks in other languages, unless you:
- Detect which languages are most relevant for your person.
- Run those searches in the appropriate languages, using the most appropriate risk words and sometimes the stem of a word (e.g. using “tax eva” to pick up both “tax evasion” and “tax evader”).
- Translate all the results into a language familiar to you – or enlist a colleague familiar with the language to help – to assess the information.
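The stem trick mentioned above can be sketched in a few lines. This is a minimal illustration using Python's standard `re` module. The function name and sample text are our own assumptions, and a real multilingual search would need language-specific stems.

```python
import re


def find_stem_matches(text: str, stems: list[str]) -> list[str]:
    """Return every word or phrase in text that begins with one of the stems."""
    hits = []
    for stem in stems:
        # \b anchors the stem at a word boundary; \w* consumes the rest of the word.
        pattern = re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)
        hits.extend(match.group(0) for match in pattern.finditer(text))
    return hits


text = "He was charged with tax evasion. A known tax evader, he denied it."
matches = find_stem_matches(text, ["tax eva"])
print(matches)
```

A single stem picks up both forms, but it still only tells you the words appear, not who they refer to.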
Problem 3: The risks of inconsistency
It is often said that no two analysts go through the same process. Even if there are guidelines in place (like stopping at page 20), analysts will all read, search and analyse information differently. This means that the KYC process won’t be consistent and standardised.
Inconsistencies, as well as human error, can often lead to risks going under the radar.
Even worse, search engines show different results to different users!
This is called the ‘filter bubble’: when search engines use algorithms to assume what information a user would want to see, and then tailor the user’s results accordingly. The term was coined by Eli Pariser. He mentions a case where one user who searched for ‘BP’ got investment news in the first few pages of results, while a second user got news about the Deepwater Horizon oil spill.
Incognito mode is sometimes used as a workaround to this, but this has been found to be an unreliable way to get around the ‘filter bubble’. Without consistency around search engine results and processes, your organisation is vulnerable to missing important risks.
What about the technology that’s out there?
Luckily, there are solutions out there that help with Adverse Media Screening. These solutions do save some time, but traditionally they leave much to be desired.
For example, some AMS systems scan for risk keywords within, say, 20 words of a reference to your person’s name. Some of the problems that an analyst using these solutions runs into are similar to the challenges of using Google, as this method doesn’t take into account:
- The directionality of risk (your person might not be the guilty party)
- Negations (e.g. if your person worked in “counter-fraud”, or “fighting financial crime”)
- Context (“he killed it on the basketball court”)
So this tech causes many of the same problems as using a search engine: analyst time is wasted going through the articles and verifying which ones are actually ‘risky’. This delays client onboarding and risks either slowing down the normal course of business, or skipping full verification, which can lead to major compliance oversights.
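To see why proximity scanning struggles with the three cases listed above, here is a rough sketch of such a scanner. It is a simplified illustration, not any vendor's actual implementation. The risk stems, the function name and the example sentences are all our own assumptions.

```python
import re

# Hypothetical risk stems, for illustration only.
RISK_STEMS = ("kill", "court", "fraud", "corrupt")


def proximity_flags(text: str, name: str, window: int = 20) -> list[str]:
    """Flag risk stems that appear within `window` words of the name."""
    tokens = re.findall(r"\w+", text.lower())
    name_tokens = name.lower().split()
    n = len(name_tokens)
    flags = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != name_tokens:
            continue
        # Look at the words on either side of this mention of the name.
        nearby = tokens[max(0, i - window): i + n + window]
        for token in nearby:
            for stem in RISK_STEMS:
                if token.startswith(stem) and stem not in flags:
                    flags.append(stem)
    return flags


context = proximity_flags("Chris Smith killed it out on the basketball court!", "Chris Smith")
negation = proximity_flags("Chris Smith led the bank's counter-fraud team.", "Chris Smith")
direction = proximity_flags("Chris Smith accused Julie Davies of corruption.", "Chris Smith")
print(context, negation, direction)
```

All three sentences are flagged against Chris Smith, even though none of them describes a risk attached to him. An analyst still has to read each article to find that out.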
Moreover, these systems typically rely on databases of sources, but there are things online – in blogs, forums, places on the web we wouldn’t always think to look – where some rumour might be what you need to know. Using databases of particular media outlets risks missing what’s out there on the wider web.
How does Xapien fix these problems?
Solution 1: Breadth of information = no missed links
Xapien’s approach is different. Rather than replacing the painstaking manual searching process with database searches or keyword scanners, we have trained Xapien’s automated research platform to replicate the manual process. Just at an incomparable scale and speed.
Our system runs searches on thousands of risk words, on various search engines, and then scours the depth and breadth of the internet. The system doesn’t just scan the texts it finds for risk words, it actually ‘reads’ them like a human, extracting any potential risk. It finds and translates information from over 130 languages, before analysing them to find risk.
Xapien reports combine open-source media screening with corporate records and PEPs, sanctions and watchlist screening. This report gives you the full picture of your person, in just a few minutes.
Solution 2: Accuracy = only the information you want to see
All you need to run a Xapien search on your person is their name and the name of an organisation or person they have been associated with. The system then uses this as context to find out more about the person. Behind the scenes, as the system is searching and scouring everything it finds on the web, it creates a large knowledge graph where all the ‘nodes’ of information are like stars in a large constellation. That means that all the information in the report has been cross-referenced and verified against all the other information the system has found about your person. This means all the information is about your person, not someone else with the same name.
Our system also uses Natural Language Processing to go beyond keyword searching and take into account grammar, context and negations. We use Machine Learning, so our system is trained to understand the difference between the meanings of “court” in the sentences “she was summoned to court on Saturday” and “she played well on the tennis court”. It has also been trained to understand different colloquial and slang meanings of words, so that “kill” in the phrase “he killed it out on the basketball court” does not mean “kill” in a traditional, murdery sense.
The system has also been trained to detect the grammatical subject of a risk word. So, in the sentence “Chris Smith accused Julie Davies of corruption”, we can tell that Julie is the risky person in this context, not Chris Smith, who is levelling the accusation.
This means our system can tell whether or not a risk is actually tied to your person, and if it’s actually risky. So, you don’t need to go through hundreds of irrelevant false risks.
A detailed, sourced report with just the relevant information, generated in minutes, saves compliance teams hours, even days, of work. Organisations can onboard more clients, faster.
Solution 3: Standardisation = consistent, trustworthy results
As an automated system, Xapien bypasses search engine algorithms and cookies, which means no filter bubble. It also means that searches can’t be traced back to your organisation. The result is a standardised, efficient system that replicates the manual process, but at a previously unreachable speed and scale.
This standardised process also means:
- Full GDPR compliance
- Easily digestible reports, which can be shared as a URL or PDF
- Full audit trail to back up any decision
The outcome is a process you can trust. Organisations can make confident decisions backed by a full audit trail.
Search engines are great but they are only the starting point. Finding, reading and condensing the full picture is slow, hard, and painstaking work. Xapien can help.