Explainable AI:

Xapien: How it works

When it comes to due diligence, you need comprehensive research with maximum efficiency. But it’s easier said than done.

Whether you contract a third-party firm, or your team manually searches, analyses, and writes up a report, it’s expensive and time-consuming. This can lead to missed opportunities, and for your fundraising or business development team, it can be deeply frustrating.

The role of a research analyst has become more research than analysis. The ever-growing quantity of sources and datasets is all-consuming. Meanwhile, the pile of due diligence requests keeps growing. It’s easy to get lost in a maze of information, unsure whether it’s relevant to your subject or how to step out of the ‘rabbit hole’ to form judgments and strategies.

Even low-risk, low-profile subjects can require a day’s worth of research, and looking under every rock can be more time-consuming than dealing with larger, riskier profiles. Put simply, searching the entire indexed internet, along with compliance data spread across various platforms and web pages, has become impractical, if not impossible, for most businesses.

But what if this could be automated? A comprehensive report on anyone, anywhere in the world, in minutes.

Meet Xapien.

What does Xapien do?

Xapien is an AI tool that handles manual research, analysis and report writing, freeing up teams’ time to focus on serving clients better. It automates in-depth research, covering the entire process from initial search to summarising insights in a final report.

Our AI scours millions of registries and screening data, as well as trillions of web pages across the entire indexed internet. It extracts and contextualises fragments of information about your subject, whether it’s an individual or a company.

Finally, these findings are compiled into a fully-sourced, summarised report that can easily be shared with colleagues, committees, regulators, and third parties. To get started, all Xapien needs is a name and some context. This frees up due diligence teams and analysts to focus on protecting and serving their clients.

How does Xapien work?

Xapien begins by searching across licensed data sets, official registries, and the entire indexed internet to comprehensively gather all available data about a subject. This data may be in various formats, such as images, web pages, news articles, and corporate filings.

Machines generally don’t do well with all these diverse formats, so we pass pages of information through our Natural Language Processing (NLP) engine to read them and extract valuable text. This leaves Xapien with a clean ‘document’ for each web page, news article, or similar source.

It then breaks down each document into individual sentences and smaller pieces of information, capturing all potential knowledge. Fragments are then encoded into vectors.
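As a rough illustration of the "break each document into sentences" step, here is a minimal Python sketch. It is not Xapien's NLP engine: the function name `split_into_fragments`, the sample text, and the naive punctuation rule are all assumptions made for this example, and a production system would handle abbreviations, quotes and layout far more carefully.

```python
import re

def split_into_fragments(document: str) -> list[str]:
    """Split a cleaned document into sentence-level fragments (naive rule)."""
    # Split on whitespace that follows '.', '!' or '?'.
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [part for part in parts if part]

# Invented sample text, for illustration only.
doc = "Jane Doe founded Acme Ltd in 2010. She sold the company in 2019."
fragments = split_into_fragments(doc)
print(fragments)
```

Each fragment can then be handled on its own, which is what makes sentence-level sourcing possible later in the process.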

Vectors 101

Vectors are a way of encoding information as a list of numbers, because machine learning models work with numbers rather than words. To illustrate this, consider eye colour. We can categorise individuals into teams based on their eye colour. For instance, blue eyes belong to Team One, green eyes to Team Two, and brown eyes to Team Three, so each person can be described by a single number.

This is a one-dimensional vector. Now add a second number for hair colour, so that a person with brown eyes and blonde hair is encoded differently from a person with brown eyes and brown hair. This is a two-dimensional vector.
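The eye colour and hair colour example can be written out in a few lines of Python. The eye colour numbers follow the teams described above. The hair colour numbers and the function name `encode_person` are assumptions invented for this sketch.

```python
# Eye colour "teams" from the example: blue = 1, green = 2, brown = 3.
EYE_CODES = {"blue": 1, "green": 2, "brown": 3}
# Hair colour codes are hypothetical, chosen only for illustration.
HAIR_CODES = {"blonde": 1, "brown": 2}

def encode_person(eye: str, hair: str) -> tuple[int, int]:
    """Encode a person as a two-dimensional vector (eye code, hair code)."""
    return (EYE_CODES[eye], HAIR_CODES[hair])

person_a = encode_person("brown", "blonde")
person_b = encode_person("brown", "brown")
print(person_a, person_b)
```

The two people share the first number (brown eyes) but differ in the second, so the extra dimension is what tells them apart.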

Xapien generates vectors with thousands of dimensions to represent names, organisations and insights about a subject. It uses not only the semantic meaning of the words from the text, but also the linkages to other things that set those people, organisations or events in context.

Using advanced machine learning techniques, Xapien can then bring these vectors together based on their underlying similarities as part of a wider process to resolve people, companies and their mentions across different sources.
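To give a feel for what "bringing vectors together based on their underlying similarities" can mean, the sketch below groups toy vectors using cosine similarity, a common similarity measure. This is an assumption for illustration: the source does not say which measure or clustering method Xapien uses, and the three-dimensional vectors, the `0.9` threshold and the function names are all invented.

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def group_similar(vectors: dict[str, list[float]], threshold: float = 0.9) -> list[list[str]]:
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups: list[list[str]] = []
    for name, vec in vectors.items():
        for group in groups:
            if cosine_similarity(vec, vectors[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([name])
    return groups

# Toy vectors standing in for encoded mentions of a name.
mentions = {
    "mention_a": [0.9, 0.1, 0.0],
    "mention_b": [0.8, 0.2, 0.1],
    "mention_c": [0.0, 0.1, 0.9],
}
groups = group_similar(mentions)
print(groups)
```

Here the first two mentions point in nearly the same direction and end up together, while the third is placed in a group of its own.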

Xapien generates a vast vector database for every enquiry and extracts detailed linkages between people, organisations, locations, events and themes (a vector-augmented ‘knowledge graph’, if you prefer to get ‘technical’).

It then uses this enormous knowledge graph to generate fully-sourced, traceable insights about the subject using state-of-the-art generative techniques.

Xapien’s summarisation

The team at Xapien have developed machine learning techniques to extract the relevant insights that clients need across every fragment of information collected. For example, knowing a subject’s Source of Wealth is essential for Enhanced Due Diligence.

To generate a summarised research report, Xapien runs hundreds of queries against its knowledge graph, asking each question users need answered, usually phrased in several different ways.

In the Source of Wealth example, the vectors and linkages provide a set of relevant insights related to the subject’s Source of Wealth. We then use a Large Language Model (LLM) as part of a wider ‘generative’ layer to summarise those insights into a concise paragraph that reads like a human-written report. Xapien repeats this process for each section in a Xapien Insights report.
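The idea of asking one question in several ways and then handing the matching insights to an LLM can be sketched as follows. This is a toy under stated assumptions: simple word overlap stands in for vector and knowledge-graph queries, the fragments and phrasings are invented, and no real LLM is called (the sketch only assembles the text that would be passed to one).

```python
def tokens(text: str) -> set[str]:
    """Lower-case words with basic punctuation stripped."""
    return set(text.lower().replace("?", "").replace(".", "").split())

def retrieve(fragments: list[str], phrasings: list[str], min_overlap: int = 2) -> list[str]:
    """Keep fragments sharing at least min_overlap words with any phrasing."""
    return [
        frag for frag in fragments
        if any(len(tokens(frag) & tokens(q)) >= min_overlap for q in phrasings)
    ]

# Invented fragments about a hypothetical subject.
fragments = [
    "Jane Doe made her fortune selling Acme Ltd.",
    "Jane Doe attended a charity gala in 2021.",
    "Her wealth comes from property investments.",
]
# The same question asked in two different ways.
phrasings = [
    "What is the source of her wealth?",
    "How did she make her fortune?",
]

hits = retrieve(fragments, phrasings)
# The retrieved insights would then be passed to an LLM to summarise.
prompt = "Summarise the subject's source of wealth:\n" + "\n".join(f"- {h}" for h in hits)
print(prompt)
```

Each phrasing catches a fragment the other misses, which is the reason for asking the question more than once.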

What are Large Language Models (LLMs)?

Put simply, LLMs are advanced computer systems that have been taught to understand and generate human-like language. They can read, write, and understand text just like humans do.

But how does Xapien use LLMs to generate concise summaries?

LLMs are trained on vast datasets, subsuming large amounts of real-world knowledge, and have been trained to understand and learn common word sequences. Because of this, LLMs can generate human-like language by predicting the next word in a sentence.

For instance, when you input ‘the sky is…’, the model would predict ‘blue’ as the next word, because that is the most common continuation. But if you request a more creative response, it might generate ‘the limit.’ These predictions are based on statistical probabilities of word sequences.
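Next-word prediction from word-sequence statistics can be shown with a tiny word-pair counter. This is a deliberately simplified sketch: real LLMs are neural networks trained on vast datasets, not pair counters, and the three-sentence corpus and function names here are invented for the example.

```python
from collections import Counter, defaultdict

# A tiny invented corpus; real models learn from vastly more text.
corpus = [
    "the sky is blue",
    "the sky is blue today",
    "the sky is the limit",
]

def train_bigrams(sentences: list[str]) -> dict[str, Counter]:
    """Count how often each word follows each other word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probabilities(counts: dict[str, Counter], word: str) -> dict[str, float]:
    """Turn the follow-on counts for a word into probabilities."""
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

model = train_bigrams(corpus)
probs = next_word_probabilities(model, "is")
most_likely = max(probs, key=probs.get)
print(probs, most_likely)
```

In this corpus ‘blue’ follows ‘is’ twice and ‘the’ once, so ‘blue’ is the most probable next word, while ‘the’ (as in ‘the limit’) remains a less likely alternative.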

A great deal of care needs to be used when working with LLMs and generative AI to ensure that the information is not just ‘nice to read’ or ‘impressive’, but is factually accurate, consistent and traceable.

How can I verify the source of information?

Every piece of information provided in Xapien’s rich summaries is fully traceable, down to a sentence and sometimes word or phrase level, showing the exact part inside specific news articles or webpages from which Xapien collected the original information.

It’s essential that we provide fully attributable and traceable reports. When conducting research and due diligence, you can’t base decisions on facts or statements without being able to judge the credibility or reliability of the underlying source.

Xapien traces the lineage of insights and reveals where specific sentences within the report come from. This offers a fundamentally different way of receiving insights from generative AI—a transparent, intuitive, and enterprise-ready process.
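One way to picture sentence-level traceability is a record that ties each report sentence to the exact span of source text it came from. The data structure below is a hypothetical sketch, not Xapien's internal format: the class names, fields, URL and sample text are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceSpan:
    """Pointer to an exact span of text inside a source document."""
    url: str
    start: int
    end: int

@dataclass
class Insight:
    """A report sentence together with the spans that support it."""
    text: str
    sources: list[SourceSpan]

def quote(span: SourceSpan, documents: dict[str, str]) -> str:
    """Return the exact source text that a span points at."""
    return documents[span.url][span.start:span.end]

# Invented source document and URL, for illustration only.
documents = {
    "https://example.com/article": "Jane Doe sold Acme Ltd in 2019 for an undisclosed sum.",
}
phrase = "Jane Doe sold Acme Ltd in 2019"
start = documents["https://example.com/article"].index(phrase)
span = SourceSpan("https://example.com/article", start, start + len(phrase))
insight = Insight("The subject sold Acme Ltd in 2019.", [span])

print(quote(insight.sources[0], documents))
```

Because every sentence carries its spans, a reader can jump from a claim in the report straight to the words in the original article.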

How does Xapien combat hallucinations?

The world was initially captivated by the launch of ChatGPT and other generative AI tools. But as people encountered issues, some of that excitement began to fade.

Instances of false information and “hallucinations” generated by the technology raised concerns. The question then became: how do you use LLMs effectively without the risk of generating false information?

Xapien’s approach involves breaking down the process into atomic components. We don’t just ask an LLM to answer a question based on the data it’s been trained on. Xapien’s new generative AI system builds on over five years’ worth of tried and tested research, Natural Language Processing (NLP), and our proprietary disambiguation technology.

To illustrate the challenge, you can’t gather 500 documents about someone named Chris Smith and summarise financial crime insights from that. You’d have to determine whether this is the specific Chris Smith you’re interested in.

This is a fundamental, and incredibly difficult step. But it’s a technology we have been perfecting that sits right at the heart of Xapien.

It’s easy to think that when you type a famous person’s name into Google, you’re going to get content about them. But even famous people share their names. For example, the actor David Schwimmer has the same name as the CEO of the London Stock Exchange Group.
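The name-sharing problem can be illustrated with a toy filter that keeps only documents matching the context supplied with the name. This is a hedged sketch, not Xapien's proprietary disambiguation technology: the documents, the context terms and the simple term-matching score are all invented for the example.

```python
def context_score(document: str, context_terms: list[str]) -> int:
    """Count how many context terms appear in the document."""
    text = document.lower()
    return sum(1 for term in context_terms if term.lower() in text)

def filter_documents(documents: list[str], context_terms: list[str], min_score: int = 1) -> list[str]:
    """Keep documents that mention at least min_score context terms."""
    return [d for d in documents if context_score(d, context_terms) >= min_score]

# Two invented documents about different people with the same name.
documents = [
    "Chris Smith, a director of Example Bank in Leeds, was fined by the regulator.",
    "Chris Smith scored twice for the local football club on Saturday.",
]
# Hypothetical context a user might provide alongside the name.
context_terms = ["Example Bank", "Leeds"]

relevant = filter_documents(documents, context_terms)
print(relevant)
```

Only the first document matches the supplied context, so only that one would be carried forward for summarisation.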

At the generative AI level, we’ve developed proprietary technologies to counter the tendency to hallucinate that is innate within generative AI. We’ve built a system of algorithmic safeguards using non-generative, non-machine-learning technology with roots in deep NLP, which acts as a protective layer around our interaction with LLMs.

This enables us to harness their power but with unique control over the quality and accuracy of the output, ensuring that everything we surface has been independently checked and verified by a totally different algorithm. This means that insofar as can reasonably be asserted, you’re always working with a distilled and accurate representation of the information that Xapien has sourced.
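The idea of checking generated text with a separate, non-generative algorithm can be sketched with a simple word-coverage test: a summary sentence counts as supported only if most of its words appear in a single source fragment. This is an illustrative assumption, not Xapien's actual safeguard. The `0.8` threshold, the function names and the sample sentences are invented.

```python
import re

def content_words(text: str) -> set[str]:
    """Lower-case alphanumeric words in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def is_supported(sentence: str, fragments: list[str], min_coverage: float = 0.8) -> bool:
    """True if one fragment contains at least min_coverage of the sentence's words."""
    words = content_words(sentence)
    if not words:
        return False
    return any(
        len(words & content_words(fragment)) / len(words) >= min_coverage
        for fragment in fragments
    )

# Invented source fragment and candidate summary sentences.
fragments = ["Jane Doe sold Acme Ltd in 2019."]
grounded = is_supported("Jane Doe sold Acme Ltd in 2019.", fragments)
hallucinated = is_supported("Jane Doe was convicted of fraud.", fragments)
print(grounded, hallucinated)
```

The second sentence shares only the subject's name with the source, so it fails the check and would not be surfaced.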

Where would Xapien fit into my due diligence process?

Short answer: right at the start.

Xapien users run their clients, prospects, donors, suppliers, investors, and other third parties through the tool for “initial due diligence.” This enables them to gain early insights into potential risks and opportunities. Instead of spending time gathering research, their teams can focus on minimising those risks and maximising opportunities.

Case study: KaurMaxwell

Xapien’s enhanced search capabilities provide the firm with a KYC process that leaves no stone unturned. With a clear starting point for research, work at this depth that previously required five analysts now needs only one. In under 10 minutes, Xapien uncovers information that previously would have taken days or weeks to find. The result is a fully-sourced summarised report with an upfront view of a potential client’s risk profile.

Case study: University of Liverpool

The University of Liverpool’s operations team built their due diligence process into prospect research from the start using Xapien. Instead of the team spending hours on manual research, Xapien summarises all relevant information about a prospect in minutes. This means due diligence can be done upfront without holding up the prospect research process. By combining the two early on, the fundraising team can focus their efforts on which prospects to approach for donations.

Case study: Griffin

As soon as a prospect or supplier is mentioned, banking-as-a-service (BaaS) provider Griffin runs their name through Xapien. Inputting the search terms takes 30 seconds, and the compliance team can leave the software to run while they get on with other tasks. The results are neatly summarised and take only half an hour to review. Xapien effectively flags direct risks, enabling the team to streamline their focus and skip irrelevant information.

Case study: Private equity firm

The compliance team have a financial crime checklist for their deal screening process, of which Xapien is a fundamental part. They run Xapien reports on the sellers, the target, and the directors as soon as they have the information. This proactive approach allows them to catch risks early in the process and prevent both teams from investing excessive resources in a deal that might not proceed.

Case study: Sightsavers

To keep up with due diligence requests from fundraisers, Sightsavers’ prospect researcher runs a Xapien report on every prospect. It takes just minutes to review the report and identify any ethical red flags, along with legal, financial, and reputational risks which might require further investigation. These decisions are then recorded on an in-house due diligence template, categorised as either ‘low risk’ and approved, or escalated for resolution at an appropriate level. Notably, Xapien saves them 3 to 4 weeks of desk-based research.
