Explainable AI: Our CTO on how Xapien is different from ChatGPT

Shaun O’Mahony, Chief Technology Officer • May 16 2024

A common question we receive is what sets Xapien apart from generative AI tools like GPT or Gemini. Shaun O’Mahony, Xapien’s Chief Technology Officer, explains why Xapien is designed for due diligence.

TL;DR

When it comes to due diligence, Xapien differentiates itself from generative AI tools like GPT through accurate subject resolution, anti-hallucination technology, and integration of critical AML and compliance data. Unlike generative AI tools, whose prompts and techniques keep evolving, it delivers reliable, unbiased, and comprehensive due diligence reports from a simple search.

Let’s start by defining generative AI

Generative AI refers to a category of artificial intelligence that falls under the wider umbrella of machine learning, where AI systems autonomously learn from data patterns, without explicit programming for each task. The hallmark of generative AI is its ability to produce content, be it text or images. This is a significant advancement from previous AI capabilities focused on recognising patterns.

Large language models (LLMs) are a prime example of generative AI at work. They fit into this category by being advanced implementations of the generative AI principle, specifically designed to handle and generate text-based content. LLMs, such as ChatGPT by OpenAI, Google’s Gemini, and Anthropic’s Claude, are built on vast datasets of text from which they learn language patterns, context, and nuances.

These models use their training to predict the next word or phrase in a given sequence, making it possible to generate detailed and relevant text based on the prompts they receive. This predictive capability marks a leap from earlier AI technologies, which primarily focused on pattern recognition without the ability to produce new content. LLMs embody the generative aspect of AI by understanding and processing existing information and creating new, coherent, and contextually appropriate responses, texts, or even complex documents, in response to naturally written requests.
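To make "predicting the next word" concrete, here is a deliberately tiny sketch. It is not how an LLM is built (real models use neural networks trained on vast datasets); it is a simple word-pair counter, and the function names and the sample text are illustrative assumptions rather than anything from a real product.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Illustrative training text (an assumption, not real data).
corpus = (
    "due diligence takes time . "
    "due diligence needs evidence . "
    "due diligence takes care"
)
model = train_bigram_model(corpus)
print(predict_next_word(model, "due"))        # diligence
print(predict_next_word(model, "diligence"))  # takes
```

The toy model only repeats the patterns it has counted, which is also why a purely predictive system can produce fluent text that is not grounded in any verified source.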

Does Xapien use generative AI?

Yes, but it’s just one element of the platform. Xapien is made up of a suite of around 20 different AI models. Generative AI plays a role in summarising and synthesising information. However, it’s the orchestration with other AI models that enables it to automate the complex human research process from an initial, singular search query to a full written report.  

Each model brings a specific strength. Together, they are how Xapien can automate complex analytical tasks, parse vast amounts of data, and highlight critical insights that might otherwise go unnoticed by a human or a generative AI tool. 

Only after a meticulous organisation and analysis process does Xapien present the results to the user. This presentation phase is where generative AI comes into play and enhances how findings are communicated. 

In what ways is Xapien specifically designed for due diligence?

Xapien outperforms generative AI tools through five key differentiators. 

First, our disambiguation and entity resolution technology accurately identifies and differentiates between entities with the same name, ensuring you’re gathering information about the correct subject.

Second, Xapien incorporates structured and compliance-focused datasets, streamlining the compliance process directly within our platform. 

Third, by minimising the need for extensive contextual input, these capabilities enable Xapien to address the high GDPR and data privacy risks associated with delivering background research on individuals and entities.

Fourth, we’ve integrated anti-hallucination technology to mitigate the risk of generating inaccurate information, a common issue with generative AI models. 

Finally, our pre-trained models provide consistency in answers, eliminating the need for repeated question variations to extract complete responses. On top of that, Xapien evolves without complicating its user interface, unlike AI models like GPT which require users to learn new prompts. 

Let’s go into each of these differentiators in more detail. What is disambiguation?

Disambiguation plays a key role in due diligence research. It’s what allows for the precise identification of individuals or organisations among those with identical or similar names. This process is essential in filtering out irrelevant or incorrect information, saving substantial time that manual researchers would otherwise spend sifting through false positives.

The process of disambiguation involves analysing context and available details about a subject to distinguish between entities with similar names. Traditionally, this manual process depends on the researcher’s ability to discern relevant “signals” from the noise, such as associations with specific locations, organisations, or known associates. This human analysis relies on semantic understanding to differentiate, for example, one “Joe Bloggs” from another based on distinct reference points like employment history, residence, age, and other personal identifiers.

Xapien enhances this process using sophisticated algorithms and natural language processing technology. It automates the search, collection, analysis, and differentiation of vast quantities of data that may initially appear unrelated but are all connected to the name in question. The system organises this data into what’s referred to as a knowledge graph. Within this graph, clusters of signals form unique ‘personas’ for individuals sharing the same name, populated with assertions and insights drawn from a myriad of articles, corporate records, and other publicly accessible sources.

Think of it as a solar system of data points, where each piece of information forms connections that help identify whether certain personas represent the same individual. Starting with the initial context the user provides, Xapien builds upon this foundation to accurately identify the correct subject. It creates connections between each “Joe Bloggs” when it has identified enough connecting context to determine they’re the same person. 

Once the correct individual is pinpointed, the system consolidates relevant information into their profile while eliminating unrelated data clusters, ensuring the final report exclusively contains verified details about the “Joe Bloggs” you’re searching for.
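The idea of signals clustering into personas can be sketched in a few lines. This is a simplified illustration and not Xapien's actual algorithm: it assumes each mention of a name has already been reduced to a set of signals, and it merges two mentions whenever they share at least `min_shared` signals. The mention names, the signals, and the threshold are all hypothetical.

```python
from itertools import combinations


def cluster_mentions(mentions, min_shared=2):
    """Group mentions into personas when they share enough signals.

    `mentions` maps a mention id to a set of signals (employer, city, ...).
    `min_shared` is an illustrative threshold, not a value from the article.
    """
    parent = {mention_id: mention_id for mention_id in mentions}

    def find(mention_id):
        # Follow parent links to the representative of the cluster.
        while parent[mention_id] != mention_id:
            parent[mention_id] = parent[parent[mention_id]]
            mention_id = parent[mention_id]
        return mention_id

    # Link every pair of mentions that shares enough signals.
    for a, b in combinations(mentions, 2):
        if len(mentions[a] & mentions[b]) >= min_shared:
            parent[find(a)] = find(b)

    personas = {}
    for mention_id in mentions:
        personas.setdefault(find(mention_id), set()).add(mention_id)
    return sorted(sorted(group) for group in personas.values())


# Four hypothetical mentions of the same name, e.g. "Joe Bloggs".
mentions = {
    "news_article": {"employer:Acme Bank", "city:Leeds", "role:analyst"},
    "company_filing": {"employer:Acme Bank", "city:Leeds", "born:1980"},
    "court_report": {"city:London", "born:1975", "offence:fraud"},
    "blog_post": {"born:1980", "employer:Acme Bank"},
}
personas = cluster_mentions(mentions)
print(personas)
# [['blog_post', 'company_filing', 'news_article'], ['court_report']]
```

Note that the blog post and the news article share only one signal directly, yet they end up in the same persona because each links to the company filing. That is the sense in which connected data points accumulate into a single profile, while the court report stays a separate person.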

Generative AI models, in contrast, struggle to distinguish between individuals with similar or identical names. While these models can reference their sources, they lack the necessary functionality for disambiguation.

Consider this real example. There are two people named Manjit Singh. One is a convicted money launderer, responsible for laundering over 15 million dollars from NatWest. The other works in threat intelligence at NatWest, focusing on preventing money laundering. Despite their vastly different roles, we have examples of generative AI tools (that I won’t name) incorrectly identifying them as the same person. They fail to differentiate between them based on the information they have. Xapien doesn’t.

That’s pretty important. Unless you’re a criminal, or he’s wildly reformed, you wouldn’t want to work with one of them, but you would with the other.

Without access to sophisticated disambiguation or entity resolution technology and varied datasets (more on these below), LLMs would struggle to tell these two people apart. Their main function is to produce coherent text, not to navigate the complexities of accurately identifying separate individuals.

We also talk about how we integrate structured and compliance-focused datasets. Why is that important?

While we collect information from across the indexed internet, including blogs and media content, as part of our data search and collection process, our approach goes far beyond standard search engine results. Xapien integrates data from a wider array of structured and unstructured sources, including AML screening datasets and corporate records, which generative AI tools don’t have access to. This enables Xapien to find and match things like dates of birth or company registration numbers from company filings with information from blogs.

This is fundamental for pinpointing the person you want to look for, and grounding unverifiable information from the internet in concrete, government-approved records.   

I like to illustrate this by showing what happens when you search for a man called Michael Doyle of MJD Sea Expeditions using Xapien compared to when you search for him only with a web-based search (which is what GPT or Gemini can do). 

Traditional search methods, like Google or company databases, produce overwhelming results due to the commonality of the name and lack of specific filters. You’ll need to sift through numerous unrelated records.

Xapien identifies Michael Doyle accurately, confirming details like his middle name and date of birth, but also reveals critical information that search engines miss. This includes his presence on the Guernsey disqualified director list, associations with criminal activities, and a conviction related to money laundering involving his wife. Xapien also highlights Doyle’s extensive directorships across multiple countries, a detailed overview of his professional engagements, and previously undiscovered media articles.

The inclusion of Anti-Money Laundering (AML) screening data and compliance datasets underscores Xapien’s suitability for due diligence. Accessing these specific sources is a fundamental part of any due diligence procedure. By compiling AML risks, corporate information, and detailed insights from web and media content into a single report, we significantly reduce the time analysts spend on research. What traditionally might take hours to compile and analyse can be accomplished in just 10 minutes.

How does that help with GDPR and privacy?

Xapien’s ability to disambiguate makes its searches GDPR-compliant.

A fundamental challenge with using AI tools like GPT for due diligence is the significant amount of context and data they require to function accurately. While it’s theoretically possible to fine-tune these tools to focus on the specific individual or entity you’re researching, you’d have to give them a huge amount of detailed context upfront. This is where the GDPR risk emerges.

GDPR, and similar privacy regulations, are predicated on minimising the amount of personal data collected and processed. When you’re required to feed an AI tool a comprehensive dataset to ensure it doesn’t confuse the subject of your research with someone else, you’re increasing the volume of personal data being provided to and processed by a third party. 

Xapien’s design respects privacy and compliance regulations. By requiring minimal sensitive input and automating data collection and analysis, we remove the GDPR and privacy risks. 

Generative AI famously encounters an issue known as “hallucination”. How do we prevent it? 

This is a natural limitation of models that predict text or images based on patterns in their training data, rather than verified sources. They can sometimes “imagine” details or connections that don’t exist, leading to potential misinformation or inaccuracies in their outputs.

In contrast, our anti-hallucination technology is designed specifically to address this challenge. Xapien does not simply accept the output of the generative AI element that we use in report writing at face value. Instead, it scrutinises and cross-references these outputs with the original data Xapien collected, analysed, and read.

This involves an additional verification step, where every insight or piece of information generated by our platform is matched against known verifiable sources. It guarantees the content in our reports is not just relevant and coherent, but also accurate and directly linked to its original source. 
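As a rough sketch of what a verification step of this kind can look like, the example below checks each generated statement against a set of source passages and links it to a source only when enough of its words appear there. This is an assumption-laden illustration, not Xapien's implementation: simple word overlap is a crude stand-in for real verification, and the function names, source ids, passages, and the `min_overlap` threshold are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def verify_statements(statements, sources, min_overlap=0.8):
    """Pair each statement with a supporting source id, or None.

    A statement counts as supported when at least `min_overlap` of its
    tokens appear in a single source passage. The threshold is illustrative.
    """
    results = []
    for statement in statements:
        statement_tokens = tokenize(statement)
        supporting_id = None
        if statement_tokens:
            for source_id, passage in sources.items():
                shared = statement_tokens & tokenize(passage)
                if len(shared) / len(statement_tokens) >= min_overlap:
                    supporting_id = source_id
                    break
        results.append((statement, supporting_id))
    return results


# Hypothetical source passages and generated statements.
sources = {
    "filing-001": "Jane Example was appointed director of Example Holdings Ltd in 2015.",
    "article-002": "Example Holdings Ltd opened a new office in Leeds.",
}
statements = [
    "Jane Example was appointed director of Example Holdings Ltd.",
    "Jane Example was convicted of fraud.",
]
results = verify_statements(statements, sources)
for statement, source_id in results:
    print(source_id, "|", statement)
# filing-001 | Jane Example was appointed director of Example Holdings Ltd.
# None | Jane Example was convicted of fraud.
```

The second statement has no supporting passage, so a pipeline built this way would drop or flag it rather than publish it, and every statement that survives carries the id of the source it can be traced back to.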

Check for yourself – you can trace every piece of content back to its original source simply by clicking on it.

So consistency is a key differentiator, right?

Xapien offers an unbiased and consistent approach to data analysis by combining advanced AI models with a structured methodology. Unlike search engines or generative AI tools like GPT, which can produce varied results based on how queries are crafted, Xapien automates the data extraction, analysis, and summarisation process. This ensures that all searches are conducted using the same criteria and methods, removing the variability and bias that can arise from individual users’ search habits.

As a result, Xapien always delivers a complete and objective view of the information, unaffected by user biases, or whatever “prompt” a user gives. 

There’s another thing to consider here. AI models like GPT are constantly evolving, which requires users to adapt by learning new prompts and techniques. Xapien also undergoes continual development. However, a key difference is that our technology team is committed to keeping Xapien’s user interface consistent and straightforward. This ensures that you can always obtain the information you need from Xapien through a single, simple search.

Thanks, Shaun. From the top – what’s the elevator pitch on how we’re different to ChatGPT?

Xapien distinguishes itself from generative AI models in due diligence through its disambiguation and anti-hallucination technology, and through the integration of critical datasets such as AML screening and compliance information. With pre-trained models delivering consistent answers, Xapien guarantees unbiased, comprehensive information.

Unlike generative AI, which often requires users to navigate evolving prompts and methodologies, Xapien maintains a consistent, user-friendly interface that simplifies the due diligence process. Overall, our unique approach to compiling and verifying information ensures that users receive accurate, comprehensive reports tailored to their due diligence needs. It’s a more reliable and efficient tool.

Try Xapien for yourself by filling in the form below.
