Joseph, José or Giuseppe? The challenge of matching names across languages
Traditional research methods could be leaving you with a blind spot when you investigate people with international names.
Research for due diligence, prospect or investigative purposes frequently involves people who are ‘international’ – with connections in more than one country. Because of this, their name may appear in different languages and scripts. To uncover all the information you will need to make a decision about your person, you’ll have to be able to find these different versions of their name.
There could be corporate data in Arabic about companies you were unaware of, or court records in Cyrillic that implicate your person in a legal case. Anything less than an accurate, timely match could mean that you miss crucial information in your decision-making. This can make the difference between a successful outcome – and a due diligence disaster.
But how can you find references to your person when their name is written in a language or script you are unfamiliar with?
This is where the process of ‘translingual name-matching’ comes in: translating and transliterating your person’s name in order to find references to them in foreign-language texts. In this guide, we’ll clarify what these terms mean, why they are important in your due diligence process, and how Xapien’s technology can help.
Using search engines to screen adverse media is time-consuming, inefficient and inconsistent. Relying on these methods is a drain on analyst time and leaves your organisation vulnerable to missing risks.
You need to be able to accurately name-match to:
- Find all the information about your person.
- Verify that the data is about the correct person.
This process is critically important when you are researching a person or a company, whether for due diligence, prospect or investigative purposes. It is also very challenging. Typically, researchers need years of specialist expertise and extensive resources to do this work. To complicate matters further, they must tailor their investigation to the specific languages, cultures and naming conventions that are essential to accurately matching your subject. The right tech can help, however, making your international investigations more efficient, thorough and reliable.
Read on to find out:
- What needs to be done to reliably match names across languages, scripts and cultures.
- How the right tech can help.
- How your research, investigations and due diligence process can benefit from automating name matching.
To access all the data about an international subject, a researcher working manually will need to parse, translate and transliterate the subject’s name into every relevant language and script.
Step 1: Name parsing
Name parsing means identifying and labelling the different parts of a name, such as titles, given names and surnames.
This becomes difficult when dealing with unfamiliar languages and naming conventions. It has also traditionally been a challenge for tech solutions.
Though it is clear to a human researcher that, in the name ‘Dr Ivan P. Smith PhD’, ‘PhD’ is not the surname, this is not obvious to a computer! This is why we have to replicate the human process to create an accurate machine-learning system.
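As a rough illustration of why parsing needs explicit rules, the Python sketch below labels the parts of a name written in given-name-first order. The `TITLES` and `POSTNOMINALS` sets and the `parse_name` function are hypothetical and deliberately tiny; they assume Western name order and are not how Xapien or any particular library parses names.

```python
# Hypothetical, deliberately small word lists; real coverage must be per language.
TITLES = {"dr", "mr", "mrs", "ms", "prof", "sir", "sheikh"}
POSTNOMINALS = {"phd", "md", "mba", "obe", "jr", "sr"}


def parse_name(raw: str) -> dict:
    """Label the parts of a name, assuming given-name-first (Western) order."""
    titles, postnominals, core = [], [], []
    for token in raw.split():
        key = token.strip(".,").lower()
        if key in TITLES and not core:
            titles.append(token)  # titles only count before the name proper
        elif key in POSTNOMINALS and core:
            postnominals.append(token)  # degrees and suffixes after the name
        else:
            core.append(token)
    return {
        "titles": titles,
        "given": core[0] if core else None,
        "middle": core[1:-1],
        "surname": core[-1] if len(core) > 1 else None,
        "postnominals": postnominals,
    }


print(parse_name("Dr Ivan P. Smith PhD"))
# {'titles': ['Dr'], 'given': 'Ivan', 'middle': ['P.'], 'surname': 'Smith', 'postnominals': ['PhD']}
```

A name written surname-first, or one using patronymics, would be mislabelled by this sketch, which is exactly why parsing has to be tailored to the culture a name comes from.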
Step 2: Translation
Translation is converting a name from one language to another in a way that preserves its meaning. This may also involve changing the script.
Step 3: Transliteration
Transliteration is converting a name from one script to another in a way that preserves its pronunciation.
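A minimal sketch of character-by-character transliteration from Russian Cyrillic to Latin script follows. The mapping table is a simplified, hypothetical scheme for illustration only; official standards such as ISO 9 or BGN/PCGN differ in detail (for example, in how they render ё or х), which is one reason the same name can appear in several Latin spellings.

```python
# Simplified, hypothetical Russian Cyrillic -> Latin table (not ISO 9 or BGN/PCGN).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Convert Cyrillic letters one at a time; capitals map to capitalised Latin forms."""
    out = []
    for ch in text:
        lat = CYR_TO_LAT.get(ch.lower())
        if lat is None:
            out.append(ch)  # leave spaces, Latin letters and punctuation as-is
        elif ch.isupper():
            out.append(lat.capitalize())
        else:
            out.append(lat)
    return "".join(out)


print(transliterate("Иван Петров"))  # Ivan Petrov
```

The hard and soft signs (ъ, ь) are simply dropped here; other schemes mark them with apostrophes, producing yet another spelling to search for.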
The name needs to be translated and transliterated both before and during the research process. The researcher needs to run searches using each alternative version of the name, and then verify the name matches found in the data.
Translation and transliteration are typically done using tools like Google Translate. However, these tools don’t do the whole job of parsing, translating and transliterating, so researchers have traditionally been unable to rely on a single provider for multilingual name matching.
Step 4: Dealing with cultural naming conventions
Name-matching well means understanding different cultures’ naming conventions. The process must be tailored to the specific languages and countries relevant to your person.
For example, in many Slavic languages, surnames vary slightly according to gender: the Russian surname Ivanov becomes Ivanova for a woman. In Arabic names, what look like middle names are often not additional given names but patronymics, which identify the person’s father.
E.g. in the name ‘Sheikh Ahmed bin Saeed Al Maktoum’, ‘bin Saeed’ is not a middle name, as ‘bin’ (and ‘ibn’) are prefixes that mean ‘son of’. Researchers therefore have to search for and match translated versions of this name (‘Ahmed son of Saeed Al Maktoum’) as well as transliterated phonetic versions.
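The two conventions above can be sketched in Python as variant generators. Everything here is a hypothetical simplification: `arabic_name_variants` treats ‘bin’ and ‘ibn’ as interchangeable ‘son of’ prefixes and also yields the translated form and a form without the patronymic, while `surname_gender_forms` applies a handful of assumed Russian and Polish suffix pairs (such as -ov/-ova and -ski/-ska). Real naming conventions have many exceptions these rules ignore.

```python
TITLES = {"sheikh", "sheik", "dr", "mr"}  # hypothetical, minimal
PATRONYMIC_PREFIXES = {"bin", "ibn"}  # both mean 'son of'

# Hypothetical feminine/masculine surname endings (Russian and Polish examples).
GENDERED_SUFFIXES = [("ova", "ov"), ("eva", "ev"), ("ina", "in"),
                     ("skaya", "sky"), ("ska", "ski")]


def arabic_name_variants(name: str) -> set[str]:
    """Return transliterated, translated and shortened forms of a patronymic name."""
    tokens = [t for t in name.split() if t.lower() not in TITLES]
    variants = {" ".join(tokens)}
    for i, token in enumerate(tokens):
        if token.lower() in PATRONYMIC_PREFIXES and i + 1 < len(tokens):
            before, after = tokens[:i], tokens[i + 1:]
            for prefix in ("bin", "ibn"):  # alternative transliterations
                variants.add(" ".join(before + [prefix] + after))
            variants.add(" ".join(before + ["son", "of"] + after))  # translation
            variants.add(" ".join(before + tokens[i + 2:]))  # patronymic dropped
    return variants


def surname_gender_forms(surname: str) -> set[str]:
    """Return the surname plus its assumed opposite-gender form, if a rule applies."""
    forms = {surname}
    lower = surname.lower()
    for feminine, masculine in GENDERED_SUFFIXES:
        if lower.endswith(feminine):
            forms.add(surname[: -len(feminine)] + masculine)
            break
        if lower.endswith(masculine):
            forms.add(surname[: -len(masculine)] + feminine)
            break
    return forms


print(sorted(arabic_name_variants("Sheikh Ahmed bin Saeed Al Maktoum")))
print(sorted(surname_gender_forms("Ivanova")))
```

Each patronymic prefix is handled independently, so a name with several (‘bin X bin Y’) would need further combinations.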
Translation systems typically don’t have this culture-specific understanding. Without a team of specialists in naming conventions across different cultures, it is extremely difficult to do this accurately.
There are often several choices and possibilities for the transliteration, based on different transliteration standards and conventions. The popular Arabic feminine given name خديجة, for example, can be written in a number of different ways in Latin script, such as Khadija, Khadeeja or Khadijah.
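Rather than generating every possible spelling, one option is to map each spelling onto a shared match key, so that variants like Khadija, Khadeeja and Khadijah collapse to the same value. The rewrite rules in this Python sketch are hypothetical and cover only a few common alternations; they are not a published transliteration standard or phonetic algorithm.

```python
import re

# Hypothetical rewrite rules; each maps a spelling alternative onto one form.
RULES = [
    (r"ee", "i"),              # Khadeeja -> Khadija
    (r"oo", "u"),              # Yoosuf -> Yusuf
    (r"([aeiou])h\b", r"\1"),  # drop a word-final 'h' after a vowel: Khadijah -> Khadija
]


def match_key(name: str) -> str:
    """Collapse common Latin-script spelling variants onto a single key."""
    key = name.lower().strip()
    for pattern, replacement in RULES:
        key = re.sub(pattern, replacement, key)
    return key


spellings = ["Khadija", "Khadeeja", "Khadijah"]
print({match_key(s) for s in spellings})  # {'khadija'}
```

Normalising to a key avoids the combinatorial growth of listing every spelling, at the cost of occasionally merging names that are genuinely different.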
Step 5: Fuzzy name matching
Fuzzy name matching is a technique, which may use machine learning, for identifying two strings of text that are similar but not identical.
Because of these naming conventions and transliteration choices, once the researcher has transliterated the name into all relevant scripts, they must still account for variation by using fuzzy name matching. That includes spelling variations, nicknames and alternative name orders (such as surname followed by first name, or title followed by surname).
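A minimal sketch of fuzzy name matching using Python’s standard-library `difflib.SequenceMatcher` is below. The nickname table and the choice to sort name tokens (so that ‘Smith, Joe’ and ‘Joseph Smith’ compare equal) are illustrative assumptions, not a description of any production system.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; real systems need large, culture-specific lists.
NICKNAMES = {"joe": "joseph", "bill": "william", "bob": "robert"}


def normalise(name: str) -> str:
    """Lower-case, strip commas and full stops, expand nicknames and sort tokens."""
    tokens = name.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(sorted(NICKNAMES.get(t, t) for t in tokens))


def name_similarity(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 means the normalised names are identical."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


print(name_similarity("Smith, Joe", "Joseph Smith"))  # 1.0
print(round(name_similarity("Jon Smyth", "John Smith"), 2))  # 0.84
```

Sorting tokens makes name order irrelevant, which handles ‘surname, first name’ listings but also treats ‘John Paul’ and ‘Paul John’ as the same name; a real system would weigh that trade-off.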
How the right tech can help
Xapien’s automated research platform, powered by AI and machine learning technology, has been trained to understand cultural naming customs. We are developing a system which can detect which language or culture a name is likely to derive from, translate and transliterate it into the relevant languages, and then run searches with those different versions.
The system can then scan the online data found about a subject and match names against all of these alternative versions.
Using fuzzy name matching, our system gives each alternative name version a score reflecting how confident it is in the match.
Take, for example, the scoring system used to match the company name ‘Nintendo’. From the web, the system finds a number of results, matching the search term (on the left) to potential alternative forms (on the right). The score represents the system’s confidence in the match. If a match scores less than 1 (which means 100% confidence), the system can use other information from the text to better judge whether the alternative name is a match.
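The idea of falling back on context when a score is below 1 can be sketched as follows. The `Candidate` type, the `accept` function, the 0.7 threshold and the context terms are all hypothetical, and the scores in the usage lines are invented rather than taken from Xapien’s system.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    alias: str    # alternative name form found on the web
    score: float  # name-match confidence in [0, 1]; 1.0 means full confidence
    context: str  # text surrounding the mention


def accept(candidate: Candidate, context_terms: set, threshold: float = 0.7) -> bool:
    """Accept certain matches; for uncertain ones, require supporting context."""
    if candidate.score >= 1.0:
        return True
    if candidate.score < threshold:
        return False
    words = set(re.findall(r"[a-z]+", candidate.context.lower()))
    return bool(words & context_terms)


terms = {"games", "console", "kyoto"}  # hypothetical clues for the company
print(accept(Candidate("Nintendo Co", 0.85, "The Kyoto-based games company said"), terms))  # True
print(accept(Candidate("Nitendo", 0.85, "A review of a local restaurant"), terms))  # False
```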
This means that our system reads and finds information as a human analyst would, taking into account nicknames and alternative spellings and using context to verify matches. But it does so at a scale, and across a range of languages, that no human team could match.
While Xapien’s automated system is doing this work, it is also cross-referencing and verifying name matches against all the other data in our cloud-based knowledge graph. After all, even if someone has the same name, you still need to verify that they are your person!
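Checking that a name match refers to the right person can be sketched as comparing other known attributes. In this hypothetical Python example, `Profile` and `same_person` are illustrative names and the attribute keys are assumptions; a real knowledge graph would weigh far more evidence than an exact-equality check.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Profile:
    name: str
    attributes: dict = field(default_factory=dict)  # e.g. birth_year, nationality


def same_person(subject: Profile, record: Profile) -> Optional[bool]:
    """Compare attributes both profiles share, assuming the names already matched.

    Returns False on any conflict, True if at least one attribute agrees and
    none conflict, and None when there is nothing to compare.
    """
    shared = subject.attributes.keys() & record.attributes.keys()
    if any(subject.attributes[k] != record.attributes[k] for k in shared):
        return False
    return True if shared else None


subject = Profile("Ivan Petrov", {"birth_year": "1970", "nationality": "RU"})
print(same_person(subject, Profile("Иван Петров", {"birth_year": "1970"})))  # True
print(same_person(subject, Profile("Ivan Petrov", {"birth_year": "1985"})))  # False
print(same_person(subject, Profile("Ivan Petrov", {})))  # None
```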
Xapien goes far beyond searching and keyword matching. It reads millions of online corporate records, shareholder data, media and news articles, corporate websites, blogs, Wikileaks, and more. Just like a human – but with the capacity to read and translate data from over 130 languages. Our system can scour millions of data sources and records while cross-referencing and disambiguating across its knowledge graph. The results of this complex process are then presented in a concise report on your subject, generated within minutes.
How you can benefit
At Xapien, we have automated the way that human researchers understand names, nicknames, and cultural variations. Our system uses advanced AI to tackle the problem in the same way as a team of expert researchers would.
Think of the solution as your own team of hundreds of dedicated expert researchers, available on demand, all day every day.
With the capacity to search, translate and transliterate in over 130 languages, Xapien can do the work of 100 analysts in minutes.
Search engines are great, but they are only the starting point. Finding, reading and condensing the full picture is slow, hard and painstaking work. Xapien can help.