Japanese ID Verification

Japan has one of the largest internet-savvy populations in the world, with deep penetration of online financial products and business services. If your customers come from the Japanese market, an effective Japanese identity verification service is critical.

For businesses based outside Japan that want to serve Japanese customers, it is essential to understand Japan-specific cultural nuances in order to provide an effective service.

At Xesario, we have the most accurate identity verification system in the industry for Japanese-language documents. Our internal tests show that our accuracy is up to 60% better than that of our nearest competitor.

Identity verification of Japanese customers is one of the services affected by these cultural nuances. First, unlike most languages, Japanese is not written in a single script: it uses three different types of script, each of which appears on different identity documents. Second, Japanese year numbering differs from the Gregorian format used across the world: years are counted within imperial eras, each of which begins with the accession of a new emperor.
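
Converting an era year to a Gregorian year is simple arithmetic: year N of an era falls in the era's starting Gregorian year plus N - 1. The Python sketch below illustrates this. The function name `era_to_gregorian` and the era table (limited, as an assumption, to Taisho onwards and keyed by the Kanji era names printed on documents) are hypothetical. Because eras change mid-year, the last year of one era and year 1 of the next share a Gregorian year (Showa 64 and Heisei 1 are both 1989), so a full check would also compare the month and day.

```python
# Hypothetical helper: map an imperial-era year to a Gregorian year.
# Only eras relevant to living document holders are listed (an assumption).
ERA_START_YEAR = {
    "大正": 1912,  # Taisho
    "昭和": 1926,  # Showa
    "平成": 1989,  # Heisei
    "令和": 2019,  # Reiwa
}
ERA_ORDER = ["大正", "昭和", "平成", "令和"]


def era_to_gregorian(era: str, era_year: int) -> int:
    """Return the Gregorian year of year `era_year` of `era` (year 1 = start year)."""
    if era not in ERA_START_YEAR:
        raise ValueError(f"unknown era: {era}")
    if era_year < 1:
        raise ValueError("era years start at 1")
    year = ERA_START_YEAR[era] + era_year - 1
    # An era's last year is the Gregorian year in which the next era began.
    idx = ERA_ORDER.index(era)
    if idx + 1 < len(ERA_ORDER) and year > ERA_START_YEAR[ERA_ORDER[idx + 1]]:
        raise ValueError(f"{era} {era_year} is past the end of that era")
    return year


print(era_to_gregorian("昭和", 60))  # 1985
```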

Businesses performing automatic or even manual identity verification of Japanese customers would be well served to understand these nuances and to apply that knowledge.

In this article, we look at the different Japanese scripts in detail, which will help you identify each script when verifying official documents. We also explain Japanese date conventions and show how to map them to the Gregorian format used in the rest of the world. Finally, we show some sample Japanese ID documents and outline the scripts and date conventions used in them.

Japanese Script

The Japanese text you will find on identity documents can use three different types of script:

  1. Kana - which is further composed of Hiragana and Katakana
  2. Kanji
  3. Romaji
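
A practical first step when reading a document is to work out which script each character belongs to. Each script occupies its own Unicode range, so a rough classifier needs only range checks. The Python sketch below assumes the main blocks only (Hiragana U+3040 to U+309F, Katakana U+30A0 to U+30FF, CJK Unified Ideographs plus Extension A for Kanji, and ASCII letters for Romaji); half-width Katakana and rarer Kanji blocks are deliberately left out. The iteration mark 々, common in names such as 佐々木, is counted as Kanji. The function names are hypothetical.

```python
def char_script(ch: str) -> str:
    """Classify one character as hiragana, katakana, kanji, romaji or other."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    # CJK Unified Ideographs, Extension A, and the iteration mark 々 (U+3005).
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF or cp == 0x3005:
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "romaji"
    return "other"


def scripts_in(text: str) -> set[str]:
    """Return the scripts used in `text`, ignoring spaces and punctuation."""
    return {s for s in map(char_script, text) if s != "other"}


print(sorted(scripts_in("白かった")))  # ['hiragana', 'kanji']
```

Applied to a name field, `scripts_in` distinguishes a Kanji-only name such as 田中 from one written in Kana or in a mixture of both.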

Kana

Kana is a form of syllabic writing and is itself composed of two groups: Hiragana and Katakana. Unlike Kanji, Kana is a purely phonetic system: each character represents a syllable, such as ‘ta’ or ‘ka’, and carries no meaning of its own. Kanji, on the other hand, is ideographic writing: each character represents a real-world entity or a concept.

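Hiragana is typically used for words that have no Kanji representation. It is also used for grammatical elements that attach to and link words: the inflected endings of verbs and adjectives (okurigana) and particles, which play roughly the role that prepositions play in English.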

Examples of Hiragana:

  1. る in 見る (miru, "see")
  2. い in 白い (shiroi, "white")
  3. た in 見た (mita, "saw")
  4. かった in 白かった (shirokatta, "was white")
  5. これはれいぶんしょうです (kore wa reibunshō desu, "this is an example sentence")
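
The first four examples share a pattern: a Kanji stem followed by a Hiragana ending. A minimal Python sketch, assuming the word has a Kanji stem and a purely Hiragana ending, separates the two by peeling Hiragana characters off the end. The helper names are hypothetical, and words with Hiragana in the middle of the stem are out of scope.

```python
def is_hiragana(ch: str) -> bool:
    """True if `ch` lies in the Unicode Hiragana block (U+3040 to U+309F)."""
    return "\u3040" <= ch <= "\u309f"


def split_okurigana(word: str) -> tuple[str, str]:
    """Split `word` into its stem and its trailing run of Hiragana."""
    i = len(word)
    # Walk backwards while the characters are Hiragana.
    while i > 0 and is_hiragana(word[i - 1]):
        i -= 1
    return word[:i], word[i:]


for word in ["見る", "白い", "見た", "白かった"]:
    print(word, split_okurigana(word))
```

For 白かった this returns ("白", "かった"); a word with no Hiragana ending, such as 東京, comes back with an empty ending.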

Hiragana is often described as feminine because of its curved appearance, in contrast to Katakana, which looks relatively straight and angular.

Katakana is mostly used for words that are not originally Japanese, such as loanwords from other languages and foreign names. However, as with any language, there are exceptions to these rules, and you may find Katakana used outside this scope.

Examples of Katakana:

  1. コンピュータ (konpyūta, "computer")
  2. ロンドン (Rondon, "London")
  3. コレハレイブンショウデス (kore wa reibunshō desu, "this is an example sentence")
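
Example 3 is the Hiragana sentence from the previous list rewritten in Katakana. In Unicode the two syllabaries are laid out in parallel: each Katakana character from U+30A1 to U+30F6 sits exactly 0x60 above its Hiragana counterpart from U+3041 to U+3096, so converting between them is a fixed offset. The Python sketch below relies on that layout; the function names are hypothetical. One possible use, assumed here rather than taken from any particular system, is normalizing a name reading to a single syllabary before comparing it with what the customer typed.

```python
OFFSET = 0x60  # distance between a Hiragana code point and its Katakana twin


def hiragana_to_katakana(text: str) -> str:
    """Convert Hiragana (U+3041 to U+3096) to Katakana; leave other characters alone."""
    return "".join(
        chr(ord(ch) + OFFSET) if "\u3041" <= ch <= "\u3096" else ch for ch in text
    )


def katakana_to_hiragana(text: str) -> str:
    """Convert Katakana (U+30A1 to U+30F6) to Hiragana; leave other characters alone."""
    return "".join(
        chr(ord(ch) - OFFSET) if "\u30a1" <= ch <= "\u30f6" else ch for ch in text
    )


print(hiragana_to_katakana("これはれいぶんしょうです"))  # コレハレイブンショウデス
```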

Katakana is often described as masculine because of its relatively straight strokes. It is also problematic from an automatic identity verification perspective: many Katakana characters look alike, and most commercial OCR software frequently misidentifies them. For example, ツ (tsu), シ (shi), ソ (so), ン (n) and ノ (no) differ only in the angle and placement of a few short strokes.
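
One way to handle this is to flag any name containing a confusable character for manual review. The Python sketch below assumes that the confusable set is just the five characters ツシソンノ; a production list would be larger and derived from the OCR engine's measured errors. `CONFUSABLE_KATAKANA`, `confusable_positions` and `needs_manual_review` are hypothetical names.

```python
# Hypothetical confusable set, taken from the example ツシソンノ.
CONFUSABLE_KATAKANA = frozenset("ツシソンノ")


def confusable_positions(name: str) -> list[int]:
    """Return the indices of characters in `name` that OCR may have misread."""
    return [i for i, ch in enumerate(name) if ch in CONFUSABLE_KATAKANA]


def needs_manual_review(name: str) -> bool:
    """Route a name to a human reviewer if it contains any confusable character."""
    return bool(confusable_positions(name))


print(confusable_positions("タイロン"))  # [3]
print(needs_manual_review("タナカ"))  # False
```

Returning positions rather than a bare flag lets a reviewer's screen highlight exactly which characters to double-check.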

Kanji

Kanji, by contrast, is an ideographic writing system. The word Kanji literally means “Chinese characters”, and most Kanji are derived from Chinese Hanzi.

Each Kanji character represents an entity or a concept. However, a single character can have several meanings and several readings, and the correct one has to be inferred from context, specifically from the word the character is part of. For example, 上 and 下 can each be read in about eight different ways depending on the word in which they appear.

A single Kanji character can also form a complete word and be used on its own. Kanji is used for most Japanese personal and place names, although some names are written in Kana, and some in a combination of Kana and Kanji.

Examples of Kanji:

  1. 水 - mizu, "water"
  2. 朝 - asa, "morning"
  3. 川 - kawa, "river"
  4. 学校 - gakkō, "school"
  5. 田中 - Tanaka, a family name
  6. 東京 - Tōkyō, "Tokyo"
  7. これは例文章です - kore wa reibunshō desu, "this is an example sentence"

From a document verification perspective, most Japanese documents intended for use inside Japan (broadly, everything except Japanese passports) give the holder's name in Kanji. Sometimes, however, the name appears in Kana or in a mixture of Kana and Kanji.

Automatically identifying Japanese users with OCR software is difficult because Japanese documents are hard to read accurately. Many Kanji characters are highly detailed, and getting those details right is critical for successful automatic identification.

For example, most OCR software can easily mistake the character 東 for 章. If you look closely, you can see that these are two different characters, but OCR programs find them hard to distinguish, especially when the document is photographed under poor lighting.
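
Rather than rejecting a document outright when the OCR output differs from the name the customer entered, a verification flow can check whether the two strings differ only in characters known to be confusable, and send those cases to manual review. The Python sketch below assumes a small, hypothetical confusion table built from the examples in this article (東/章 and the Katakana group ツシソンノ); a real table would come from measuring the OCR engine's errors. All names in it are illustrative.

```python
# Hypothetical confusion groups; members of a group are treated as lookalikes.
CONFUSABLE_GROUPS = ["東章", "ツシソンノ"]

# Map every member of a group to the group's first character.
CANONICAL = {ch: group[0] for group in CONFUSABLE_GROUPS for ch in group}


def canonicalize(text: str) -> str:
    """Replace each confusable character with its group's representative."""
    return "".join(CANONICAL.get(ch, ch) for ch in text)


def compare_name(ocr_text: str, expected: str) -> str:
    """Return "match", "possible_match" (differs only in lookalikes) or "mismatch"."""
    if ocr_text == expected:
        return "match"
    if canonicalize(ocr_text) == canonicalize(expected):
        return "possible_match"  # send to manual review rather than auto-approve
    return "mismatch"


print(compare_name("章京", "東京"))  # possible_match
```

A "possible_match" is deliberately not treated as a pass: the lookalike could be a genuine difference, so a person should confirm it.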

Romaji

Romaji is the use of the Latin script to write the Japanese language. It is typically used where Japanese text is aimed at non-Japanese readers. As far as identity documents are concerned, you will find Japanese names written in Romaji in passports. Romaji is also commonly used to input Japanese into computers and word-processing applications.

Example of Romaji:

  1. “Tairon” is the Romaji for “タイロン”, which is written in Katakana and pronounced ta-i-ro-n.
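
Converting Katakana to Romaji can help cross-check a Romaji passport name against a Katakana name reading. The Python sketch below uses a small Hepburn-style lookup table written for illustration. It is not the official passport romanization, it makes no attempt at finer Hepburn rules for ン, and it assumes plain syllables only, so small Kana (as in コンピュータ), the long-vowel mark ー and the small ッ raise an error instead of being guessed. `katakana_to_romaji` is a hypothetical name.

```python
# Illustrative Hepburn-style table for plain Katakana syllables only (an assumption).
KANA = (
    "アイウエオカキクケコサシスセソタチツテトナニヌネノ"
    "ハヒフヘホマミムメモヤユヨラリルレロワヲン"
    "ガギグゲゴザジズゼゾダヂヅデドバビブベボパピプペポ"
)
ROMAJI = (
    "a i u e o ka ki ku ke ko sa shi su se so ta chi tsu te to na ni nu ne no "
    "ha hi fu he ho ma mi mu me mo ya yu yo ra ri ru re ro wa o n "
    "ga gi gu ge go za ji zu ze zo da ji zu de do ba bi bu be bo pa pi pu pe po"
).split()
KANA_TO_ROMAJI = dict(zip(KANA, ROMAJI))


def katakana_to_romaji(text: str) -> str:
    """Romanize plain Katakana syllable by syllable; reject anything unsupported."""
    parts = []
    for ch in text:
        if ch not in KANA_TO_ROMAJI:
            raise ValueError(f"unsupported character: {ch!r}")
        parts.append(KANA_TO_ROMAJI[ch])
    return "".join(parts).capitalize()


print(katakana_to_romaji("タイロン"))  # Tairon
print(katakana_to_romaji("ロンドン"))  # Rondon
```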