Q&A

Questions about the dialect comparison tool

A short guide to what the tool compares, where the data comes from, and how to read the AI-generated summaries.

What does this tool compare?

The app compares only characters that are attested in both selected dialects. For those matched characters, it shows which sounds in Dialect B correspond to a given sound in Dialect A.

聲母 (initial): compares the initial consonant.
韻母 (final): compares the final, or rime, the part of the syllable after the initial.
Full syllable: compares the whole pronunciation string (for example toŋ55).
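
As a rough illustration of the three comparison modes, here is a minimal Python sketch. The row layout, the field names, the `compare` function, and the toy readings are all assumptions made up for this example; they are not the app's actual CSV schema or code.

```python
from collections import Counter

# Made-up toy rows, NOT taken from the dataset:
# (character, dialect, initial, final, full syllable)
ROWS = [
    ("甲", "A", "f", "a", "fa55"),
    ("甲", "B", "f", "a", "fa33"),
    ("乙", "A", "f", "u", "fu35"),
    ("乙", "B", "v", "u", "vu55"),
    ("丙", "A", "f", "an", "fan55"),
    ("丙", "B", "f", "an", "fan33"),
    ("丁", "A", "s", "an", "san55"),  # only in A, so it is never matched
]
FIELD_INDEX = {"initial": 2, "final": 3, "syllable": 4}

def compare(rows, dialect_a, dialect_b, field, sound):
    """Count Dialect B sounds for matched characters whose Dialect A sound is `sound`."""
    idx = FIELD_INDEX[field]
    # Simplification: assumes one reading per character per dialect.
    a = {row[0]: row[idx] for row in rows if row[1] == dialect_a}
    b = {row[0]: row[idx] for row in rows if row[1] == dialect_b}
    # A character is "matched" only if it appears in both dialects.
    matched = [ch for ch in a if ch in b and a[ch] == sound]
    return Counter(b[ch] for ch in matched)

result = compare(ROWS, "A", "B", "initial", "f")
```

In this toy data, three matched characters have the initial f in dialect A; two keep f in dialect B and one shows v. Characters with several readings would need a list per character instead of a single value.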

Where does the source data come from?

The dialect pronunciation data comes from Academia Sinica's 小學堂 (Xiaoxuetang; 中央研究院小學堂). The CSV files in this project are normalized versions of that source data, prepared for comparison and lookup inside this app.

The source data is provided under a Creative Commons (CC) license. When reusing the data outside this app, follow the license terms and cite the original source.

How should I read the percentage?

The percentage is a proportion of the data, not a score. Among matched characters whose Dialect A reading has the requested sound, it is the share whose Dialect B reading has a given candidate sound.

If the result says 184 / 191, about 96%, it means 184 matched characters out of 191 show that correspondence in the dataset.

It is not model confidence, historical certainty, or proof that every word behaves that way.
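
The arithmetic behind the 184 / 191 example can be checked with a few lines of Python. The `share` helper is a hypothetical name used only for this sketch, not a function from the app.

```python
def share(count, total):
    """Return count / total as a fraction, or None when there is nothing to divide by."""
    if total == 0:
        return None
    return count / total

fraction = share(184, 191)
label = f"{fraction:.1%}"        # one decimal place: "96.3%"
rounded = round(fraction * 100)  # whole percent: 96
```

The denominator is the number of matched characters with the requested sound in Dialect A, so a high percentage built on a very small denominator deserves more caution than the same percentage built on 191 characters.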

How does the AI agent get the data?

The AI agent does not read the CSV files directly. It asks the server to run comparison tools, receives structured JSON results, then summarizes those results.

1. You ask a natural-language question.
2. Gemini chooses a tool such as compare_initial.
3. The server runs the local comparison function.
4. Gemini summarizes the returned counts and examples.
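
The server-side part of this flow can be sketched as a small dispatcher. This is only an assumed shape: `compare_initial` is the tool name mentioned above, but its parameters, the `{"name": ..., "args": ...}` request format, and the JSON fields are hypothetical, and the sketch does not call the Gemini API at all.

```python
import json

def compare_initial(dialect_a, dialect_b, sound):
    """Stand-in for the local comparison function; returns placeholder values only."""
    return {
        "dialect_a": dialect_a,
        "dialect_b": dialect_b,
        "sound": sound,
        "total": 0,         # placeholder, not real data
        "candidates": [],   # placeholder, not real data
    }

# Registry of tools the model is allowed to request (hypothetical).
TOOLS = {"compare_initial": compare_initial}

def run_tool_call(call):
    """Run the requested tool and return its result as a JSON string."""
    func = TOOLS.get(call["name"])
    if func is None:
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    # ensure_ascii=False keeps Chinese names and IPA symbols readable in the output.
    return json.dumps(func(**call["args"]), ensure_ascii=False)

request = {
    "name": "compare_initial",
    "args": {"dialect_a": "香港-市區", "dialect_b": "台山-台城", "sound": "f"},
}
response = json.loads(run_tool_call(request))
```

The point of this design is that the model only ever sees the structured result, so every count in its summary traces back to a function that ran on the server.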

What if I do not type the exact dialect name?

The agent tries to match casual names and spelling variants to the dataset values before running a comparison.

香港話 (Hong Kong speech) becomes 香港-市區 (Hong Kong, urban).
台山話 (Taishan speech) becomes 台山-台城 (Taishan, Taicheng).
ng can become ŋ.
oeng can become œŋ.
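
A minimal sketch of this kind of matching, using only the four examples above, could look like the following. The table names and helper functions are hypothetical, and the agent's real matching is likely more flexible than a fixed lookup.

```python
# Casual dialect names mapped to dataset values (examples from this page).
DIALECT_ALIASES = {
    "香港話": "香港-市區",
    "台山話": "台山-台城",
}

# Romanized spellings mapped to the symbols used in the dataset.
# Longer patterns come first so "oeng" is not half-converted by the "ng" rule.
SOUND_REPLACEMENTS = [
    ("oeng", "œŋ"),
    ("ng", "ŋ"),
]

def normalize_dialect(name):
    """Return the dataset value for a casual name, or the trimmed input if unknown."""
    name = name.strip()
    return DIALECT_ALIASES.get(name, name)

def normalize_sound(sound):
    """Apply the spelling replacements to a typed sound."""
    sound = sound.strip().lower()
    for typed, symbol in SOUND_REPLACEMENTS:
        sound = sound.replace(typed, symbol)
    return sound
```

For example, `normalize_dialect("香港話")` returns `"香港-市區"` and `normalize_sound("oeng")` returns `"œŋ"`, while input already written with dataset symbols, such as `toŋ55`, passes through unchanged.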

What can I ask?

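香港話的 f 聲母在台山話有甚麼表現？ (How does the initial f of Hong Kong speech appear in Taishan speech?)
香港話的 oeng 韻母在廣州話有甚麼表現？ (How does the final oeng of Hong Kong speech appear in Guangzhou speech?)
比較樂昌和仁化的 t 聲母，說明主要對應和例字。 (Compare the initial t in Lechang and Renhua, and describe the main correspondences with example characters.)
曲江馬壩和仁化的 toŋ55 音節有甚麼對應？ (What does the syllable toŋ55 in Qujiang Maba correspond to in Renhua?)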

What are the limitations?

Results depend on the rows available in the normalized dataset. A low count or no result can mean the sound is absent, the spelling does not match, or too few characters overlap between the two dialects.

Alternate mappings can reflect lexical exceptions, multiple readings, sound splits, or source-data variation.
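
When a query returns little or nothing, a quick check like the one below can help narrow down which of the causes above applies. It is a sketch under assumptions: the inputs are plain character-to-sound dictionaries, and the `min_matched` threshold of 5 is an arbitrary illustrative value, not a setting from the app.

```python
def explain_low_count(a_readings, b_readings, sound, min_matched=5):
    """Suggest why a comparison returned few or no results.

    a_readings and b_readings map each character to its sound in that dialect.
    """
    if sound not in a_readings.values():
        # Code cannot tell these two cases apart; check the spelling by hand.
        return "sound not found in Dialect A: absent, or spelled differently"
    matched = [
        ch for ch in a_readings
        if ch in b_readings and a_readings[ch] == sound
    ]
    if len(matched) < min_matched:
        return "too few matched characters carry this sound"
    return "enough matched characters"

# Made-up toy readings, NOT taken from the dataset.
toy_a = {"甲": "f", "乙": "f", "丙": "s"}
toy_b = {"甲": "f", "丁": "v"}
```

With these toy readings, asking about `"ŋ"` reports that the sound was not found in Dialect A, and asking about `"f"` reports too few matched characters, because only one character with f appears in both dialects.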