FormosanBank

Short description:
FormosanBank is a large-scale, machine-readable corpus and tooling ecosystem for Taiwan’s Indigenous Formosan languages. It supports research, education, and revitalization across 16 official languages with multimodal text–audio resources.


Overview

FormosanBank curates standardized, machine-readable corpora for the Indigenous Formosan languages of Taiwan, which belong to the Austronesian family. The project aggregates, cleans, and structures multilingual text and audio into a consistent XML schema, enabling downstream tasks such as automatic speech recognition (ASR), forced alignment, translation, lexicon building, and pedagogical content creation.


Quick links