Supported Languages
===================

Jiwar supports a wide range of languages, both with built-in corpora and through custom corpus support.

Languages with Built-in Corpora
-------------------------------

These languages have pre-loaded corpora and can be used immediately with Jiwar:

.. list-table::
   :header-rows: 1
   :widths: 5 10 20 30

   * - N
     - Code
     - Language
     - Supported Writing Script
   * - 1
     - af
     - Afrikaans
     - Latin
   * - 2
     - ar
     - Arabic
     - Fully diacritized Arabic
   * - 3
     - bg
     - Bulgarian
     - Cyrillic
   * - 4
     - bs
     - Bosnian
     - Cyrillic, Latin
   * - 5
     - ca
     - Catalan
     - Latin
   * - 6
     - cs
     - Czech
     - Latin
   * - 7
     - de
     - German
     - Latin
   * - 8
     - el
     - Greek
     - Greek
   * - 9
     - en-gb
     - English (GB)
     - Latin
   * - 10
     - en-us
     - English (US)
     - Latin
   * - 11
     - eo
     - Esperanto
     - Latin
   * - 12
     - es
     - Spanish
     - Latin
   * - 13
     - et
     - Estonian
     - Latin
   * - 14
     - eu
     - Basque
     - Latin
   * - 15
     - fa
     - Persian
     - Perso-Arabic
   * - 16
     - fi
     - Finnish
     - Latin
   * - 17
     - fr
     - French
     - Latin
   * - 18
     - hr
     - Croatian
     - Latin
   * - 19
     - hu
     - Hungarian
     - Latin
   * - 20
     - hy
     - Armenian
     - Armenian
   * - 21
     - id
     - Indonesian
     - Latin
   * - 22
     - it
     - Italian
     - Latin
   * - 23
     - kk
     - Kazakh
     - Cyrillic
   * - 24
     - ko
     - Korean
     - Hangul
   * - 25
     - lt
     - Lithuanian
     - Latin
   * - 26
     - lv
     - Latvian
     - Latin
   * - 27
     - mk
     - Macedonian
     - Cyrillic
   * - 28
     - ms
     - Malay
     - Latin
   * - 29
     - nl
     - Dutch
     - Latin
   * - 30
     - no
     - Norwegian
     - Latin
   * - 31
     - pl
     - Polish
     - Latin
   * - 32
     - pt
     - Portuguese
     - Latin
   * - 33
     - ro
     - Romanian
     - Latin
   * - 34
     - ru
     - Russian
     - Cyrillic
   * - 35
     - sk
     - Slovak
     - Latin
   * - 36
     - sq
     - Albanian
     - Latin
   * - 37
     - sr
     - Serbian
     - Cyrillic
   * - 38
     - sv
     - Swedish
     - Latin
   * - 39
     - tr
     - Turkish
     - Latin
   * - 40
     - uk
     - Ukrainian
     - Cyrillic
   * - 41
     - ur
     - Urdu
     - Perso-Arabic

Languages Requiring Custom Corpora
----------------------------------

These languages are supported by Jiwar but require a custom corpus:

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Code
     - Language
   * - am
     - Amharic
   * - an
     - Aragonese
   * - as
     - Assamese
   * - az
     - Azerbaijani
   * - ba
     - Bashkir
   * - bn
     - Bengali
   * - bpy
     - Bishnupriya Manipuri
   * - chr
     - Cherokee
   * - cmn
     - Mandarin Chinese
   * - cv
     - Chuvash
   * - en-029
     - Caribbean English
   * - en-gb-x-gbclan
     - Lancastrian English
   * - en-gb-x-gbcwmd
     - West Midlands English
   * - en-gb-x-rp
     - Received Pronunciation English
   * - es-419
     - Latin American Spanish
   * - fa-latn
     - Persian (Latin script)
   * - fr-be
     - Belgian French
   * - fr-ch
     - Swiss French
   * - ga
     - Irish Gaelic
   * - gd
     - Scottish Gaelic
   * - gn
     - Guarani
   * - grc
     - Ancient Greek
   * - gu
     - Gujarati
   * - hak
     - Hakka Chinese
   * - haw
     - Hawaiian
   * - he
     - Hebrew
   * - hi
     - Hindi
   * - ht
     - Haitian Creole
   * - hyw
     - Western Armenian
   * - ia
     - Interlingua
   * - io
     - Ido
   * - is
     - Icelandic
   * - ja
     - Japanese
   * - jbo
     - Lojban
   * - ka
     - Georgian
   * - kl
     - Greenlandic
   * - kn
     - Kannada
   * - kok
     - Konkani
   * - ku
     - Kurdish
   * - ky
     - Kyrgyz
   * - la
     - Latin
   * - lb
     - Luxembourgish
   * - lfn
     - Lingua Franca Nova
   * - ltg
     - Latgalian
   * - mi
     - Maori
   * - ml
     - Malayalam
   * - mr
     - Marathi
   * - mt
     - Maltese
   * - my
     - Burmese
   * - nci
     - Classical Nahuatl
   * - ne
     - Nepali
   * - nb
     - Norwegian Bokmål
   * - nog
     - Nogai
   * - om
     - Oromo
   * - or
     - Oriya
   * - pa
     - Punjabi
   * - pap
     - Papiamento
   * - piqd
     - Klingon
   * - pt-br
     - Brazilian Portuguese
   * - qdb
     - Lang Belta
   * - qu
     - Quechua
   * - quc
     - K'iche'
   * - qya
     - Quenya
   * - ru-lv
     - Latvian Russian
   * - sd
     - Sindhi
   * - shn
     - Shan
   * - si
     - Sinhala
   * - sjn
     - Sindarin
   * - smj
     - Lule Sami
   * - sw
     - Swahili
   * - ta
     - Tamil
   * - te
     - Telugu
   * - th
     - Thai
   * - tk
     - Turkmen
   * - tl
     - Tagalog
   * - tn
     - Setswana
   * - tt
     - Tatar
   * - ug
     - Uyghur
   * - uz
     - Uzbek
   * - vi-vn-x-central
     - Central Vietnamese
   * - vi-vn-x-south
     - Southern Vietnamese
   * - yue
     - Cantonese

For instructions on creating and using custom corpora, please refer to the :doc:`custom_corpus` page.