Large language models (LLMs) are the most prominent core component of current language-centric AI. It is widely acknowledged, however, that today’s most successful and widely used LLMs are essentially English-dominated and monocultural. They therefore tend to homogenize languages and cultural norms and values, reinforcing the dominant ones, and thereby implicitly affect the deeply rooted and highly regarded societal values of the Nordic and Baltic regions, such as equality, democracy, and trust. This reproduction and amplification of societal and cultural bias at scale seriously challenges the mission, adopted in many governmental initiatives, of implementing AI in society in a transparent, customized, and responsible way.
Our project addresses this challenge by adapting current and future LLMs so that their coverage and functionality more responsibly encompass the linguistic and cultural diversity of our regions, making them more inclusive of the societies in which they will be used. By establishing a strong, interdisciplinary consortium of selected leading Natural Language Processing (NLP) sections and language institutions across the Nordic and Baltic countries, we will draw on existing language and cultural resources in our regions and share best practices across sister languages and cultures.
The project will:
- compile a number of sizeable open-source, multi-parallel linguistic and cultural datasets for Danish, Swedish, Bokmål, Nynorsk, Faroese, and Latvian, which will systematically draw on and make explicit the central aspects of the linguistic and cultural diversity of our regions,
- based on these data, considerably advance state-of-the-art methods for explaining, assessing, and aligning LLMs across languages and cultures, with particular focus on the linguistic idiosyncrasies and cultural heritage of our regions, and finally,
- assess and align a number of LLMs towards the Nordic and Baltic cultures and societies.
The cross-national project will facilitate synergy with existing and emerging governmental AI initiatives to which the project partners already contribute as PIs or co-PIs. Knowledge exchange will take place through the sharing of best practices and resources, improving and speeding up the linguistic and cultural customization of LLMs to our respective countries.