AI components | Kratid

Reusable AI components

An open-source AI component is a building block for AI-based applications that any interested party in the public or private sector can reuse free of charge and develop further to suit its own needs.

All open-source AI components are available in the eGovernment code repository and/or on GitHub.

BÜROKRATT

Bürokratt is an interoperable network of chatbots on the websites of public authorities. It lets people obtain information from these authorities through a chat window and use some basic services. All development work is public and available on GitHub.

DATA ANONYMIZER

Data Anonymizer recognizes named entities in text, such as names, personal ID codes, dates and locations. It also recognizes entities such as organizations, products, jobs, events, positions and financial information. As a next step, Data Anonymizer can replace an entity with another value from the same entity group (e.g. Tallinn is replaced with Tartu). The solution can be used by information system administrators who need to process personal data as well as by organizations that have to publish data. Instead of checking documents manually, they can have this done automatically with Data Anonymizer.
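
The replacement step described above can be illustrated with a minimal Python sketch. It is not Data Anonymizer's actual code: the Entity class, the REPLACEMENTS pools, the group labels "PER" and "LOC" and the function names are all hypothetical, and the entity spans are assumed to have already been found by an NER model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int   # index of the first character of the entity in the text
    end: int     # index one past the last character
    group: str   # entity group label, e.g. "LOC" or "PER" (hypothetical labels)


# Hypothetical replacement pools, one per entity group.
REPLACEMENTS = {
    "LOC": ["Tallinn", "Tartu", "Pärnu"],
    "PER": ["Mari Tamm", "Jaan Kask"],
}


def pick_replacement(original: str, group: str) -> str:
    """Return the first pool value for the group that differs from the original."""
    for candidate in REPLACEMENTS.get(group, []):
        if candidate != original:
            return candidate
    # No substitute available: fall back to a neutral placeholder.
    return f"[{group}]"


def anonymize(text: str, entities: list[Entity]) -> str:
    """Replace each entity span with another value from the same entity group."""
    result = text
    # Work from right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        original = text[ent.start:ent.end]
        result = result[:ent.start] + pick_replacement(original, ent.group) + result[ent.end:]
    return result


text = "Mari Tamm moved to Tallinn."
entities = [Entity(0, 9, "PER"), Entity(19, 26, "LOC")]
anonymized = anonymize(text, entities)
print(anonymized)
```

The sketch picks replacements deterministically to keep the example easy to follow; a real tool would more likely choose them at random so the original value cannot be inferred from the substitute.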

The Named Entity Recognition (NER) corpora developed by the University of Tartu can be used, or entirely new models can be trained on institution-specific datasets. Check out the demo. All development work is public and can be found on GitHub.

NEUROTÕLGE

Neurotõlge, a machine translation engine developed by the Natural Language Processing research group at the University of Tartu, is freely available for reuse and can be further developed by those interested. Neurotõlge supports 7 languages (Estonian/Latvian/Lithuanian/English/Finnish/German/Russian), and all 42 translation directions fit into a single neural model. The user does not need to select a source language separately, because the system detects it automatically. The user only has to select the target language. It is also possible to choose the style of translation, depending on whether conversational or more formal language should be used. The translation engine is also able to adjust the style of a text within the same language and to correct spelling mistakes.
The translation engine can be deployed in an environment chosen by the user, which also makes it possible to translate documents intended for internal use. The engine operates online, where it can be used directly as a demo, integrated with translation frameworks, or accessed through an API. The translation engine solution is available in the eGovernment code repository.
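
The figure of 42 translation directions follows from counting ordered pairs of distinct languages: each of the 7 languages can be translated into any of the other 6, so 7 × 6 = 42. A short Python check, in which the variable names are illustrative only:

```python
from itertools import permutations

LANGUAGES = [
    "Estonian", "Latvian", "Lithuanian", "English",
    "Finnish", "German", "Russian",
]

# A translation direction is an ordered (source, target) pair of distinct
# languages, so Estonian -> English and English -> Estonian count separately.
directions = list(permutations(LANGUAGES, 2))

print(len(directions))  # 7 * 6 = 42
```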

NEUROKÕNE

This is a prototype of Estonian speech synthesis based on neural networks, developed by the Natural Language Processing research group at the University of Tartu and trained on a corpus of Estonian news. Speech synthesis can currently imitate the voices of six different speakers. The project is still in the development stage and is far from perfect; however, the neural network-based speech synthesis sounds more natural than earlier methods. The strengths of the speech model include the natural sound and intonation of speech, and the pronunciation of numbers, symbols and abbreviations.
More information can be found at www.neurokone.ee
The source code with installation instructions is available on GitHub.

TEXTA TOOLKIT

The text analytics tool created by Texta OÜ has already been used by several institutions to streamline their work processes and automate routine activities. For example, the Ministry of Education and Research performed a document management audit using Texta Toolkit. The aim of the project was to identify documents that had been published without authorization (e.g. internal documents or documents containing personal data). In collaboration with the Centre of Registers and Information Systems, the Ministry of Justice used Texta to remove personal data from nearly 80,000 court decisions involving outdated court sentences, and republished the decisions in the Court Information System.
Texta Toolkit is available on GitHub.

KIIRKIRJUTAJA

Speech recognition is a technology that converts speech into text. It makes it possible, for example, to dictate documents, transcribe voice and video recordings, and communicate with computers and devices using speech. Estonian speech recognition is already in real-world use, for example by radiologists at the North-Estonian Regional Hospital and by several Estonian media monitoring companies for automatic transcription of radio and TV programmes. Speech recognition has recently also made its way into the session hall of the Riigikogu (Parliament of Estonia), where it is used to automatically prepare verbatim reports of the sittings.
Developed in the TalTech Laboratory of Language Technology, Kiirkirjutaja is available free of charge for everyone. The system is being developed by a team led by Tanel Alumäe. The source code with installation instructions is available on GitHub.