Data Collection for Swiss Voice Assistant

Abstract

Machine understanding of spoken language is becoming increasingly important with the development of voice assistants and computer-aided barrier-free communication. Previous research has focused on widely used languages for which many spoken and written resources are available. Most languages worldwide, and all dialects, lack a standardized written form, and these call for different approaches. Our first research goal was to define the requirements for a platform that collects resources for languages with wide dialectal variation in grammar (including non-standard grammar), vocabulary, and pronunciation. Second, we implemented such an application to be as flexible and modular as the languages it collects; it also provides gamification elements to motivate the volunteers who donate their voice and writing. Third, we tested the platform and its gamification elements in a pilot study with a Swiss-German audience. Gamification led to higher user retention and four times more generated data, with no decrease in data quality.


Viturin Züst

Bachelor's Thesis

Status: Completed
