% **************************************************************************************************
\newsection{Motivation}{intro:motivation}
%Motivation: What is KWS? Why is it needed? Why does it have to be resource-efficient?\\
\glspl{iva} like Google's Assistant, Apple's Siri and Amazon's Alexa have gained a lot of popularity in recent years. \glspl{iva} provide an alternative interface for human-computer interaction in addition to the more traditional interfaces such as mouse, keyboard, touchscreen and display. \glspl{iva} are capable of interpreting human speech, following spoken commands and replying via synthesized voices. Modern \glspl{iva} provide many functionalities, including responses to questions, email management, to-do list management, calendar management, home automation control and media playback control. Part of the success of \glspl{iva} can be attributed to the rapid advancements in \gls{nlp} and \gls{ai}, both being key technologies in the development and application of \glspl{iva}.
For \glspl{iva} to fulfil requests, a complex pipeline of different technologies is necessary. First, \gls{asr} is employed to convert a spoken command into a text transcription. Then, natural language understanding is used to interpret the text transcription and to extract the intention of the user. A dialogue manager then produces a response to the spoken command. Finally, text-to-speech synthesis is used to convert the response from the dialogue manager into spoken words that are then supplied back to the user.
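To make the interplay of these four stages concrete, the following minimal Python sketch outlines the pipeline. Every function here is a hypothetical stub standing in for a complete subsystem; none of the names are taken from any particular \gls{iva} implementation.
\begin{verbatim}
# Hypothetical stubs; each stands in for a complete subsystem.
def asr(audio):
    return "turn on the lights"      # speech -> text transcription

def nlu(transcript):
    return {"intent": "lights_on"}   # transcription -> user intention

def dialogue_manager(intent):
    return "Turning on the lights."  # intention -> textual response

def tts(response):
    return b"<synthesized speech>"   # text -> spoken words

def handle_utterance(audio):
    transcript = asr(audio)
    intent = nlu(transcript)
    response = dialogue_manager(intent)
    return tts(response)
\end{verbatim}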
A common solution is to run a low-cost \gls{kws} system that is listening permanently.
\newsection{Scope of this Thesis}{intro:scope}
%Scope: What is covered? Contributions?\\
In this thesis, we will focus on \gls{kws} as one crucial aspect of the \gls{iva} pipeline. Recently, \glspl{dnn} have become the state of the art in \gls{kws}, slowly replacing the more traditional \glspl{hmm}. While \glspl{hmm} achieve reasonable results, they require an iterative algorithm for training and are inferior to \glspl{dnn} in terms of performance. Because of these limitations, we opted to focus solely on \gls{dnn} based \gls{kws} models. In particular, we will focus on \glspl{cnn} for \gls{kws}.
With \glspl{dnn}, it is possible to obtain resource efficient \gls{kws} models with competitive performance. Of course, there is a tradeoff between the resource efficiency of a model and its performance. We will explore this tradeoff in depth using different methods from the literature, including \gls{nas}, weight and activation quantization, end-to-end models and multi-exit models.
\Gls{kws} will be performed on the \gls{gsc} dataset \cite{Warden2018}, a public dataset published by Google to enable the comparison of \gls{kws} models. The \gls{gsc} consists of 1-second long audio files of spoken words from many different speakers. Our models will be trained and evaluated on 10 keyword classes labeled \enquote{yes}, \enquote{no}, \enquote{up}, \enquote{down}, \enquote{left}, \enquote{right}, \enquote{on}, \enquote{off}, \enquote{stop} and \enquote{go}. Two additional classes called \enquote{unknown} and \enquote{silence} are added, where the \enquote{unknown} class is a collection of unrelated keywords and the \enquote{silence} class contains no keywords, only background noise.
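For illustration, the resulting twelve-class label set can be written down directly. The following snippet is our own sketch of this setup and not part of the \gls{gsc} distribution.
\begin{verbatim}
# The ten target keywords used for training and evaluation.
KEYWORDS = ["yes", "no", "up", "down", "left",
            "right", "on", "off", "stop", "go"]

# Adding the two auxiliary classes yields 12 output classes in total.
CLASSES = KEYWORDS + ["unknown", "silence"]
assert len(CLASSES) == 12
\end{verbatim}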
\newsection{Outline}{intro:outline}
%Outline: Thesis outline. Describe the chapters in sentences.
The outline of this thesis is as follows: Chapter 2 provides the theoretical background on \glspl{dnn}. This chapter introduces various learning approaches, capacity, over- and underfitting, \glspl{mlp}, \glspl{cnn} and the training of \glspl{dnn}. Chapter 3 provides the theoretical background for resource efficient \gls{kws}. This chapter presents resource efficient convolutional layers, \gls{nas}, weight and activation quantization, end-to-end models and multi-exit models. The \gls{gsc} dataset, data augmentation and feature extraction are explained in detail in Chapter 4. The experimental results of this thesis are presented and discussed in Chapter 5. Finally, Chapter 6 provides the conclusion to this thesis.