Commit 19b907be authored by David Peter

Update abstract.tex

parent da32f25e
\addcontentsline{toc}{chapter}{Abstract (English)}
\begin{center}\Large\bfseries Abstract (English)\end{center}\vspace*{1cm}\noindent
This thesis explores different methods for designing \glspl{cnn} for \gls{kws} in resource-constrained environments. Our goal is to maximize the classification accuracy while minimizing the memory requirements as well as the number of \gls{madds} per forward pass. To achieve this goal, we first employ a differentiable \gls{nas} approach to optimize the structure of \glspl{cnn}. After a suitable \gls{kws} model is found with \gls{nas}, we quantize weights and activations to further reduce the memory requirements. For quantization, we compare fixed bitwidth quantization with trained bitwidth quantization. Then, we perform \gls{nas} again to optimize the structure of end-to-end \gls{kws} models. End-to-end models perform classification directly on the raw audio waveforms, skipping the extraction of hand-crafted speech features such as \glspl{mfcc}. We compare our models using \glspl{mfcc} to end-to-end models in terms of accuracy, memory requirements, and the number of \gls{madds} per forward pass. We also show that multi-exit models provide considerable flexibility for \gls{kws} systems by allowing the forward pass to be interrupted early if necessary. All experiments are conducted on the \gls{gsc} dataset, a popular dataset for evaluating the classification accuracy of \gls{kws} applications.
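As a rough illustration of the fixed bitwidth case (the symbols $w$, $b$ and $\Delta$ below are generic placeholders rather than notation from this thesis), uniform quantization of a weight $w$ to $b$ bits with step size $\Delta$ can be sketched as
\[
\hat{w} = \Delta \cdot \mathrm{clip}\!\left(\mathrm{round}\!\left(\tfrac{w}{\Delta}\right),\, -2^{b-1},\, 2^{b-1}-1\right),
\]
where trained bitwidth quantization would additionally treat the bitwidth (or the step size) as a learnable parameter.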