### Change notation of loss

parent 26627881
\newsubsubsection{Mean Squared Error}{dnn:loss:mse}
The \gls{mse} is the average of the squared differences between the predicted and the true targets across all $N$ samples in the dataset. The \gls{mse} is most commonly used for regression tasks and depends on the model parameters $\vm{\theta}$. The \gls{mse} for a single sample is defined as
\begin{equation}
l_{\mathrm{MSE}}(f(\vm{x}_i; \vm{\theta}), \vm{y}_i) = \frac{1}{2} || f(\vm{x}_i; \vm{\theta}) - \vm{y}_i ||^2
\end{equation}
where $\vm{y}_i$ is the $i$-th target vector and $f(\vm{x}_i; \vm{\theta})$ is the model prediction for the $i$-th input $\vm{x}_i$. The \gls{mse} over all samples is simply the average of the per-sample losses
\begin{equation}
\mathcal{L}_{\mathrm{MSE}}(\vm{\theta}) = \frac{1}{N} \sum_{i=1}^N l_{\mathrm{MSE}}(f(\vm{x}_i; \vm{\theta}), \vm{y}_i).
\end{equation}

\newsubsubsection{Cross Entropy Loss}{dnn:loss:ce}
For classification tasks, a different loss function has to be used. The typical choice for classification tasks with $C$ classes is the cross entropy loss. The cross entropy loss for a single sample is defined as
\begin{equation}
l_{\mathrm{CE}}(f(\vm{x}_i; \vm{\theta}), \vm{y}_i) = -\sum_{j=1}^C y_{ij} \log{f(\vm{x}_i; \vm{\theta})_{j}}
\end{equation}
where $y_{ij}$ is the $j$-th element of the $i$-th one-hot encoded target and $f(\vm{x}_i; \vm{\theta})_j$ is the $j$-th element of the model prediction for the $i$-th input $\vm{x}_i$. The cross entropy loss over all samples is simply the average of the per-sample losses
\begin{equation}
\mathcal{L}_{\mathrm{CE}}(\vm{\theta}) = \frac{1}{N} \sum_{i=1}^N l_{\mathrm{CE}}(f(\vm{x}_i; \vm{\theta}), \vm{y}_i).
\end{equation}
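The two definitions above translate directly into code. The following is a minimal illustrative sketch (not part of the LaTeX source); the prediction and target values are made-up toy numbers, and `mse_loss`/`ce_loss` are hypothetical helper names that mirror $l_{\mathrm{MSE}}$ and $l_{\mathrm{CE}}$:

```python
import math

def mse_loss(pred, target):
    # Per-sample MSE: 1/2 * squared Euclidean norm of (prediction - target)
    return 0.5 * sum((p - y) ** 2 for p, y in zip(pred, target))

def ce_loss(pred, target):
    # Per-sample cross entropy: -sum_j y_ij * log(f(x_i; theta)_j),
    # where pred is a probability vector over the C classes
    return -sum(y * math.log(p) for p, y in zip(pred, target))

# Toy batch of N = 2 samples with C = 3 classes (illustrative values only)
preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
targets = [[1, 0, 0], [0, 1, 0]]  # one-hot encoded targets

# Dataset-level losses: average of the per-sample losses
N = len(preds)
L_mse = sum(mse_loss(p, y) for p, y in zip(preds, targets)) / N
L_ce = sum(ce_loss(p, y) for p, y in zip(preds, targets)) / N
```

With a one-hot target, the inner sum of the cross entropy collapses to a single term, so `ce_loss` reduces to the negative log-probability assigned to the correct class.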