Margin-based Losses

This section lists all the subtypes of MarginLoss that are implemented in this package.

ZeroOneLoss

class ZeroOneLoss

The classical classification loss. It penalizes every misclassified observation with a loss of \(1\), while every correctly classified observation has a loss of \(0\). It is neither convex nor continuous and is thus seldom used directly. Instead one usually works with some classification-calibrated surrogate loss, such as one of those listed below.

Loss function and derivative:
\[\begin{split}L(a) = \begin{cases} 1 & \quad \text{if } a < 0 \\ 0 & \quad \text{otherwise}\\ \end{cases}\end{split}\]
\[L'(a) = 0\]
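
As a rough illustration (a hypothetical helper transcribing the formula above, not the package API), the zero-one loss can be written as a plain Julia function of the agreement \(a\):

# Hypothetical helper transcribing the formula above; not the package API.
# The agreement is a = y * ŷ with y ∈ {-1, 1}.
zero_one_loss(a) = a < 0 ? 1.0 : 0.0

zero_one_loss(-0.3)   # 1.0 (misclassified)
zero_one_loss(2.0)    # 0.0 (correctly classified)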

PerceptronLoss

class PerceptronLoss

The perceptron loss linearly penalizes every prediction where the resulting agreement \(a < 0\). It is Lipschitz continuous and convex, but not strictly convex.

Loss function and derivative:
\[L(a) = \max \{ 0, - a \}\]
\[\begin{split}L'(a) = \begin{cases} -1 & \quad \text{if } a < 0 \\ 0 & \quad \text{otherwise}\\ \end{cases}\end{split}\]

L1HingeLoss

class L1HingeLoss

The hinge loss linearly penalizes every prediction where the resulting agreement \(a < 1\). It is Lipschitz continuous and convex, but not strictly convex.

Loss function and derivative:
\[L(a) = \max \{ 0, 1 - a \}\]
\[\begin{split}L'(a) = \begin{cases} -1 & \quad \text{if } a < 1 \\ 0 & \quad \text{otherwise}\\ \end{cases}\end{split}\]
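
To contrast the perceptron loss with the hinge loss, here is a minimal sketch with hypothetical helpers that simply transcribe the two formulas above (not the package API); the hinge loss additionally penalizes correct predictions whose agreement falls short of the margin of \(1\):

# Hypothetical helpers transcribing the formulas above; not the package API.
perceptron_loss(a) = max(0, -a)
l1_hinge_loss(a)   = max(0, 1 - a)

perceptron_loss(0.5), l1_hinge_loss(0.5)    # (0.0, 0.5): correct, but inside the margin
perceptron_loss(-1.0), l1_hinge_loss(-1.0)  # (1.0, 2.0): misclassified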

SmoothedL1HingeLoss

class SmoothedL1HingeLoss
γ: the smoothing parameter that appears in the formulas below.

As the name suggests, a smoothed version of the L1 hinge loss. It is Lipschitz continuous and convex, but not strictly convex.

Loss function and derivative:
\[\begin{split}L(a) = \begin{cases} \frac{1}{2 \gamma} \cdot \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge 1 - \gamma \\ 1 - \frac{\gamma}{2} - a & \quad \text{otherwise}\\ \end{cases}\end{split}\]
\[\begin{split}L'(a) = \begin{cases} - \frac{1}{\gamma} \cdot \max \{ 0, 1 - a \} & \quad \text{if } a \ge 1 - \gamma \\ - 1 & \quad \text{otherwise}\\ \end{cases}\end{split}\]
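
As a hedged sketch (a hypothetical helper transcribing the formula above, not the package API): the quadratic piece smooths out the kink of the L1 hinge loss at \(a = 1\), and for small \(\gamma\) the function approaches the plain L1 hinge loss:

# Hypothetical transcription of the formula above; not the package API.
function smoothed_l1_hinge(a; γ = 1.0)
    a >= 1 - γ ? max(0, 1 - a)^2 / (2γ) : 1 - γ/2 - a
end

smoothed_l1_hinge(0.5; γ = 1.0)    # 0.125 (quadratic regime)
smoothed_l1_hinge(0.5; γ = 0.01)   # 0.495, close to max(0, 1 - 0.5) = 0.5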

ModifiedHuberLoss

class ModifiedHuberLoss

A special case of the SmoothedL1HingeLoss with \(\gamma = 2\), scaled by a factor of \(4\). It is Lipschitz continuous and convex, but not strictly convex.

Loss function and derivative:
\[\begin{split}L(a) = \begin{cases} \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge -1 \\ - 4 a & \quad \text{otherwise}\\ \end{cases}\end{split}\]
\[\begin{split}L'(a) = \begin{cases} - 2 \cdot \max \{ 0, 1 - a \} & \quad \text{if } a \ge -1 \\ - 4 & \quad \text{otherwise}\\ \end{cases}\end{split}\]
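
The scaling claim above can be checked numerically with hypothetical helpers that transcribe the two formulas (not the package API):

# Hypothetical helpers transcribing the formulas above; not the package API.
smoothed_l1_hinge(a; γ) = a >= 1 - γ ? max(0, 1 - a)^2 / (2γ) : 1 - γ/2 - a
modified_huber(a)       = a >= -1   ? max(0, 1 - a)^2          : -4a

# The modified Huber loss equals four times the smoothed L1 hinge loss at γ = 2.
all(modified_huber(a) ≈ 4 * smoothed_l1_hinge(a; γ = 2) for a in -3:0.25:3)  # true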

DWDMarginLoss

class DWDMarginLoss
q: the parameter that appears as the exponent in the formulas below.

The distance weighted discrimination margin loss. It is a differentiable generalization of the L1 hinge loss that is distinct from the SmoothedL1HingeLoss.

Loss function and derivative:
\[\begin{split}L(a) = \begin{cases} 1 - a & \quad \text{if } a \le \frac{q}{q+1} \\ \frac{1}{a^q} \frac{q^q}{(q+1)^{q+1}} & \quad \text{otherwise}\\ \end{cases}\end{split}\]
\[\begin{split}L'(a) = \begin{cases} - 1 & \quad \text{if } a \le \frac{q}{q+1} \\ - \frac{1}{a^{q+1}} \left( \frac{q}{q+1} \right)^{q+1} & \quad \text{otherwise}\\ \end{cases}\end{split}\]
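
Differentiability at the transition point \(a = \frac{q}{q+1}\) can be verified numerically with hypothetical helpers that transcribe the formulas above (not the package API): the loss and its derivative take the same values on both sides of the threshold:

# Hypothetical transcriptions of the DWD formulas above; not the package API.
dwd_loss(a; q)  = a <= q/(q+1) ? 1 - a : q^q / ((q+1)^(q+1) * a^q)
dwd_deriv(a; q) = a <= q/(q+1) ? -1.0  : -(q/(q+1))^(q+1) / a^(q+1)

q = 2.0
a₀ = q / (q + 1)                                    # transition point 2/3
dwd_loss(a₀; q = q), dwd_loss(a₀ + 1e-9; q = q)     # both ≈ 1/3
dwd_deriv(a₀; q = q), dwd_deriv(a₀ + 1e-9; q = q)   # both ≈ -1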

L2MarginLoss

class L2MarginLoss

The margin-based least-squares loss for classification, which quadratically penalizes every prediction where \(a \ne 1\). It is locally Lipschitz continuous and strongly convex.

Loss function and derivative:
\[L(a) = {\left( 1 - a \right)}^2\]
\[L'(a) = 2 \left( a - 1 \right)\]

L2HingeLoss

class L2HingeLoss

The truncated version of the least-squares loss. It quadratically penalizes every prediction where the resulting agreement \(a < 1\). It is locally Lipschitz continuous and convex, but not strictly convex.

Loss function and derivative:
\[L(a) = \max \{ 0, 1 - a \} ^2\]
\[\begin{split}L'(a) = \begin{cases} 2 \left( a - 1 \right) & \quad \text{if } a < 1 \\ 0 & \quad \text{otherwise}\\ \end{cases}\end{split}\]
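
To illustrate how the two quadratic losses relate (hypothetical helpers transcribing the formulas above, not the package API): they agree for \(a < 1\), while the truncated version assigns no loss beyond the margin:

# Hypothetical transcriptions of the formulas above; not the package API.
l2_margin_loss(a) = (1 - a)^2
l2_hinge_loss(a)  = max(0, 1 - a)^2

l2_margin_loss(0.5), l2_hinge_loss(0.5)  # (0.25, 0.25): identical below the margin
l2_margin_loss(2.0), l2_hinge_loss(2.0)  # (1.0, 0.0): only the margin loss penalizes a > 1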

LogitMarginLoss

class LogitMarginLoss

The margin version of the logistic loss. It is infinitely many times differentiable, strictly convex, and Lipschitz continuous.

Loss function and derivative:
\[L(a) = \ln (1 + e^{-a})\]
\[L'(a) = - \frac{1}{1 + e^a}\]
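
The stated derivative can be cross-checked against a central finite difference (hypothetical helpers transcribing the formulas above, not the package API):

# Hypothetical transcriptions of the formulas above; not the package API.
logit_loss(a)  = log(1 + exp(-a))
logit_deriv(a) = -1 / (1 + exp(a))

a, h = 0.7, 1e-6
(logit_loss(a + h) - logit_loss(a - h)) / (2h)   # ≈ -0.33181
logit_deriv(a)                                   # ≈ -0.33181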

ExpLoss

class ExpLoss

The margin-based exponential loss for classification, which penalizes every prediction with a loss that grows exponentially as the agreement decreases. It is infinitely many times differentiable, locally Lipschitz continuous and strictly convex, but not clipable.

Loss function and derivative:
\[L(a) = e^{-a}\]
\[L'(a) = - e^{-a}\]

SigmoidLoss

class SigmoidLoss

The so-called sigmoid loss is a continuous margin-based loss which assigns every prediction a loss within the open range \((0, 2)\). It is infinitely many times differentiable and Lipschitz continuous, but not convex.

Loss function and derivative:
\[L(a) = 1 - \tanh(a)\]
\[L'(a) = - \textrm{sech}^2 (a)\]
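
A quick illustration of the stated range (a hypothetical helper transcribing the formula above, not the package API): since \(\tanh\) maps into \((-1, 1)\), the loss stays inside the open interval \((0, 2)\):

# Hypothetical transcription of the formula above; not the package API.
sigmoid_loss(a) = 1 - tanh(a)

extrema(sigmoid_loss.(-10:0.1:10))  # ≈ (0.0, 2.0), but never exactly reaching 0 or 2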