Working with Losses

Even though they are called loss “functions”, this package implements them as immutable types instead of true Julia functions. There are good reasons for this. For example, it allows us to query the properties of a loss explicitly (e.g. isconvex(myloss)). It also makes for a more consistent API when it comes to computing the value or the derivative. Some loss functions even have additional parameters that need to be specified, such as the \(\epsilon\) in the case of the \(\epsilon\)-insensitive loss. Here, types allow member variables to hide that information away from the method signatures.

In order to avoid potential confusion with true Julia functions, we will refer to “loss functions” as “losses” instead. The available losses share a common interface for the most part. This section provides an overview of the basic functionality that is available for all the different types of losses. We will discuss how to create a loss, how to compute its value and derivative, and how to query its properties.
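
To give a rough idea of the workflow before we discuss each part in detail, here is a minimal sketch that creates a loss, evaluates it, and queries one of its properties. All of the functions used here are documented in the remainder of this section.

using LossFunctions

loss = HingeLoss()       # instantiate a loss
value(loss, 1, 0.3)      # loss of a single observation with target 1 and output 0.3
deriv(loss, 1, 0.3)      # derivative with respect to the output
isconvex(loss)           # query a property of the loss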

Instantiating a Loss

Losses are immutable types. As such, they have to be instantiated before one can work with them. For most losses, the constructors do not expect any parameters.

julia> L2DistLoss()
LossFunctions.LPDistLoss{2}()

julia> HingeLoss()
LossFunctions.L1HingeLoss()

We just said that we need to instantiate a loss in order to work with it. One could be inclined to believe that it would be more memory-efficient to “pre-allocate” a loss when using it in more than one place.

julia> loss = L2DistLoss()
LossFunctions.LPDistLoss{2}()

julia> value(loss, 2, 3)
1

However, that is a common oversimplification. Because all losses are immutable types, they can live on the stack and thus do not come with a heap-allocation overhead.

Even more interesting is that for losses such as L2DistLoss, which have neither constructor parameters nor member variables, instantiating the loss executes no additional code at all. Such singleton types are only used for dispatch and don’t produce any additional code, which you can observe for yourself in the code below. As such they are zero-cost abstractions.

julia> v1(loss,t,y) = value(loss,t,y)

julia> v2(t,y) = value(L2DistLoss(),t,y)

julia> @code_llvm v1(loss, 2, 3)
define i64 @julia_v1_70944(i64, i64) #0 {
top:
  %2 = sub i64 %1, %0
  %3 = mul i64 %2, %2
  ret i64 %3
}

julia> @code_llvm v2(2, 3)
define i64 @julia_v2_70949(i64, i64) #0 {
top:
  %2 = sub i64 %1, %0
  %3 = mul i64 %2, %2
  ret i64 %3
}

On the other hand, some types of losses are more comparable to whole families of losses than to a single one. For example, the immutable type L1EpsilonInsLoss has a free parameter \(\epsilon\). Each concrete \(\epsilon\) results in a different concrete loss of the same family of epsilon-insensitive losses.

julia> L1EpsilonInsLoss(0.5)
LossFunctions.L1EpsilonInsLoss{Float64}(0.5)

julia> L1EpsilonInsLoss(1)
LossFunctions.L1EpsilonInsLoss{Float64}(1.0)

For losses that do have parameters, pre-instantiating can make a slight difference. While such losses still live on the stack, the constructor usually performs some assertions and conversions on the given parameter, which comes at a slight overhead; at the very least, the generated code will not be exactly the same as for a pre-instantiated loss. Still, the fact that they are immutable makes them very efficient abstractions with little to no performance overhead, and zero memory allocations on the heap.
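
As a rule of thumb: if a parameterized loss is used inside a hot loop, it is sensible to construct it once outside the loop and pass it in. The following is merely a sketch of that pattern (the helper total_loss is made up for illustration); the exact performance impact depends on the loss and the Julia version.

# construct the parameterized loss once ...
loss = L1EpsilonInsLoss(0.5)

# ... and reuse the same instance for every observation
function total_loss(loss, targets, outputs)
    s = 0.0
    for (y, ŷ) in zip(targets, outputs)
        s += value(loss, y, ŷ)
    end
    return s
end

total_loss(loss, [1.0, 2.0, 3.0], [0.4, 2.2, 3.1])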

Computing the Values

The first thing we may want to do is compute the loss for a single observation. In fact, all losses are implemented on single observations under the hood. The core function for computing the value of a loss is value(). We will see throughout the documentation that this function supports a number of different method signatures to accomplish a variety of tasks.

value(loss, target, output) → Number

Computes the result for the loss-function denoted by the parameter loss. Note that target and output can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.

Note: This function should always be type-stable. If it isn’t, you likely found a bug.

\[L : Y \times \mathbb{R} \rightarrow [0,\infty)\]
Parameters:
  • loss (SupervisedLoss) – The loss-function \(L\) we want to compute the value with.
  • target (Number) – The ground truth \(y \in Y\) of the observation.
  • output (Number) – The predicted output \(\hat{y} \in \mathbb{R}\) for the observation.
Returns:

The (non-negative) numeric result of the loss-function for the given parameters.

#               loss        y    ŷ
julia> value(L1DistLoss(), 1.0, 2.0)
1.0

julia> value(L1DistLoss(), 1, 2)
1

julia> value(L1HingeLoss(), -1, 2)
3

julia> value(L1HingeLoss(), -1f0, 2f0)
3.0f0

It may be interesting to note that this function also supports broadcasting and all the syntax benefits that come with it. Thus, it is quite simple to make use of preallocated memory for storing the element-wise results.

julia> value.(L1DistLoss(), [1,2,3], [2,5,-2])
3-element Array{Int64,1}:
 1
 3
 5

julia> buffer = zeros(3); # preallocate a buffer

julia> buffer .= value.(L1DistLoss(), [1.,2,3], [2,5,-2])
3-element Array{Float64,1}:
 1.0
 3.0
 5.0

Furthermore, with the loop fusion changes that were introduced in Julia 0.6, one can also easily weight the influence of each observation without allocating a temporary array.

julia> buffer .= value.(L1DistLoss(), [1.,2,3], [2,5,-2]) .* [2,1,0.5]
3-element Array{Float64,1}:
 2.0
 3.0
 2.5
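
Building on the example above, the element-wise results are often reduced to a single number, such as a weighted mean. A minimal sketch of that pattern (using nothing but functions from Base):

targets = [1., 2, 3]
outputs = [2, 5, -2]
weights = [2, 1, 0.5]

# weighted mean of the element-wise losses
sum(value.(L1DistLoss(), targets, outputs) .* weights) / sum(weights)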

Even though broadcasting is supported, we also expose a vectorized method natively. This is done mainly for API consistency. Internally it even uses broadcast itself, but it provides the additional benefit of more reliable type inference.

value(loss, targets, outputs) → Array

Computes the value of the loss function for each index-pair in targets and outputs individually and returns the result as an array of the appropriate size.

In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.

Note: This function should always be type-stable. If it isn’t, you likely found a bug.

Parameters:
  • loss (SupervisedLoss) – The loss-function we want to compute the values for.
  • targets (AbstractArray) – The array of ground truths \(\mathbf{y}\).
  • outputs (AbstractArray) – The array of predicted outputs \(\mathbf{\hat{y}}\).
Returns:

The element-wise results of the loss function for all values in targets and outputs.

julia> value(L1DistLoss(), [1,2,3], [2,5,-2])
3-element Array{Int64,1}:
 1
 3
 5

julia> value(L1DistLoss(), [1.,2,3], [2,5,-2])
3-element Array{Float64,1}:
 1.0
 3.0
 5.0
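
The note above about arrays with a different number of dimensions refers to cases such as comparing a single vector of targets against several columns of outputs at once (for example, one column per candidate model). The following is just a sketch of that use case; the shape of the result follows the usual broadcasting rules.

targets = [1., 2, 3]         # one target per observation
outputs = [2. 4; 5 0; -2 3]  # a 3×2 matrix, e.g. one column per candidate model

value(L1DistLoss(), targets, outputs)  # 3×2 matrix of element-wise losses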

We also provide a mutating version for the same reasons. It even utilizes broadcast! underneath.

value!(buffer, loss, targets, outputs)

Computes the value of the loss function for each index-pair in targets and outputs individually, and stores them in the preallocated buffer, which has to be of the appropriate size.

In the case that the two parameters, targets and outputs, are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.

Note: This function should always be type-stable. If it isn’t, you likely found a bug.

Parameters:
  • buffer (AbstractArray) – Array to store the computed values in. Old values will be overwritten and lost.
  • loss (SupervisedLoss) – The loss-function we want to compute the values for.
  • targets (AbstractArray) – The array of ground truths \(\mathbf{y}\).
  • outputs (AbstractArray) – The array of predicted outputs \(\mathbf{\hat{y}}\).
Returns:

buffer (for convenience).

julia> buffer = zeros(3); # preallocate a buffer

julia> value!(buffer, L1DistLoss(), [1.,2,3], [2,5,-2])
3-element Array{Float64,1}:
 1.0
 3.0
 5.0

Computing the 1st Derivatives

Perhaps the more interesting aspect of loss functions is their derivatives. In fact, most of the popular learning algorithms in supervised learning, such as gradient descent, utilize the derivatives of the loss in one way or another during the training process.

To compute the derivative of some loss we expose the function deriv(). It supports the exact same method signatures as value(). It is worth noting explicitly that we always compute the derivative with respect to the predicted output, since we are interested in deducing in which direction the output should change.

deriv(loss, target, output) → Number

Computes the derivative for the loss-function denoted by the parameter loss with respect to the output. Note that target and output can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.

Note: This function should always be type-stable. If it isn’t, you likely found a bug.

Parameters:
  • loss (SupervisedLoss) – The loss-function \(L\) we want to compute the derivative with.
  • target (Number) – The ground truth \(y \in Y\) of the observation.
  • output (Number) – The predicted output \(\hat{y} \in \mathbb{R}\) for the observation.
Returns:

The derivative of the loss-function for the given parameters.

#               loss        y    ŷ
julia> deriv(L2DistLoss(), 1.0, 2.0)
2.0

julia> deriv(L2DistLoss(), 1, 2)
2

julia> deriv(L2HingeLoss(), -1, 2)
6

julia> deriv(L2HingeLoss(), -1f0, 2f0)
6.0f0

Similar to value(), this function also supports broadcasting and all the syntax benefits that come with it. Thus, one can make use of preallocated memory for storing the element-wise derivatives.

julia> deriv.(L2DistLoss(), [1,2,3], [2,5,-2])
3-element Array{Int64,1}:
   2
   6
 -10

julia> buffer = zeros(3); # preallocate a buffer

julia> buffer .= deriv.(L2DistLoss(), [1.,2,3], [2,5,-2])
3-element Array{Float64,1}:
   2.0
   6.0
 -10.0

Furthermore, with the loop fusion changes that were introduced in Julia 0.6, one can also easily weight the influence of each observation without allocating a temporary array.

julia> buffer .= deriv.(L2DistLoss(), [1.,2,3], [2,5,-2]) .* [2,1,0.5]
3-element Array{Float64,1}:
  4.0
  6.0
 -5.0

While broadcast is supported, we also expose a vectorized method natively. This is done mainly for API consistency. Internally it even uses broadcast itself, but it provides the additional benefit of more reliable type inference.

deriv(loss, targets, outputs) → Array

Computes the derivative of the loss function with respect to the output for each index-pair in targets and outputs individually and returns the result as an array of the appropriate size.

In the case that the two parameters are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.

Note: This function should always be type-stable. If it isn’t, you likely found a bug.

Parameters:
  • loss (SupervisedLoss) – The loss-function we want to compute the derivative for.
  • targets (AbstractArray) – The array of ground truths \(\mathbf{y}\).
  • outputs (AbstractArray) – The array of predicted outputs \(\mathbf{\hat{y}}\).
Returns:

The element-wise derivatives of the loss function for all elements in targets and outputs.

julia> deriv(L2DistLoss(), [1,2,3], [2,5,-2])
3-element Array{Int64,1}:
   2
   6
 -10

julia> deriv(L2DistLoss(), [1.,2,3], [2,5,-2])
3-element Array{Float64,1}:
   2.0
   6.0
 -10.0

We also provide a mutating version for the same reasons. It even utilizes broadcast! underneath.

deriv!(buffer, loss, targets, outputs)

Computes the derivatives of the loss function with respect to the outputs for each index-pair in targets and outputs individually, and stores them in the preallocated buffer, which has to be of the appropriate size.

In the case that the two parameters targets and outputs are arrays with a different number of dimensions, broadcast will be performed. Note that the given parameters are expected to have the same size in the dimensions they share.

Note: This function should always be type-stable. If it isn’t, you likely found a bug.

Parameters:
  • buffer (AbstractArray) – Array to store the computed derivatives in. Old values will be overwritten and lost.
  • loss (SupervisedLoss) – The loss-function we want to compute the derivatives for.
  • targets (AbstractArray) – The array of ground truths \(\mathbf{y}\).
  • outputs (AbstractArray) – The array of predicted outputs \(\mathbf{\hat{y}}\).
Returns:

buffer (for convenience).

julia> buffer = zeros(3); # preallocate a buffer

julia> deriv!(buffer, L2DistLoss(), [1.,2,3], [2,5,-2])
3-element Array{Float64,1}:
   2.0
   6.0
 -10.0
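
As mentioned at the beginning of this section, these derivatives are typically what a learning algorithm consumes. The following is only a sketch (the feature matrix X, the targets, and the step size are made up for illustration): a single gradient-descent step for a linear model under L2DistLoss, using the chain rule to go from the derivative with respect to the output to the gradient with respect to the coefficients.

loss = L2DistLoss()
X = [1.0 2.0; 3.0 4.0; 5.0 6.0]  # made-up feature matrix (3 observations, 2 features)
y = [1.0, 2.0, 3.0]              # made-up targets
w = zeros(2)                     # coefficients of the linear model ŷ = X * w
α = 0.01                         # step size

ŷ = X * w                        # predicted outputs
g = X' * deriv.(loss, y, ŷ)      # chain rule: ∂(sum of losses)/∂w = Xᵀ (∂L/∂ŷ)
w -= α .* g                      # gradient-descent update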

It is also possible to compute the value and derivative at the same time. For some losses that means less computation overhead.

value_deriv(loss, target, output) → Tuple

Returns the results of value() and deriv() as a tuple. In some cases this function can yield better performance, because the losses can make use of shared variables when computing the results. Note that target and output can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.

Note: This function should always be type-stable. If it isn’t, you likely found a bug.

Parameters:
  • loss (SupervisedLoss) – The loss-function we are working with.
  • target (Number) – The ground truth \(y \in Y\) of the observation.
  • output (Number) – The predicted output \(\hat{y} \in \mathbb{R}\) for the observation.
Returns:

The value and the derivative of the loss-function for the given parameters. They are returned as a Tuple in which the first element is the value and the second element the derivative.

#                     loss         y    ŷ
julia> value_deriv(L2DistLoss(), -1.0, 3.0)
(16.0,8.0)

Computing the 2nd Derivatives

In addition to the first derivative, we also provide the corresponding methods for the second derivative through the function deriv2(). Note again that we always compute the derivative with respect to the predicted output.

deriv2(loss, target, output) → Number

Computes the second derivative for the loss-function denoted by the parameter loss with respect to the output. Note that target and output can be of different numeric type, in which case promotion is performed in the manner appropriate for the given loss.

Note: This function should always be type-stable. If it isn’t, you likely found a bug.

Parameters:
  • loss (SupervisedLoss) – The loss-function \(L\) we want to compute the second derivative with.
  • target (Number) – The ground truth \(y \in Y\) of the observation.
  • output (Number) – The predicted output \(\hat{y} \in \mathbb{R}\) for the observation.
Returns:

The second derivative of the loss-function for the given parameters.

#               loss             y    ŷ
julia> deriv2(LogitDistLoss(), -0.5, 0.3)
0.42781939304058886

julia> deriv2(LogitMarginLoss(), -1f0, 2f0)
0.104993574f0

Just like deriv() and value(), this function also supports broadcasting and all the syntax benefits that come with it. Thus, one can make use of preallocated memory for storing the element-wise derivatives.

julia> deriv2.(LogitDistLoss(), [-0.5, 1.2, 3], [0.3, 2.3, -2])
3-element Array{Float64,1}:
 0.427819
 0.37474
 0.0132961

julia> buffer = zeros(3); # preallocate a buffer

julia> buffer .= deriv2.(LogitDistLoss(), [-0.5, 1.2, 3], [0.3, 2.3, -2])
3-element Array{Float64,1}:
 0.427819
 0.37474
 0.0132961

Furthermore, deriv2() supports all the same method signatures as deriv() does, so to avoid repeating the same text over and over again, please refer to the documentation of deriv() for more information.
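
Second derivatives are typically consumed by second-order methods. As a sketch only: a single one-dimensional Newton step on the predicted output of one observation (a real algorithm would of course also guard against a vanishing second derivative).

loss = LogitDistLoss()
y, ŷ = -0.5, 0.3

# one Newton step on the output for a single observation (illustrative only)
ŷ_new = ŷ - deriv(loss, y, ŷ) / deriv2(loss, y, ŷ)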

Function Closures

In some circumstances it may be convenient to have the loss function or its derivative as a proper Julia function. Instead of exporting special function names for every implemented loss (like l2distloss(...)), we provide the ability to generate a true function on the fly for any given loss.

value_fun(loss) → Function

Returns a new function that computes the value() for the given loss. This new function will support all the signatures that value() does.

Parameters:
  • loss (Loss) – The loss we want the function for.
julia> f = value_fun(L2DistLoss())
(::_value) (generic function with 1 method)

julia> f(-1.0, 3.0) # computes the value of L2DistLoss
16.0

julia> f.([1.,2], [4,7])
2-element Array{Float64,1}:
  9.0
 25.0
deriv_fun(loss) → Function

Returns a new function that computes the deriv() for the given loss. This new function will support all the signatures that deriv() does.

Parameters:
  • loss (Loss) – The loss we want the derivative-function for.
julia> g = deriv_fun(L2DistLoss())
(::_deriv) (generic function with 1 method)

julia> g(-1.0, 3.0) # computes the deriv of L2DistLoss
8.0

julia> g.([1.,2], [4,7])
2-element Array{Float64,1}:
  6.0
 10.0
deriv2_fun(loss) → Function

Returns a new function that computes the deriv2() (i.e. second derivative) for the given loss. This new function will support all the signatures that deriv2() does.

Parameters:
  • loss (Loss) – The loss we want the second-derivative function for.
julia> g2 = deriv2_fun(L2DistLoss())
(::_deriv2) (generic function with 1 method)

julia> g2(-1.0, 3.0) # computes the second derivative of L2DistLoss
2.0

julia> g2.([1.,2], [4,7])
2-element Array{Float64,1}:
 2.0
 2.0
value_deriv_fun(loss) → Function

Returns a new function that computes the value_deriv() for the given loss. This new function will support all the signatures that value_deriv() does.

Parameters:
  • loss (Loss) – The loss we want the function for.
julia> fg = value_deriv_fun(L2DistLoss())
(::_value_deriv) (generic function with 1 method)

julia> fg(-1.0, 3.0) # computes the value and derivative of L2DistLoss
(16.0,8.0)

Note, however, that these closures cause quite an overhead when executed in global scope. If you want to use them efficiently, either don’t create them in global scope, or make sure that you pass the closure to some other function before it is used. This way the compiler will most likely inline it and it becomes a zero-cost abstraction.

julia> f = value_fun(L2DistLoss())
(::_value) (generic function with 1 method)

julia> @code_llvm f(-1.0, 3.0)
define %jl_value_t* @julia__value_70960(%jl_value_t*, %jl_value_t**, i32) #0 {
top:
  %3 = alloca %jl_value_t**, align 8
  store volatile %jl_value_t** %1, %jl_value_t*** %3, align 8
  %ptls_i8 = call i8* asm "movq %fs:0, $0;\0Aaddq $$-2672, $0", "=r,~{dirflag},~{fpsr},~{flags}"() #2
    [... many more lines of code ...]
  %15 = call %jl_value_t* @jl_f__apply(%jl_value_t* null, %jl_value_t** %5, i32 3)
  %16 = load i64, i64* %11, align 8
  store i64 %16, i64* %9, align 8
  ret %jl_value_t* %15
}

julia> foo(t,y) = (f = value_fun(L2DistLoss()); f(t,y))
foo (generic function with 1 method)

julia> @code_llvm foo(-1.0, 3.0)
define double @julia_foo_71242(double, double) #0 {
top:
  %2 = fsub double %1, %0
  %3 = fmul double %2, %2
  ret double %3
}

Properties of a Loss

In some situations it can be quite useful to assert certain properties about a loss-function. One such scenario could be when implementing an algorithm that requires the loss to be strictly convex or Lipschitz continuous. Note that we will only skim over the definitions in most cases. A good treatment of all of the concepts involved can be found in either [BOYD2004] or [STEINWART2008].

[BOYD2004] Stephen Boyd and Lieven Vandenberghe. “Convex Optimization”. Cambridge University Press, 2004.
[STEINWART2008] Ingo Steinwart and Andreas Christmann. “Support Vector Machines”. Springer Science & Business Media, 2008.

This package uses functions to represent individual properties of a loss. What follows is a list of the implemented property-functions, which are defined in LearnBase.jl.
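
Since these property-functions are ordinary predicates, they can be used to guard algorithm choices at runtime. A small sketch (check_loss is just a made-up helper name):

# made-up helper that asserts the requirements of some hypothetical solver
function check_loss(loss)
    isconvex(loss)         || error("this solver requires a convex loss")
    isdifferentiable(loss) || error("this solver requires a differentiable loss")
    return loss
end

check_loss(L2DistLoss())  # passes
check_loss(L1DistLoss())  # throws, since L1DistLoss is not differentiable everywhere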

isconvex(loss) → Bool

Returns true if given loss is a convex function. A function \(f : \mathbb{R}^n \rightarrow \mathbb{R}\) is convex if its domain is a convex set and if for all \(x, y\) in that domain and all \(\theta\) with \(0 \leq \theta \leq 1\), we have

\[f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y)\]
Parameters:
  • loss (Loss) – The loss we want to check for convexity.
julia> isconvex(LPDistLoss(0.5))
false

julia> isconvex(ZeroOneLoss())
false

julia> isconvex(L1DistLoss())
true

julia> isconvex(L2DistLoss())
true
isstrictlyconvex(loss) → Bool

Returns true if given loss is a strictly convex function. A function \(f : \mathbb{R}^n \rightarrow \mathbb{R}\) is strictly convex if its domain is a convex set and if for all \(x, y\) in that domain where \(x \neq y\) and all \(\theta\) with \(0 < \theta < 1\), we have

\[\begin{split}f(\theta x + (1 - \theta) y) < \theta f(x) + (1 - \theta) f(y)\end{split}\]
Parameters:
  • loss (Loss) – The loss we want to check for strict convexity.
julia> isstrictlyconvex(L1DistLoss())
false

julia> isstrictlyconvex(LogitDistLoss())
true

julia> isstrictlyconvex(L2DistLoss())
true
isstronglyconvex(loss) → Bool

Returns true if given loss is a strongly convex function. A function \(f : \mathbb{R}^n \rightarrow \mathbb{R}\) is \(m\)-strongly convex if its domain is a convex set and if for all \(x, y \in\) dom \(f\) where \(x \neq y\) and all \(\theta\) with \(0 \le \theta \le 1\), we have

\[\begin{split}f(\theta x + (1 - \theta)y) < \theta f(x) + (1 - \theta) f(y) - 0.5 m \cdot \theta (1 - \theta) {\| x - y \|}_2^2\end{split}\]

In a more familiar setting, if the loss function is differentiable, we have

\[\left( \nabla f(x) - \nabla f(y) \right)^\top (x - y) \ge m {\| x - y\|}_2^2\]
Parameters:
  • loss (Loss) – The loss we want to check for strong convexity.
julia> isstronglyconvex(L1DistLoss())
false

julia> isstronglyconvex(LogitDistLoss())
false

julia> isstronglyconvex(L2DistLoss())
true
isdifferentiable(loss[, at]) → Bool

Returns true if given loss is differentiable (optionally only at the given point, if at is specified). A function \(f : \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}\) is differentiable at a point \(x \in\) int dom \(f\) if there exists a matrix \(Df(x) \in \mathbb{R}^{m \times n}\) that satisfies:

\[\lim_{z \neq x, z \to x} \frac{{\|f(z) - f(x) - Df(x)(z-x)\|}_2}{{\|z - x\|}_2} = 0\]

A function is differentiable if its domain is open and it is differentiable at every point \(x\).

Parameters:
  • loss (Loss) – The loss we want to check for differentiability.
  • at (Number) – Optional. The point x at which to check whether the function is differentiable.
julia> isdifferentiable(L1DistLoss())
false

julia> isdifferentiable(L1DistLoss(), 1)
true

julia> isdifferentiable(L2DistLoss())
true
istwicedifferentiable(loss[, at]) → Bool

Returns true if given loss is a twice differentiable function (optionally only at the given point, if at is specified). A function \(f : \mathbb{R}^{n} \rightarrow \mathbb{R}\) is said to be twice differentiable at a point \(x \in\) int dom \(f\) if the derivative of \(\nabla f\) exists at \(x\).

\[\nabla^2 f(x) = D \nabla f(x)\]

A function is twice differentiable if its domain is open and it is twice differentiable at every point \(x\).

Parameters:
  • loss (Loss) – The loss we want to check for differentiability.
  • at (Number) – Optional. The point x at which to check whether the function is twice differentiable.
julia> istwicedifferentiable(L1DistLoss())
false

julia> istwicedifferentiable(L2DistLoss())
true
isnemitski(loss) → Bool

Returns true if given loss is a Nemitski loss function.

We call a supervised loss function \(L : Y \times \mathbb{R} \rightarrow [0,\infty)\) a Nemitski loss if there exist a measurable function \(b : Y \rightarrow [0, \infty)\) and an increasing function \(h : [0, \infty) \rightarrow [0, \infty)\) such that

\[L(y,\hat{y}) \le b(y) + h(|\hat{y}|), \qquad (y, \hat{y}) \in Y \times \mathbb{R}.\]
islipschitzcont(loss) → Bool

Returns true if given loss function is Lipschitz continuous.

A supervised loss function \(L : Y \times \mathbb{R} \rightarrow [0, \infty)\) is Lipschitz continuous if there exists a finite constant \(M < \infty\) such that

\[|L(y, t) - L(y, t')| \le M |t - t'|, \qquad \forall (y, t) \in Y \times \mathbb{R}\]
Parameters:
  • loss (Loss) – The loss we want to check for being Lipschitz continuous.
julia> islipschitzcont(SigmoidLoss())
true

julia> islipschitzcont(ExpLoss())
false
islocallylipschitzcont(loss) → Bool

Returns true if given loss function is locally Lipschitz continuous.

A supervised loss \(L : Y \times \mathbb{R} \rightarrow [0, \infty)\) is called locally Lipschitz continuous if \(\forall a \ge 0\) there exists a constant \(c_a \ge 0\) such that

\[\sup_{y \in Y} \left| L(y,t) - L(y,t') \right| \le c_a |t - t'|, \qquad t, t' \in [-a,a]\]
Parameters:
  • loss (Loss) – The loss we want to check for being locally Lipschitz continuous.
julia> islocallylipschitzcont(ExpLoss())
true

julia> islocallylipschitzcont(SigmoidLoss())
true
isclipable(loss) → Bool

Returns true if given loss function is clipable. A supervised loss \(L : Y \times \mathbb{R} \rightarrow [0, \infty)\) can be clipped at \(M > 0\) if, for all \((y,t) \in Y \times \mathbb{R}\),

\[L(y, \hat{t}) \le L(y, t)\]

where \(\hat{t}\) denotes the clipped value of \(t\) at \(\pm M\). That is

\[\begin{split}\hat{t} = \begin{cases} -M & \quad \text{if } t < -M \\ t & \quad \text{if } t \in [-M, M] \\ M & \quad \text{if } t > M \end{cases}\end{split}\]
Parameters:
  • loss (Loss) – The loss we want to check for being clipable.
julia> isclipable(ExpLoss())
false

julia> isclipable(L2DistLoss())
true
ismarginbased(loss) → Bool

Returns true if given loss is a margin-based Loss.

A supervised loss function \(L : Y \times \mathbb{R} \rightarrow [0, \infty)\) is said to be margin-based if there exists a representing function \(\psi : \mathbb{R} \rightarrow [0, \infty)\) satisfying

\[L(y, \hat{y}) = \psi (y \cdot \hat{y}), \qquad (y, \hat{y}) \in Y \times \mathbb{R}\]
Parameters:
  • loss (Loss) – The loss we want to check for being margin-based.
julia> ismarginbased(HuberLoss(2))
false

julia> ismarginbased(L2MarginLoss())
true
isclasscalibrated(loss) → Bool
isdistancebased(loss) → Bool

Returns true if given loss is a distance-based Loss.

A supervised loss function \(L : Y \times \mathbb{R} \rightarrow [0, \infty)\) is said to be distance-based if there exists a representing function \(\psi : \mathbb{R} \rightarrow [0, \infty)\) satisfying \(\psi (0) = 0\) and

\[L(y, \hat{y}) = \psi (\hat{y} - y), \qquad (y, \hat{y}) \in Y \times \mathbb{R}\]
Parameters:
  • loss (Loss) – The loss we want to check for being distance-based.
julia> isdistancebased(HuberLoss(2))
true

julia> isdistancebased(L2MarginLoss())
false
issymmetric(loss) → Bool

Returns true if given loss is a Symmetric Loss.

A function \(f : \mathbb{R} \rightarrow [0,\infty)\) is said to be symmetric about the origin if we have

\[f(x) = f(-x), \qquad \forall x \in \mathbb{R}\]

A distance-based loss is said to be symmetric if its representing function is symmetric.

Parameters:
  • loss (Loss) – The loss we want to check for being symmetric.
julia> issymmetric(QuantileLoss(0.2))
false

julia> issymmetric(LPDistLoss(2))
true