In this post we will see that Log loss and AUC for binary classification problems are closely related.
Suppose we have a set of predicted probabilities that are meant to predict whether a true label is 0 or 1. That is, let $p_i \in [0, 1]$ be the predicted probability and $y_i \in \{0, 1\}$ the true label for the $i$-th example, $i = 1, \dots, n$.
Recall that the AUC is the area under the ROC curve. We will use another interpretation, derived in the wiki article above: the AUC is the probability that a randomly chosen positive example is ranked higher than a randomly chosen negative example.
Note that the larger the AUC is, the better the predictions are. To compare the AUC to the log loss, we will convert the AUC into a loss function. We do so by studying $1 - \mathrm{AUC}$.
Let $n_+$ be the number of positive examples and $n_-$ be the number of negative examples. Thus,

$$1 - \mathrm{AUC} = \frac{1}{n_+ n_-} \sum_{i : y_i = 1} \; \sum_{j : y_j = 0} \mathbf{1}\!\left[p_i \le p_j\right].$$
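As a quick sanity check of this pairwise definition, here is a small Python sketch. The simulated labels and probabilities are illustrative choices of mine, and it assumes `numpy` and scikit-learn are available; it computes $1 - \mathrm{AUC}$ directly from the double sum and compares it to `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500

# Simulated true labels and predicted probabilities from a noisy but informative model.
y = rng.integers(0, 2, size=n)
p = 1.0 / (1.0 + np.exp(-(2.0 * y - 1.0 + rng.normal(0.0, 1.0, size=n))))

pos = p[y == 1]  # predicted probabilities on positive examples
neg = p[y == 0]  # predicted probabilities on negative examples

# 1 - AUC: fraction of (positive, negative) pairs ranked incorrectly.
# (Ties would count as errors here, but none occur with these continuous scores.)
one_minus_auc = np.mean(pos[:, None] <= neg[None, :])

print(one_minus_auc)              # pairwise definition
print(1.0 - roc_auc_score(y, p))  # scikit-learn's computation; should match
```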
We introduce another quantity:

$$Q = \frac{1}{n_+ n_-} \sum_{i : y_i = 1} \; \sum_{j : y_j = 0} \sqrt{(-\log p_i)\,\bigl(-\log(1 - p_j)\bigr)}.$$
This quantity has a similar shape to the first quantity, except that the indicator $\mathbf{1}[p_i \le p_j]$ is replaced by $\sqrt{(-\log p_i)(-\log(1 - p_j))}$. Note that these two functions are quite similar: the latter is a smooth variant of the former that penalizes poor misclassifications much more harshly. Put another way, the latter quantity penalizes confidently wrong predictions heavily.
The above equation factors and is equal to

$$Q = \left(\frac{1}{n_+} \sum_{i : y_i = 1} \sqrt{-\log p_i}\right) \left(\frac{1}{n_-} \sum_{j : y_j = 0} \sqrt{-\log(1 - p_j)}\right).$$
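To make the factoring step concrete, here is a short sketch, again on illustrative simulated data with `numpy` assumed, checking numerically that the pairwise average of $\sqrt{(-\log p_i)(-\log(1-p_j))}$ equals the product of the two per-class averages.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
p = 1.0 / (1.0 + np.exp(-(2.0 * y - 1.0 + rng.normal(0.0, 1.0, size=500))))

a = -np.log(p[y == 1])        # -log p_i over positive examples
b = -np.log(1.0 - p[y == 0])  # -log(1 - p_j) over negative examples

# Pairwise average versus the factored product of per-class averages.
double_sum = np.mean(np.sqrt(a[:, None] * b[None, :]))
factored = np.mean(np.sqrt(a)) * np.mean(np.sqrt(b))

print(double_sum, factored)  # equal up to floating-point error
```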
Let us compare this to the log loss:

$$\mathrm{LogLoss} = -\frac{1}{n} \sum_{i=1}^{n} \Bigl[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \Bigr] = \frac{1}{n} \left[ \sum_{i : y_i = 1} (-\log p_i) + \sum_{j : y_j = 0} \bigl(-\log(1 - p_j)\bigr) \right].$$
The intermediate expression and the LogLoss are connected by the AM-GM inequality. For simplicity, let us suppose $n_+ = n_- = n/2$. Applying $\sqrt{ab} \le \frac{a + b}{2}$ to each pair, with $a = -\log p_i$ and $b = -\log(1 - p_j)$, and averaging over all pairs, the AM-GM inequality implies

$$Q \;\le\; \frac{1}{2}\left[\frac{1}{n_+}\sum_{i : y_i = 1} (-\log p_i) + \frac{1}{n_-}\sum_{j : y_j = 0} \bigl(-\log(1 - p_j)\bigr)\right] = \mathrm{LogLoss}.$$
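The inequality above can be checked numerically. The following sketch uses simulated, exactly balanced data of my own choosing and assumes scikit-learn's `log_loss` and `roc_auc_score`; it computes $1 - \mathrm{AUC}$, the smoothed quantity $Q$, and the log loss side by side.

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score

rng = np.random.default_rng(0)

# Exactly balanced classes, so LogLoss = (L+ + L-) / 2.
y = np.repeat([0, 1], 250)
p = 1.0 / (1.0 + np.exp(-(2.0 * y - 1.0 + rng.normal(0.0, 1.0, size=y.size))))

a = -np.log(p[y == 1])        # -log p_i over positives
b = -np.log(1.0 - p[y == 0])  # -log(1 - p_j) over negatives

Q = np.mean(np.sqrt(a[:, None] * b[None, :]))  # smoothed pairwise quantity
one_minus_auc = 1.0 - roc_auc_score(y, p)
logloss = log_loss(y, p)

# AM-GM guarantees Q <= logloss on balanced data; 1 - AUC ~ Q is the heuristic part.
print(one_minus_auc, Q, logloss)
```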
As $Q$ is similar to $1 - \mathrm{AUC}$, we have that $1 - \mathrm{AUC}$ is roughly a lower bound on the log loss. Let us make some assumptions:
- Balanced data: $n_+ \approx n_-$.
- The predictions are not very wrong too often, i.e. the probability assigned to the true class ($p_i$ when $y_i = 1$, and $1 - p_i$ when $y_i = 0$) is not too often close to zero.
- Predictions have similar errors for positive and negative examples, that is, the log loss over the positive examples is similar to the log loss over the negative examples.
Then, following through the above argument, we find that

$$1 - \mathrm{AUC} \;\lesssim\; \mathrm{LogLoss}, \quad \text{i.e.,} \quad \mathrm{AUC} \;\gtrsim\; 1 - \mathrm{LogLoss}.$$
Thus, we understand a couple things:
- We expect that $\mathrm{AUC} \gtrsim 1 - \mathrm{LogLoss}$.
- If this is not the case, then we can infer that the assumptions above are not met in our particular use-case.
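As an illustration of these last two points, the sketch below, again on simulated data chosen only for illustration, compares the AUC against $1 - \mathrm{LogLoss}$ in two scenarios: balanced data with a reasonably informative model, where the expected relation holds, and heavily imbalanced data with a base-rate model, where it fails and thereby flags the broken balance assumption.

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score

rng = np.random.default_rng(0)

def report(name, y, p):
    print(f"{name}: AUC = {roc_auc_score(y, p):.3f}, "
          f"1 - LogLoss = {1.0 - log_loss(y, p):.3f}")

# Balanced data, reasonably informative model: expect AUC >= 1 - LogLoss.
y1 = np.repeat([0, 1], 500)
p1 = 1.0 / (1.0 + np.exp(-(2.0 * y1 - 1.0 + rng.normal(0.0, 1.0, size=y1.size))))
report("balanced, informative", y1, p1)

# Heavily imbalanced data, model predicts the base rate: the log loss is tiny
# but the AUC is 0.5, so the expected relation fails.
y2 = np.array([1] * 10 + [0] * 990)
p2 = np.full(y2.size, 0.01)
report("imbalanced, base-rate model", y2, p2)
```

In the second scenario the log loss is small simply because the dominant negative class is predicted well, so $1 - \mathrm{LogLoss}$ exceeds the AUC of an uninformative ranking, exactly the kind of mismatch that signals a violated assumption.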