By: Carlos Del Carpio, Director of Risk and Analytics, LenddoEFL
This is part 2 of a series of blog posts about Ginis in Credit Scoring. To read part 1, follow this link.
What is an AUC?
AUC stands for “Area Under the (ROC) Curve”. From a statistical perspective, it measures the probability that a randomly chosen good client receives a higher score than a randomly chosen bad client. AUC is widely used across industries and academic fields to compare the predictive power of two or more statistical classification models over the exact same data sample [1].
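The probabilistic reading above can be checked by hand on a tiny sample. The following is a minimal sketch, not a production implementation: the function name `pairwise_auc` and the score values are invented for illustration, and ties are counted as half a win, which is a common convention rather than something stated in this post.

```python
from itertools import product

def pairwise_auc(good_scores, bad_scores):
    """Share of (good, bad) pairs in which the good client has the higher score.

    Ties count as half a win (an assumed convention).
    """
    wins = 0.0
    for g, b in product(good_scores, bad_scores):
        if g > b:
            wins += 1.0
        elif g == b:
            wins += 0.5
    return wins / (len(good_scores) * len(bad_scores))

# Hypothetical scores, made up for this example only.
good = [620, 700, 710, 680]
bad = [600, 690, 640]

auc = pairwise_auc(good, bad)
print(auc)  # 9 of the 12 pairs are ordered correctly -> 0.75
```

Comparing every pair is quadratic in the sample size, so it only suits small illustrations; it is shown here because it mirrors the definition directly.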
How is AUC used in Credit Scoring?
In Credit Scoring, AUCs are useful, for example, during model development, when several candidate models built over the same training data need to be compared. Another typical use is when introducing a new credit score, to compare a challenger against an incumbent score over the same sample of data under a champion-challenger framework.
How does AUC relate to Gini Coefficient?
The Gini Coefficient is a direct conversion from AUC through a simple formula: Gini = (2 × AUC) − 1. The two measure exactly the same thing, and it is possible to convert from one to the other in either direction. The only reason to use Gini over AUC is that its scale is easier to interpret: for a model with any predictive power, AUC ranges from 0.5 to 1, whereas Gini ranges from 0 to 1. However, all the properties and restrictions of AUC carry over to the Gini Coefficient, including the need to compare two AUC values over the exact same data sample before drawing any conclusion about their relative predictive power.
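As a small sketch of the conversion in both directions (the function names `gini_from_auc` and `auc_from_gini` are illustrative, not taken from any library):

```python
def gini_from_auc(auc):
    """Gini = (2 x AUC) - 1."""
    return 2.0 * auc - 1.0

def auc_from_gini(gini):
    """Inverse of the formula above: AUC = (Gini + 1) / 2."""
    return (gini + 1.0) / 2.0

# An AUC of 0.5 (no better than chance) maps to a Gini of 0,
# and an AUC of 1 (perfect ranking) maps to a Gini of 1.
print(gini_from_auc(0.5))   # 0.0
print(gini_from_auc(1.0))   # 1.0
print(auc_from_gini(0.30))  # 0.65
```

Because the mapping is linear and one-to-one, ranking models by Gini always gives the same order as ranking them by AUC.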
What does this mean in practical terms?
Any direct comparison of the Gini Coefficients (or AUCs) of two different models over two different data samples will be misleading. For example, if Bank A has a Credit Score with a Gini Coefficient of 30% and Bank B has a Credit Score with a Gini Coefficient of 28%, it is not possible to draw any conclusion about which score is better or more predictive. The two values were calculated over different data samples, without accounting for the difference in the absolute number of observations or in the proportion of good cases to bad cases. The only valid direct comparison is between two scores evaluated side by side over the exact same data sample.
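To make the point concrete, here is a rough sketch in which one and the same score is evaluated on two different samples. All data are invented for illustration: sample A is a hypothetical broad population, and sample B is a hypothetical narrower segment of it (scores between 450 and 700). The helper `gini` is an assumption of this example and uses the pairwise definition of AUC with ties counted as half.

```python
def gini(sample):
    """Gini of a score over a list of (score, is_good) observations."""
    goods = [s for s, is_good in sample if is_good]
    bads = [s for s, is_good in sample if not is_good]
    # Count (good, bad) pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if g > b else 0.5 if g == b else 0.0
        for g in goods
        for b in bads
    )
    auc = wins / (len(goods) * len(bads))
    return 2.0 * auc - 1.0

# Hypothetical sample A: (score, is_good) for a broad population.
sample_a = [
    (300, False), (350, False), (400, False), (450, True),
    (500, False), (550, True), (600, False), (650, True),
    (700, True), (750, True), (800, True), (850, True),
]

# Hypothetical sample B: the same score, restricted to a narrower segment.
sample_b = [(s, y) for s, y in sample_a if 450 <= s <= 700]

gini_a = gini(sample_a)
gini_b = gini(sample_b)
print(round(gini_a, 3), round(gini_b, 3))  # 0.829 0.25
```

The score is identical in both cases, yet its Gini differs widely because the samples differ. A gap between two Ginis measured on different samples therefore says nothing on its own about which underlying model is stronger.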
Bottom line: Calling a certain absolute level of AUC or Gini Coefficient “good” or “bad” is meaningless. Such a statement is only possible in relative terms, when comparing two or more scores over the exact same data sample. Unfortunately, this is often not well understood, which leads to the most frequent misuses of AUC and Gini Coefficients: direct, unweighted comparisons of Gini values across different samples, time periods, products, segments and even financial institutions.
[1] Hanley JA, McNeil BJ. The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology, 1982, 143, 29-36.