Generalized Matthew Package
What Can This Package Do?
Written with Uri Itai
What is the Matthews Correlation Coefficient?
The Matthews correlation coefficient (MCC) is a goodness-of-fit measure that aims to provide more reliable results than common KPIs such as F1 or AUC. In particular, MCC handles imbalanced data better (it is less sensitive to class prevalence). Mathematically speaking, for a binary classification problem it is simply the Pearson correlation between two random variables: the target (Y) and the prediction (Y’). In the binary case, the Pearson correlation collapses to a simple closed-form formula.
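Concretely, in the binary case the formula reads MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A quick sketch (with made-up labels) confirms that this agrees with the Pearson correlation:

```python
import numpy as np

y_true = np.array([1, 0, 0, 1, 1])   # made-up labels
y_pred = np.array([1, 1, 1, 0, 1])

tp = np.sum((y_true == 1) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

# Closed-form binary MCC
mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
# Pearson correlation between target and prediction
pearson = np.corrcoef(y_true, y_pred)[0, 1]
print(mcc, pearson)  # both ≈ -0.4082
```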
Generalization of MCC to Multi-class Problems
Isn’t it Straightforward?
Pearson correlation is not defined in the non-binary categorical world: the set of values is discrete and has no natural ordering. Thus, generalizing MCC to multi-class problems requires modifying some definitions. In a paper that we published, we approached this problem through the use of different types of means.
The Python package that we present here is an implementation of these ideas.
We will briefly cover the prominent types of means, restricting ourselves to positive numbers.
Arithmetic Mean
The most commonly used type of mean, where both numbers receive equal weight: (X + Y) / 2.
Weighted Mean
Similar to the arithmetic mean, but with different weights: wX + (1 − w)Y for some w in (0, 1).
Geometric Mean
Here we calculate the square root of the product of two positive numbers: √(XY). Readers familiar with logarithmic scores may recognize it.
Harmonic Mean
The reciprocal of the arithmetic mean of the two reciprocals: 2 / (1/X + 1/Y) = 2XY / (X + Y).
An ML example of the harmonic mean is the F1-score, which is the harmonic mean of precision and recall.
HM-GM-AM Inequalities
A classic chain of inequalities states that for two positive numbers X and Y, their harmonic mean is at most their geometric mean, which in turn is at most their arithmetic mean, with equality if and only if X = Y.
In the paper, we used these means to extend F1 and Matthew to multi-class problems.
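These definitions and the HM ≤ GM ≤ AM chain are easy to check numerically (the numbers below are arbitrary):

```python
import math

x, y, w = 4.0, 9.0, 0.3                 # arbitrary positive numbers and weight

arithmetic = (x + y) / 2                # 6.5
weighted = w * x + (1 - w) * y          # 0.3*4 + 0.7*9 = 7.5
geometric = math.sqrt(x * y)            # 6.0
harmonic = 2 / (1 / x + 1 / y)          # ≈ 5.538

assert harmonic <= geometric <= arithmetic

# F1 is the harmonic mean of precision and recall:
precision, recall = 0.5, 0.75
f1 = 2 / (1 / precision + 1 / recall)   # = 2*P*R/(P+R) = 0.6
```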
The Python Package
Mathematics is nice, and proving lemmas about means is awesome. However, as every data scientist knows, a data science project exists only if you can speak about it in a legal language such as Python, Python, or in extreme cases, Python. Thus we organized our results in a new Python package: generalized_matth.
We focused on two objectives:
- Generalizing Matthew to multi-class problems
- Creating a tool to compare it with our generalized F1
We needed a set of functions that let us both calculate the generalized scores and compare their quality as goodness-of-fit measures:
Calling the Base Object
We begin by importing the main object that performs the calculations:
import generalized_matth
from generalized_matth.matt_funct import matthew_multiclass
matthew_multiclass is the class that we use to run our functions. The package also defines a new enum:
class AVERAG_TYPE(Enum):
    MATTHEW_GEN = 2
    F1_GEN = 3
This enum contains the types of means to be used. In the current version, there are only generalized F1 and generalized Matthew (F1_GEN and MATTHEW_GEN). More advanced versions will extend it to additional means.
Calling matthew_multiclass
Now that we have presented the enum and the class name, we can show how to call it.
It receives two variables:
- y_true — A list of target values
- y_pred — A list of predicted values
These arrays must have the same length.
import numpy as np

y_true = np.asarray([1, 0, 0, 1, 1])
y_pred = np.asarray([1, 1, 1, 0, 1])
test_class = matthew_multiclass(y_true, y_pred)
For such a call, test_class will calculate the generalized Matthew score. It is equivalent to adding this line:
from generalized_matth.matt_funct import AVERAG_TYPE
and calling:
y_true = np.asarray([1, 0, 0, 1, 1])
y_pred = np.asarray([1, 1, 1, 0, 1])
test_class = matthew_multiclass(y_true, y_pred, avg_type=AVERAG_TYPE.MATTHEW_GEN.value)
The score is about -0.4082.
If we wish to use the generalized F1, we perform the following:
y_true = np.asarray([1, 1, 1, 0, 0])
y_pred = np.asarray([1, 1, 1, 0, 1])
test_class = matthew_multiclass(y_true, y_pred, avg_type=AVERAG_TYPE.F1_GEN.value)
The score obtained is 0.75.
main_matthew_mult_class
This is the main method that the package runs. The source is the following:
def main_matthew_mult_class(self):
    G_mat = self.norm_confusion_mat()
    return self.scalar_op(G_mat)
Two generic functions are used. The second, scalar_op, is determined by the average type that we wish to use:
Generalized F1
def gen_f1_scalar_op(self, h_conf_mat):
    # st is assumed to be scipy.stats
    l_mat = len(h_conf_mat)
    return st.mstats.hmean([h_conf_mat[i][i] for i in range(l_mat)])
Generalized Matthew
self.scalar_op = np.linalg.det
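To make the two scalar operations concrete, here is a small self-contained sketch that reproduces the binary scores quoted above. The function names (confusion, matthew_gen, f1_gen) and the normalization are my own reconstruction from the numbers in this post, not the package source: it assumes each confusion-matrix entry C_ij is divided by the geometric mean √(r_i·c_j) of its row and column sums before taking the determinant, and that the generalized F1 is the harmonic mean of the per-class F1 scores 2·C_ii / (r_i + c_i).

```python
import numpy as np
from scipy import stats as st

def confusion(y_true, y_pred):
    """Confusion matrix over the union of observed labels."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    c = np.zeros((len(labels), len(labels)))
    for t, p in zip(y_true, y_pred):
        c[np.searchsorted(labels, t), np.searchsorted(labels, p)] += 1
    return c

def matthew_gen(y_true, y_pred):
    # Assumed normalization: C_ij / sqrt(row_i * col_j), then determinant
    c = confusion(y_true, y_pred)
    r, k = c.sum(axis=1), c.sum(axis=0)
    return np.linalg.det(c / np.sqrt(np.outer(r, k)))

def f1_gen(y_true, y_pred):
    # Harmonic mean of the per-class F1 scores: 2*C_ii / (row_i + col_i)
    c = confusion(y_true, y_pred)
    r, k = c.sum(axis=1), c.sum(axis=0)
    return st.mstats.hmean(2 * np.diag(c) / (r + k))

print(matthew_gen([1, 0, 0, 1, 1], [1, 1, 1, 0, 1]))  # ≈ -0.4082
print(f1_gen([1, 1, 1, 0, 0], [1, 1, 1, 0, 1]))       # 0.75
```

This sketch matches the binary examples above exactly; the package's normalization for more than two classes may differ, so treat it as illustrative rather than as the package implementation.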
Some Tests
In this section, we present some tests that allow the reader to verify their code:
y_true = [0] * 13 + [1] * 21 + [2] * 20
y_pred = [0] * 5 + [1] * 6 + [2] * 2 + [0] * 2 + [1] * 8 + [2] * 11 + [0] * 8 + [1] * 2 + [2] * 10
test0 = matthew_multiclass(y_true, y_pred, avg_type=AVERAG_TYPE.F1_GEN.value)
print(test0.main_matthew_mult_class())
The score is 0.4130.
If we do the following:
test0 = matthew_multiclass(y_true, y_pred, avg_type=AVERAG_TYPE.MATTHEW_GEN.value)
print(test0.main_matthew_mult_class())
The score is about 0.031
Clearly, if we run:
test0 = matthew_multiclass(y_true, y_true, avg_type=AVERAG_TYPE.MATTHEW_GEN.value)
print(test0.main_matthew_mult_class())
The score is 1.0 for both functions.
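This sanity check is easy to see directly. Under the entry-wise normalization assumed earlier in this post (C_ij divided by √(r_i·c_j), my reconstruction rather than the package source), a perfect prediction produces a diagonal confusion matrix that normalizes to the identity, whose determinant is exactly 1:

```python
import numpy as np

y = np.array([0] * 13 + [1] * 21 + [2] * 20)

# Perfect prediction: the confusion matrix is diagonal
c = np.zeros((3, 3))
for t in y:
    c[t, t] += 1

r, k = c.sum(axis=1), c.sum(axis=0)
g = c / np.sqrt(np.outer(r, k))   # assumed normalization: C_ij / sqrt(r_i * c_j)
print(np.linalg.det(g))           # 1.0
```

The same argument gives 1 for the generalized F1, since every per-class F1 equals 1.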
Acknowledgments
We wish to thank Aleksander Molak for fruitful discussions and beneficial ideas throughout this work.