Generalized Matthew Package
What Can This Package Do?
Written with Uri Itai
What is the Matthews Correlation Coefficient?
The Matthews correlation coefficient (MCC) is a goodness-of-fit measure that aims to provide more reliable results than common KPIs such as F1 or AUC. In particular, MCC handles imbalanced data better (it is less sensitive to class prevalence). Mathematically speaking, for a binary classification problem it is simply the Pearson correlation between two random variables: the target (Y) and the prediction (Y’). In the binary case, the Pearson correlation collapses to a simple closed-form formula.
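Concretely, in the binary case the formula reads MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A quick sketch (with made-up labels) confirms that this agrees with the Pearson correlation:

```python
import numpy as np

y_true = np.array([1, 0, 0, 1, 1])   # made-up labels
y_pred = np.array([1, 1, 1, 0, 1])

tp = np.sum((y_true == 1) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

# Closed-form binary MCC
mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
# Pearson correlation between target and prediction
pearson = np.corrcoef(y_true, y_pred)[0, 1]
print(mcc, pearson)  # both ≈ -0.4082
```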
Generalization of MCC to Multi-class Problems
Isn’t it Straightforward?
Pearson correlation is not defined in the non-binary categorical world: the set of values is discrete and has no natural ordering. Thus, generalizing MCC to multi-class problems requires modifying some definitions. In a paper that we published, we approached this problem through the use of different types of means.
The Python package that we present here is an implementation of these ideas.
We will briefly cover the prominent types of means, restricting ourselves to positive numbers.
Arithmetic Mean
The most commonly used type of mean, where both numbers receive equal weight: (X + Y) / 2.
Weighted Mean
Similar to the arithmetic mean, but with different weights: wX + (1 − w)Y for some w in (0, 1).
Geometric Mean
Here we calculate the square root of the product of two positive numbers: √(XY). Readers familiar with logarithmic scores may recognize it.
Harmonic Mean
The reciprocal of the arithmetic mean of the two reciprocals: 2 / (1/X + 1/Y) = 2XY / (X + Y).
An ML example of the harmonic mean is the F1-score, which is the harmonic mean of precision and recall.
HM-GM-AM Inequalities
A classic chain of inequalities states that for two positive numbers X and Y, their harmonic mean is at most their geometric mean, which in turn is at most their arithmetic mean, with equality if and only if X = Y.
In the paper, we used these means to extend F1 and Matthew to multi-class problems.
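These definitions and the HM ≤ GM ≤ AM chain are easy to check numerically (the numbers below are arbitrary):

```python
import math

x, y, w = 4.0, 9.0, 0.3                 # arbitrary positive numbers and weight

arithmetic = (x + y) / 2                # 6.5
weighted = w * x + (1 - w) * y          # 0.3*4 + 0.7*9 = 7.5
geometric = math.sqrt(x * y)            # 6.0
harmonic = 2 / (1 / x + 1 / y)          # ≈ 5.538

assert harmonic <= geometric <= arithmetic

# F1 is the harmonic mean of precision and recall:
precision, recall = 0.5, 0.75
f1 = 2 / (1 / precision + 1 / recall)   # = 2*P*R/(P+R) = 0.6
```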
The Python Package
Mathematics is nice, and proving lemmas about means is awesome. However, as every data scientist knows, a data science project exists only if you can speak about it in a legal language such as Python, Python, or in extreme cases, Python. Thus we organized our results in a new Python package: generalized_matth.
We focused on two objectives:
- Generalizing Matthew to multi-class problems
- Creating a tool to compare it with our generalized F1
We needed a set of functions that let us both calculate the generalized scores and compare their quality as goodness-of-fit measures:
Calling the Base Object
We begin by importing the main object that performs the calculations:
import generalized_matth
from generalized_matth.matt_funct import matthew_multiclass
matthew_multiclass is the class that we use to run our functions. The package also defines a new enum:
class AVERAG_TYPE(Enum):
    MATTHEW_GEN = 2
    F1_GEN = 3
This enum contains the types of means to be used. In the current version, there are only generalized F1 and generalized Matthew (F1_GEN and MATTHEW_GEN). More advanced versions will extend it to additional means.
Calling matthew_multiclass
Now that we have presented the enum and the class name, we can show how to call it.
It receives two variables:
- y_true — A list of target values
- y_pred — A list of predicted values
These arrays must have the same length.
import numpy as np

y_true = np.asarray([1, 0, 0, 1, 1])
y_pred = np.asarray([1, 1, 1, 0, 1])
test_class = matthew_multiclass(y_true, y_pred)
For such a call, test_class will calculate the generalized Matthew score. It is equivalent to adding this line:
from generalized_matth.matt_funct import AVERAG_TYPE
and calling:
y_true = np.asarray([1, 0, 0, 1, 1])
y_pred = np.asarray([1, 1, 1, 0, 1])
test_class = matthew_multiclass(y_true, y_pred, avg_type=AVERAG_TYPE.MATTHEW_GEN.value)
The score is about -0.4082.
If we wish to use the generalized F1, we perform the following:
y_true = np.asarray([1, 1, 1, 0, 0])
y_pred = np.asarray([1, 1, 1, 0, 1])
test_class = matthew_multiclass(y_true, y_pred, avg_type=AVERAG_TYPE.F1_GEN.value)
The score obtained is 0.75.
main_matthew_mult_class
This is the main method that the package runs. The source is the following:
def main_matthew_mult_class(self):
    G_mat = self.norm_confusion_mat()
    return self.scalar_op(G_mat)
Two generic functions are used. The second, scalar_op, is determined by the average type that we wish to use:
Generalized F1
def gen_f1_scalar_op(self, h_conf_mat):
    # st is assumed to be scipy.stats
    l_mat = len(h_conf_mat)
    return st.mstats.hmean([h_conf_mat[i][i] for i in range(l_mat)])
Generalized Matthew
self.scalar_op = np.linalg.det
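To make the two scalar operations concrete, here is a small self-contained sketch that reproduces the binary scores quoted above. The function names (confusion, matthew_gen, f1_gen) and the normalization are my own reconstruction from the numbers in this post, not the package source: it assumes each confusion-matrix entry C_ij is divided by the geometric mean √(r_i·c_j) of its row and column sums before taking the determinant, and that the generalized F1 is the harmonic mean of the per-class F1 scores 2·C_ii / (r_i + c_i).

```python
import numpy as np
from scipy import stats as st

def confusion(y_true, y_pred):
    """Confusion matrix over the union of observed labels."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    c = np.zeros((len(labels), len(labels)))
    for t, p in zip(y_true, y_pred):
        c[np.searchsorted(labels, t), np.searchsorted(labels, p)] += 1
    return c

def matthew_gen(y_true, y_pred):
    # Assumed normalization: C_ij / sqrt(row_i * col_j), then determinant
    c = confusion(y_true, y_pred)
    r, k = c.sum(axis=1), c.sum(axis=0)
    return np.linalg.det(c / np.sqrt(np.outer(r, k)))

def f1_gen(y_true, y_pred):
    # Harmonic mean of the per-class F1 scores: 2*C_ii / (row_i + col_i)
    c = confusion(y_true, y_pred)
    r, k = c.sum(axis=1), c.sum(axis=0)
    return st.mstats.hmean(2 * np.diag(c) / (r + k))

print(matthew_gen([1, 0, 0, 1, 1], [1, 1, 1, 0, 1]))  # ≈ -0.4082
print(f1_gen([1, 1, 1, 0, 0], [1, 1, 1, 0, 1]))       # 0.75
```

This sketch matches the binary examples above exactly; the package's normalization for more than two classes may differ, so treat it as illustrative rather than as the package implementation.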
Some Tests
In this section, we present some tests that allow the reader to verify their code:
y_true = [0] * 13 + [1] * 21 + [2] * 20
y_pred = [0] * 5 + [1] * 6 + [2] * 2 + [0] * 2 + [1] * 8 + [2] * 11 + [0] * 8 + [1] * 2 + [2] * 10
test0 = matthew_multiclass(y_true, y_pred, avg_type=AVERAG_TYPE.F1_GEN.value)
print(test0.main_matthew_mult_class())
The score is 0.4130.
If we do the following:
test0 = matthew_multiclass(y_true, y_pred, avg_type=AVERAG_TYPE.MATTHEW_GEN.value)
print(test0.main_matthew_mult_class())
The score is about 0.031
Clearly, if we run:
test0 = matthew_multiclass(y_true, y_true, avg_type=AVERAG_TYPE.MATTHEW_GEN.value)
print(test0.main_matthew_mult_class())
The score is 1.0 for both functions.
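This sanity check is easy to see directly. Under the entry-wise normalization assumed earlier in this post (C_ij divided by √(r_i·c_j), my reconstruction rather than the package source), a perfect prediction produces a diagonal confusion matrix that normalizes to the identity, whose determinant is exactly 1:

```python
import numpy as np

y = np.array([0] * 13 + [1] * 21 + [2] * 20)

# Perfect prediction: the confusion matrix is diagonal
c = np.zeros((3, 3))
for t in y:
    c[t, t] += 1

r, k = c.sum(axis=1), c.sum(axis=0)
g = c / np.sqrt(np.outer(r, k))   # assumed normalization: C_ij / sqrt(r_i * c_j)
print(np.linalg.det(g))           # 1.0
```

The same argument gives 1 for the generalized F1, since every per-class F1 equals 1.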
Acknowledgments
We wish to thank Aleksander Molak for fruitful discussions and beneficial ideas throughout this work.