Metrics

pycalib.metrics.ECE(y_true, probs, normalize=False, bins=15, ece_full=True)

Calculate ECE score based on model output probabilities and true labels

Parameters:
y_true:
  • a list containing the actual class labels

  • ndarray of shape (n_samples,) containing the actual class labels

  • ndarray of shape (n_samples, n_classes) where, in each row, the largest value marks the correct class column

probs: (list)

a list containing probabilities for all the classes, with shape (n_samples, n_classes)

normalize: (bool)

whether to normalize the probabilities; needed in the case of 1-vs-K calibration (default = False)

bins: (int)

the number of bins into which the probabilities are divided (default = 15)

ece_full: (bool)

whether to use ECE-full or ECE-max (default = True)

Returns:
ece: (float)

expected calibration error
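
A minimal usage sketch based only on the signature and input shapes documented above; the array values are illustrative only and no returned score is asserted:

>>> import numpy as np
>>> from pycalib.metrics import ECE
>>> Y = np.array([[0, 1], [0, 1]])          # one-hot true labels, shape (n_samples, n_classes)
>>> P = np.array([[0.1, 0.9], [0.6, 0.4]])  # predicted probabilities, same shape
>>> ece = ECE(Y, P, bins=2)                 # ece_full=True by default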

pycalib.metrics.MCE(y_true, probs, normalize=False, bins=15, mce_full=False)

Calculate MCE score based on model output probabilities and true labels

Parameters:
y_true: (list)

a list containing the actual class labels

probs: (list)

a list containing probabilities for all the classes, with shape (n_samples, n_classes)

normalize: (bool)

whether to normalize the probabilities; needed in the case of 1-vs-K calibration (default = False)

bins: (int)

the number of bins into which the probabilities are divided (default = 15)

mce_full: (bool)

whether to use ECE-full or ECE-max when calculating the MCE (default = False)

Returns:
mce: (float)

maximum calibration error
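
A minimal usage sketch based only on the signature and input shapes documented above; the array values are illustrative only and no returned score is asserted:

>>> import numpy as np
>>> from pycalib.metrics import MCE
>>> Y = np.array([[0, 1], [0, 1]])          # one-hot true labels
>>> P = np.array([[0.1, 0.9], [0.6, 0.4]])  # predicted probabilities
>>> mce = MCE(Y, P, bins=2)                 # mce_full=False by default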

pycalib.metrics.accuracy(y_true, y_pred)

Classification accuracy score

Accuracy for binary and multiclass classification problems. It is the proportion of correct predictions, obtained by taking the class with the maximum predicted probability in each score vector as the estimated class.

Parameters:
y_true: label indicator matrix (n_samples, n_classes)

True labels. # TODO Add option to pass array with shape (n_samples, )

y_pred: matrix (n_samples, n_classes)

Predicted scores.

Returns:
score: (float)

Proportion of correct predictions as a value between 0 and 1.

Examples

>>> import numpy as np
>>> from pycalib.metrics import accuracy
>>> Y = np.array([[0, 1], [0, 1]])
>>> S = np.array([[0.1, 0.9], [0.6, 0.4]])
>>> accuracy(Y, S)
0.5
>>> Y = np.array([[0, 1], [0, 1]])
>>> S = np.array([[0.1, 0.9], [0, 1]])
>>> accuracy(Y, S)
1.0
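
In the first example above the predicted classes are the row-wise argmax values, i.e. class 1 for [0.1, 0.9] and class 0 for [0.6, 0.4], while both true labels are class 1, so exactly one of the two predictions is correct:

\[\text{accuracy} = \tfrac{1}{2}(1 + 0) = 0.5\]
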
pycalib.metrics.binary_ECE(y_true, probs, power=1, bins=15)

Binary Expected Calibration Error

\[\text{binary-ECE} = \sum_{i=1}^M \frac{|B_{i}|}{N} | \bar{y}(B_{i}) - \bar{p}(B_{i})|\]
Parameters:
y_true: indicator vector (n_samples, )

True labels.

probs: vector (n_samples, )

Predicted probabilities for the positive class.

Returns:
score: (float)

Examples

>>> import numpy as np
>>> from pycalib.metrics import binary_ECE
>>> Y = np.array([0, 1])
>>> P = np.array([0.1, 0.9])
>>> print(round(binary_ECE(Y, P, bins=2), 8))
0.1
>>> Y = np.array([0, 0, 0, 1, 1, 1])
>>> P = np.array([.1, .2, .3, .7, .8, .9])
>>> print(round(binary_ECE(Y, P, bins=2), 8))
0.2
>>> Y = np.array([0, 0, 0, 1, 1, 1])
>>> P = np.array([.4, .4, .4, .6, .6, .6])
>>> print(round(binary_ECE(Y, P, bins=2), 8))
0.4
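
To see how the formula produces the first result above: with bins=2, the sample with p = 0.1 falls in the lower bin and the sample with p = 0.9 in the upper bin, each bin holding half of the N = 2 samples:

\[\text{binary-ECE} = \tfrac{1}{2}\,|0 - 0.1| + \tfrac{1}{2}\,|1 - 0.9| = 0.1\]
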
pycalib.metrics.binary_MCE(y_true, probs, power=1, bins=15)

Binary Maximum Calibration Error

\[\text{binary-MCE} = \max_{i \in \{1, ..., M\}} |\bar{y}(B_{i}) - \bar{p}(B_{i})|\]
Parameters:
y_true: indicator vector (n_samples, )

True labels.

probs: vector (n_samples, )

Predicted probabilities for the positive class.

Returns:
score: (float)

Examples

>>> import numpy as np
>>> from pycalib.metrics import binary_MCE
>>> Y = np.array([0, 1])
>>> P = np.array([0.1, 0.6])
>>> print(round(binary_MCE(Y, P, bins=2), 8))
0.4
>>> Y = np.array([0, 0, 0, 1, 1, 1])
>>> P = np.array([.1, .2, .3, .6, .7, .8])
>>> print(round(binary_MCE(Y, P, bins=2), 8))
0.3
>>> Y = np.array([0, 0, 0, 1, 1, 1])
>>> P = np.array([.1, .2, .3, .3, .2, .1])
>>> print(round(binary_MCE(Y, P, bins=1), 8))
0.3
>>> Y = np.array([0, 0, 0, 1, 1, 1])
>>> P = np.array([.1, .2, .3, .9, .9, .9])
>>> print(round(binary_MCE(Y, P, bins=2), 8))
0.2
>>> Y = np.array([0, 0, 0, 1, 1, 1])
>>> P = np.array([.1, .1, .1, .6, .6, .6])
>>> print(round(binary_MCE(Y, P, bins=2), 8))
0.4
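
For the first example above, the two bins have calibration gaps |0 - 0.1| = 0.1 and |1 - 0.6| = 0.4; unlike binary-ECE, the bins are not weighted by size and the largest gap is returned:

\[\text{binary-MCE} = \max\left(|0 - 0.1|,\; |1 - 0.6|\right) = 0.4\]
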
pycalib.metrics.brier_score(y_true, y_pred)

Brier score

Computes the Brier score between the true labels and the estimated probabilities. This corresponds to the Mean Squared Error between the estimations and the true labels.

Parameters:
y_true: label indicator matrix (n_samples, n_classes)

True labels. # TODO Add option to pass array with shape (n_samples, )

y_pred: matrix (n_samples, n_classes)

Predicted scores.

Returns:
score: (float)

Positive value between 0 and 1.

Examples

>>> import numpy as np
>>> from pycalib.metrics import brier_score
>>> Y = np.array([[0, 1], [0, 1]])
>>> S = np.array([[0.1, 0.9], [0.6, 0.4]])
>>> brier_score(Y, S)
0.185
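
The value in the example is consistent with taking the mean squared error over all n_samples × n_classes entries of the label matrix:

\[\text{brier} = \tfrac{1}{4}\left((0 - 0.1)^2 + (1 - 0.9)^2 + (0 - 0.6)^2 + (1 - 0.4)^2\right) = 0.185\]
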
pycalib.metrics.classwise_ECE(y_true, probs, power=1, bins=15)

Classwise Expected Calibration Error

\[ \begin{align}\begin{aligned}\text{class-$j$-ECE} = \sum_{i=1}^M \frac{|B_{i,j}|}{N} |\bar{y}_j(B_{i,j}) - \bar{p}_j(B_{i,j})|,\\\text{classwise-ECE} = \frac{1}{K}\sum_{j=1}^K \text{class-$j$-ECE}\end{aligned}\end{align} \]
Parameters:
y_true: label indicator matrix (n_samples, n_classes)

True labels. # TODO Add option to pass array with shape (n_samples, )

probs: matrix (n_samples, n_classes)

Predicted probabilities.

Returns:
score: (float)

Examples

>>> import numpy as np
>>> from pycalib.metrics import classwise_ECE
>>> Y = np.array([[1, 0], [0, 1]]).T
>>> P = np.array([[0.9, 0.1], [0.1, 0.9]]).T
>>> print(round(classwise_ECE(Y, P, bins=2), 8))
0.1
>>> Y = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]).T
>>> P = np.array([[.9, .8, .7, .3, .2, .1], [.1, .2, .3, .7, .8, .9]]).T
>>> print(round(classwise_ECE(Y, P, bins=2), 8))
0.2
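
In the first example above, each class column is treated exactly like a binary_ECE problem: class 0 and class 1 each have a class-j-ECE of 0.1, and averaging over the K = 2 classes gives

\[\text{classwise-ECE} = \tfrac{1}{2}\,(0.1 + 0.1) = 0.1\]
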
pycalib.metrics.classwise_MCE(y_true, probs, bins=15)

Classwise Maximum Calibration Error

\[ \begin{align}\begin{aligned}\text{class-$j$-MCE} = \max_{i \in {1, ..., M}} |\bar{y}_j(B_{i,j}) - \bar{p}_j(B_{i,j})|,\\\text{classwise-MCE} = \max_{j \in {1, ..., K}} \text{class-$j$-MCE}\end{aligned}\end{align} \]
Parameters:
y_true: label indicator matrix (n_samples, n_classes)

True labels. # TODO Add option to pass array with shape (n_samples, )

probs: matrix (n_samples, n_classes)

Predicted probabilities.

Returns:
score: (float)

Examples

>>> import numpy as np
>>> from pycalib.metrics import classwise_MCE
>>> Y = np.array([[1, 0], [0, 1]]).T
>>> P = np.array([[0.8, 0.1], [0.2, 0.9]]).T
>>> print(round(classwise_MCE(Y, P, bins=2), 8))
0.2
>>> Y = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]).T
>>> P = np.array([[.8, .7, .6, .1, .1, .1], [.2, .3, .4, .9, .9, .9]]).T
>>> print(round(classwise_MCE(Y, P, bins=2), 8))
0.3
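
Working through the first example above: class-0-MCE = max(|1 - 0.8|, |0 - 0.1|) = 0.2 and class-1-MCE = max(|0 - 0.2|, |1 - 0.9|) = 0.2, and the maximum over the two classes is

\[\text{classwise-MCE} = \max(0.2, 0.2) = 0.2\]
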
pycalib.metrics.conf_ECE(y_true, probs, bins=15)

Confidence Expected Calibration Error

Calculate ECE score based on model maximum output probabilities and true labels

\[\text{confidence-ECE} = \sum_{i=1}^M \frac{|B_{i}|}{N} | \text{accuracy}(B_{i}) - \bar{p}(B_{i})|\]

in which $\bar{p}(B_{i})$ denotes the mean of the maximum predicted probabilities (confidences) in bin $B_{i}$.

Parameters:
y_true:
  • a list containing the actual class labels

  • ndarray of shape (n_samples,) containing the actual class labels

  • ndarray of shape (n_samples, n_classes) where, in each row, the largest value marks the correct class column

probs:

a list containing probabilities for all the classes with a shape of (samples, classes)

bins: (int)
the number of bins into which the probabilities are divided (default = 15)

Returns:
ece: (float)

expected calibration error

Examples

>>> import numpy as np
>>> from pycalib.metrics import conf_ECE
>>> Y = np.array([[1, 0], [0, 1]]).T
>>> P = np.array([[0.9, 0.1], [0.1, 0.9]]).T
>>> print(round(conf_ECE(Y, P, bins=2), 8))
0.1
>>> Y = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]).T
>>> P = np.array([[.9, .8, .7, .3, .2, .1], [.1, .2, .3, .7, .8, .9]]).T
>>> print(round(conf_ECE(Y, P, bins=2), 8))
0.2
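
In the first example above both samples have a maximum predicted probability of 0.9 and both are classified correctly, so the single occupied bin contains all N = 2 samples with accuracy 1 and mean confidence 0.9:

\[\text{confidence-ECE} = \tfrac{2}{2}\,|1 - 0.9| = 0.1\]
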
pycalib.metrics.conf_MCE(y_true, probs, bins=15)

Confidence Maximum Calibration Error

Calculate MCE score based on model maximum output probabilities and true labels

Parameters:
y_true:
  • a list containing the actual class labels

  • ndarray of shape (n_samples,) containing the actual class labels

  • ndarray of shape (n_samples, n_classes) where, in each row, the largest value marks the correct class column

probs:

a list containing probabilities for all the classes with a shape of (samples, classes)

bins: (int)
the number of bins into which the probabilities are divided (default = 15)

Returns:
mce: (float)

maximum calibration error
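
A minimal usage sketch based only on the signature and input shapes documented above; the array values are illustrative only and no returned score is asserted:

>>> import numpy as np
>>> from pycalib.metrics import conf_MCE
>>> Y = np.array([[1, 0], [0, 1]])          # one-hot true labels
>>> P = np.array([[0.9, 0.1], [0.1, 0.9]])  # predicted probabilities
>>> mce = conf_MCE(Y, P, bins=2)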

pycalib.metrics.cross_entropy(y_true, y_pred)

Cross-entropy score

Computes the cross-entropy (a.k.a. log-loss) for binary and multiclass classification scores.

Parameters:
y_true: label indicator matrix (n_samples, n_classes)

True labels. # TODO Add option to pass array with shape (n_samples, )

y_pred: matrix (n_samples, n_classes)

Predicted scores.

Returns:
score: (float)

Examples

>>> import numpy as np
>>> from pycalib.metrics import cross_entropy
>>> Y = np.array([[0, 1], [0, 1]])
>>> S = np.array([[0.1, 0.9], [0.6, 0.4]])
>>> cross_entropy(Y, S)
0.5108256237659906
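
The value in the example is consistent with averaging the negative natural logarithm of the probability assigned to the true class of each sample:

\[\text{cross-entropy} = -\tfrac{1}{2}\left(\ln 0.9 + \ln 0.4\right) \approx 0.5108\]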