Performance evaluation

Frequently used techniques for performance evaluation

Metrics

Gini

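  • Definition: Indicates how discriminative the model is (its predictive power), i.e. how well its scores rank positive samples above negative ones. It is commonly computed from the ROC AUC as Gini = 2 * AUC - 1.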

  • Possible values:

    • 0 indicates no discrimination: the scores rank the classes no better than random (negative values mean worse than random).

    • 1 indicates perfect discrimination: every positive sample is scored above every negative one (desirable, for example, for banks).
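
A minimal plain-Python sketch of the Gini computation, assuming binary labels (1 = positive, 0 = negative) and that a higher score means "more likely positive". The helper names `auc` and `gini` are hypothetical. AUC is computed here by comparing every positive/negative pair (ties count as half), which is quadratic and only meant for illustration; in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used instead.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def gini(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Gini coefficient derived from AUC: 2 * AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0


labels = [0, 0, 1, 1]
print(gini([0.1, 0.2, 0.8, 0.9], labels))  # perfect ranking: 1.0
print(gini([0.5, 0.5, 0.5, 0.5], labels))  # constant score, no discrimination: 0.0
print(gini([0.9, 0.8, 0.2, 0.1], labels))  # reversed ranking: -1.0
```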

KS (Kolmogorov-Smirnov) statistic

  • Definition: Measures the maximum distance between two cumulative distribution functions: those of two empirical samples (e.g. the model scores of class A vs class B in supervised learning), or that of one empirical sample vs a reference distribution.

  • Possible values:

    • 0 indicates no distinction between the two samples (the score distributions of label A and label B are identical).

    • 1 indicates maximum distance: the two samples do not overlap at all.
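
A plain-Python sketch of the two-sample KS statistic, assuming the two inputs are, for example, the model scores of label A samples and of label B samples; `ks_statistic` is a hypothetical name. Because both empirical CDFs are step functions that only jump at observed values, evaluating them at every observed value is enough to find the maximum distance. `scipy.stats.ks_2samp` computes the same statistic together with a p-value.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed values x."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        # Empirical CDF at x = fraction of the sample that is <= x
        f_a = bisect_right(a_sorted, x) / len(a_sorted)
        f_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(f_a - f_b))
    return d


scores_label_a = [0.1, 0.2, 0.3, 0.4]
scores_label_b = [0.3, 0.4, 0.5, 0.6]
print(ks_statistic(scores_label_a, scores_label_b))  # 0.5
```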

Student's t-test

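  • Definition: Tests whether the means of two samples differ. It yields a p-value: the probability of observing a difference in means at least as large as the measured one if the null hypothesis (equal means) were true. The p-value is compared with a threshold called the significance level (e.g. 0.05).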

  • Possible values:

    • p-value < 0.05: we can reject the null hypothesis that the two samples have the same mean (at the 5% significance level).

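  • Comment: The test assumes the samples are (approximately) normally distributed. The standard Student's version also assumes equal variances; Welch's t-test drops that assumption.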
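
A standard-library sketch of the Student's two-sample t statistic with pooled variance (the equal-variance version); `students_t` is a hypothetical helper. It returns the t statistic and its degrees of freedom; turning these into a p-value requires the CDF of the t distribution, which the Python standard library does not provide. With SciPy available, `scipy.stats.ttest_ind(a, b)` returns both the statistic and the p-value, and `equal_var=False` switches it to Welch's test.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def students_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two values")
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / standard_error
    return t, n1 + n2 - 2


# Means 3 vs 4, both variances 2.5 -> t = -1 (up to float rounding), df = 8
t, df = students_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df)
```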

K-fold cross validation

  • Definition: Estimates the performance of a model, and how much it varies, by splitting the data into K equally sized subsets (folds). Each fold is used once as the validation set while the model is trained on the remaining K-1 folds.

  • Possible values:

    • K is typically between 5 and 10.

  • Comment: A stratified version is recommended, since it keeps the class proportions of each fold the same as in the whole dataset.
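
A minimal plain-Python sketch of stratified K-fold cross validation. `stratified_kfold`, `majority_baseline_accuracy` and `cross_validate` are hypothetical names, and the "model" is a trivial majority-class baseline standing in for real training; in practice scikit-learn's `StratifiedKFold` would typically be used. The indices of each class are shuffled and then dealt round-robin across the K folds, so every fold keeps roughly the class proportions of the whole dataset. The mean and standard deviation of the K fold scores summarise the performance and its variation.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_kfold(labels: Sequence[Hashable], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k validation folds that preserve class proportions."""
    if not 2 <= k <= len(labels):
        raise ValueError("k must be between 2 and the number of samples")
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's indices round-robin, continuing where the previous class stopped
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds


def majority_baseline_accuracy(train_labels: List[Hashable], val_labels: List[Hashable]) -> float:
    """Hypothetical stand-in for 'train a model, then score it on the validation fold'."""
    majority = max(set(train_labels), key=train_labels.count)
    return sum(y == majority for y in val_labels) / len(val_labels)


def cross_validate(labels: Sequence[Hashable], k: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Return the mean and standard deviation of the per-fold validation scores."""
    scores = []
    for val_idx in stratified_kfold(labels, k, seed):
        val_set = set(val_idx)
        train_labels = [y for i, y in enumerate(labels) if i not in val_set]
        val_labels = [labels[i] for i in val_idx]
        scores.append(majority_baseline_accuracy(train_labels, val_labels))
    return mean(scores), pstdev(scores)


labels = [0] * 40 + [1] * 10  # imbalanced: 20% positives
for fold in stratified_kfold(labels, k=5):
    print(len(fold), sum(labels[i] for i in fold))  # every fold: 10 samples, 2 positives
print(cross_validate(labels, k=5))
```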

Cross entropy vs sparse cross entropy

If your Y_i's are one-hot encoded, use categorical_crossentropy. Examples (for a 3-class classification): [1,0,0], [0,1,0], [0,0,1].

If your Y_i's are integer class indices, use sparse_categorical_crossentropy. Examples for the same 3-class problem: [0], [1], [2] (indices start at 0). Both losses compute the same value; the sparse version is more memory efficient because each label is stored as a single integer instead of a one-hot vector.
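
A plain-Python sketch showing that the two losses give the same per-sample value and differ only in how the label is encoded. `categorical_ce` and `sparse_categorical_ce` are hypothetical helpers, not the Keras functions (which additionally operate on batches and guard against log(0)); the predicted probabilities are assumed to be already normalised, e.g. softmax outputs.

```python
import math
from typing import Sequence


def categorical_ce(y_true_onehot: Sequence[float], y_pred: Sequence[float]) -> float:
    """Cross entropy with a one-hot label: -sum_c y_c * log(p_c)."""
    # Terms with y_c == 0 contribute nothing, so they are skipped (also avoids log(0))
    return -sum(t * math.log(p) for t, p in zip(y_true_onehot, y_pred) if t > 0)


def sparse_categorical_ce(y_true_index: int, y_pred: Sequence[float]) -> float:
    """Same loss with an integer label: only the true class probability is needed."""
    return -math.log(y_pred[y_true_index])


probs = [0.7, 0.2, 0.1]  # predicted probabilities for classes 0, 1, 2
print(categorical_ce([0, 1, 0], probs))  # one-hot label for class 1
print(sparse_categorical_ce(1, probs))   # integer label for class 1
```

Both calls print the same value, -log(0.2), roughly 1.609.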
