Performance evaluation
Frecuently used techniques for performance evaluation
Metrics
Gini
Definition: Indicates how discriminative is the model (predictive power)
Possible values:
0 would indicate no discrimination based features to make a choice.
1 would indicate the model is completely relying on features as discriminators to make a choice (desirable for banks for example)
KS statistic
Definition: Indicates the maximum distance between distribution functions (classes samples in supervised learning) of two samples (empirical, or one reference).
Possible values:
0 would indicate no distinction on the two samples (no difference in label A-label B scores).
1 would indicate Maximum distance between the two samples.
Student's t-test
Definition: Indicates how likely a set of samples came from the exact same distribution (P-value). P-Value can be compared with a threshold call statistical significance (e.g. .05).
Possible values:
P-Value < 0.05, indicate we can reject the null hypothesis that the two samples are coming from the exact same distribution.
Comment: Samples must be shaped in a normal distribution.
K fold cross validation:
Definition: Indicates the variation of performance by sampling different same sized subsets of the data of K fold and training with the remaining of the subsets.
Possible values:
K between 5 and 10 folds.
Comment: An stratified version of it is recommended given that it tries to hold the same proportions of the whole dataset.
Cross entropy vs sparse cross entropy If your Y_i
's are one-hot encoded, use categorical_crossentropy. Examples (for a 3-class classification): [1,0,0] , [0,1,0], [0,0,1]
But if your Y_i
's are integers, use sparse_categorical_crossentropy. Examples for above 3-class classification problem: [1] , [2], [3]. Might be more memory efficient
References:
Last updated