What is F1 Score: The Complete Understanding
A lot of the time, we require both Precision and Recall in our model's performance evaluation, and without knowing much about the F1 score, we end up using it anyway. Let's dive into understanding the F1 score. To understand the concept of the F1 score, it is of utmost importance to know the Harmonic Mean [1]. I presume here that you already know Precision and Recall.
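For reference, the harmonic mean of n values is n divided by the sum of their reciprocals. For two values a and b, it reduces to:

Harmonic Mean = 2ab / (a + b)

Unlike the simple average, it is always pulled toward the smaller of the two values.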
Why Harmonic Mean?
Let me tell you a story first. There was a Manager whose name was Average, and she had two employees on her team, Precision and Recall. Precision was a hard-working person and Recall was lazy. Precision used to deliver above 95% performance and Recall below 35%. At the end of every month, when she calculated the average performance of the project, it came out to 65% ((95% + 35%) / 2).
She observed that the 65% overall project performance was not doing any good to the hard-working employee ‘Precision’. On the other hand, ‘Recall’ was benefiting from work he didn't do. So, Average thought she should change the method of evaluation.
After giving it thought for a while, a senior of hers, called ‘Statistics’, suggested that she use a better evaluation metric instead of just taking the simple average. The new evaluation metric, called F1, also known as the Harmonic Mean, gave her the following result:
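F1 is the harmonic mean of Precision and Recall; plugging in the story's numbers:

F1 = 2 × Precision × Recall / (Precision + Recall) = 2 × 0.95 × 0.35 / (0.95 + 0.35) ≈ 0.51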
The newly calculated project performance was 0.51, which was low compared to the previous one. This indicated that both Precision and Recall had to work hard in order to improve the overall performance, but it put more pressure on Recall, since Precision was already working very well.
In Machine Learning, when we require both Precision and Recall to be high, we tend to use the F1 score (Harmonic Mean) instead of just taking the average of both. We cannot make the harmonic mean large by making one already-large value (Precision in this case) even larger; we have to increase the smaller value as well.
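A minimal sketch in Python to illustrate this behavior, re-using the story's numbers (the helper functions below are written for illustration, not taken from any library):

```python
def arithmetic_mean(p: float, r: float) -> float:
    # Simple average: insensitive to imbalance between p and r.
    return (p + r) / 2

def f1(p: float, r: float) -> float:
    # Harmonic mean of precision and recall: pulled toward the smaller value.
    return 2 * p * r / (p + r)

precision, recall = 0.95, 0.35
print(f"Simple average:     {arithmetic_mean(precision, recall):.2f}")  # 0.65
print(f"F1 (harmonic mean): {f1(precision, recall):.2f}")               # 0.51

# Raising the already-large value barely moves the harmonic mean...
print(f"F1 with precision 0.99: {f1(0.99, recall):.2f}")   # ~0.52
# ...while raising the smaller value moves it a lot.
print(f"F1 with recall 0.85:    {f1(precision, 0.85):.2f}")  # ~0.90
```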
Improvement in F1
In continuation of our story: after a few days, the Manager realized Recall was pretty stubborn and not improving at all. So the Manager told Recall that she would put a penalty in his performance metric, called β.
β is a positive real factor such that an improvement in Recall is considered β times as important as an improvement in Precision [2]. For example, β = 2 means an improvement in Recall is twice as important as an improvement in Precision.
Let's calculate the new evaluation metric.
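This generalization of F1 is known as the Fβ score. The worked value below assumes β = 2 (from the example above) with the story's Precision = 0.95 and Recall = 0.35:

Fβ = (1 + β²) × Precision × Recall / (β² × Precision + Recall)

F2 = (1 + 2²) × 0.95 × 0.35 / (2² × 0.95 + 0.35) ≈ 0.40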
Now the overall project performance is much lower than both the simple average and the previous F1. So, now Recall has to improve himself……
Finally, Recall worked hard and improved; his performance is now 85%.
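Plugging the improved numbers into the same formula, here is a small Python sketch (the fbeta helper is written for illustration and assumes β = 2 as above):

```python
def fbeta(p: float, r: float, beta: float) -> float:
    # Weighted harmonic mean: recall counts beta times as much as precision.
    return (1 + beta**2) * p * r / (beta**2 * p + r)

print(f"{fbeta(0.95, 0.35, beta=2):.2f}")  # ~0.40, before Recall improved
print(f"{fbeta(0.95, 0.85, beta=2):.2f}")  # ~0.87, after Recall improved
```

With Recall at 85%, the overall score jumps to about 0.87.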
Depending upon our Machine Learning problem, we can tweak the parameter β.
Let us consider some cases (a short code sketch follows them):
Case 1, β = 1: equal weightage for Precision and Recall, e.g. a classification problem like ‘given an image, classify it as Cat or Not Cat’.
Case 2, β > 1: more weightage for Recall, e.g. classifying a tumor as malignant or benign, where we require minimal False Negatives (that is, the patient actually has a cancerous tumor but the model classifies it as benign). This is very critical in some Machine Learning problems.
Case 3, β < 1: more weightage for Precision, e.g. classifying email as spam or not spam, where we need minimal False Positives (FP = the model predicts the email as spam, whereas that mail is actually not spam). In such a scenario, an important mail might get classified as spam and the user might overlook it.
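In practice, scikit-learn's fbeta_score covers all three cases; here is a sketch with toy labels (the data below is made up purely for illustration):

```python
from sklearn.metrics import fbeta_score

# Toy labels: precision is 2/3 and recall is 1/2 for the positive class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

print(fbeta_score(y_true, y_pred, beta=1))    # Case 1: plain F1, ~0.57
print(fbeta_score(y_true, y_pred, beta=2))    # Case 2: recall-heavy, ~0.53
print(fbeta_score(y_true, y_pred, beta=0.5))  # Case 3: precision-heavy, ~0.63
```

Note how the same predictions score lower when the weighting emphasizes the weaker of the two (here, Recall).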
Thanks for reading! I hope you found this helpful; feel free to give feedback.