Precision, Recall and F1 Score for Beginners

These are core terminologies of machine learning; you need to remember and understand them to develop a better understanding of how a classification model is performing.

Only binary classification is considered in this article; the same concepts apply to multi-class problems but require a bit more consideration.

Before we begin, assume we have a dataset of labeled training data. In this training data, usually called the train set, some examples are labelled positive and some negative.

True/False Positives and Negatives

In binary classification the model predicts one of two classes for each input: the class we are trying to detect (positive) or the other class (negative). Each of these predictions can then be either correct or incorrect.

  • Positive: An instance (a single model prediction) is considered positive if the model assigns it to the class we are trying to identify. For example, if we are developing a model for predicting cancer, a prediction is positive when the model says the input belongs to the cancer class.
  • Negative: A prediction is negative if the instance is classified as not being a member of the class we are trying to identify. For example, in the cancer model a prediction is negative when the model says the input is not cancerous; or, if we are classifying cats, a prediction is negative when the classifier says the image is not a cat.

Now is the time to talk about True Positive, True Negative and False Positive, False Negative.

  • True Positive: The training data label is cancer positive and the model predicted positive, i.e. training data label = 1 and model output = 1.
  • True Negative: The training data label is cancer negative and the model predicted negative, i.e. training data label = 0 and model output = 0.
  • False Positive: The model incorrectly classified a healthy patient as cancerous, i.e. training data label = 0 and model output = 1.
  • False Negative: The model incorrectly classified a cancer patient as non-cancerous, i.e. training data label = 1 and model output = 0. This case shows how important the distinction is: wrongly classifying a patient as non-cancerous is life threatening and may have serious consequences.
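
To make the four outcomes concrete, here is a minimal Python sketch that tallies them from 0/1 labels and predictions; the example arrays are made up purely for illustration.

```python
# Tally TP, TN, FP, FN from 0/1 ground-truth labels and model predictions.
# The example values below are invented just to illustrate the counting.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = cancer, 0 = no cancer
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]   # model outputs

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print(tp, tn, fp, fn)  # 3 3 1 1
```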

Confusion Matrix

Now that we have a fair understanding of True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN), we can look at the confusion matrix. It is an illustration that maps the performance of our model using these four values and helps us see how well the model classifies the data into the corresponding labels.

Confusion matrix

The Cancer example

Let’s look at the confusion matrix for the cancer predictions we have been using as an example so far. Suppose we had one hundred patients’ data in our training set, and assume our model gave us these predictions:

  • TP: 45 positive cases correctly predicted
  • TN: 25 negative cases correctly predicted
  • FP: 18 negative cases are misclassified (wrong positive predictions)
  • FN: 12 positive cases are misclassified (wrong negative predictions)
Confusion matrix for 100 cancer patients example
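
If you want to reproduce this matrix in code, here is a rough sketch assuming scikit-learn is installed; the 100 labels and predictions are simply expanded from the four counts above.

```python
from sklearn.metrics import confusion_matrix

# Expand the example counts: 45 TP, 25 TN, 18 FP, 12 FN.
y_true = [1] * 45 + [0] * 25 + [0] * 18 + [1] * 12   # actual labels
y_pred = [1] * 45 + [0] * 25 + [1] * 18 + [0] * 12   # model predictions

print(confusion_matrix(y_true, y_pred))
# [[25 18]    rows = actual (0, 1)
#  [12 45]]   columns = predicted (0, 1)
```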

This lets us get an overview of the model’s performance and make further decisions. We can see that our model has made some mistakes, and some are more severe than others. The False Negatives (bottom left of the matrix) are the most critical, as they classify a patient with cancer (positive) as non-cancerous (negative), thereby denying them treatment.

What we want is a model that predicts both cancer patients and non-cancer patients correctly.

Accuracy:

Accuracy is the number of correct predictions divided by the total number of predictions made by our model.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

For our example of cancer patients, the accuracy is:

Accuracy = (45 + 25) / (45 + 25 + 18 + 12) = 0.70

Which is equal to 70%.
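
Continuing the sketch from the confusion-matrix section (again assuming scikit-learn is available), the same 70% can be checked in code:

```python
from sklearn.metrics import accuracy_score

# Same 100-patient example: 45 TP, 25 TN, 18 FP, 12 FN.
y_true = [1] * 45 + [0] * 25 + [0] * 18 + [1] * 12
y_pred = [1] * 45 + [0] * 25 + [1] * 18 + [0] * 12

print(accuracy_score(y_true, y_pred))  # 0.7
```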

Why is accuracy not enough?

If our dataset has 90% non-cancer patients and only 10% patients with cancer, it is an imbalanced dataset. A model trained on it will be inclined to predict patients as non-cancerous. A model that simply labels every patient as non-cancerous would still reach 90% accuracy, which, as we discussed before, would be very dangerous. Therefore accuracy alone is not a good metric, as it can mislead us when interpreting the results.
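
Here is a tiny sketch of that trap, using the 90/10 split from the paragraph above and a dummy "model" that always predicts non-cancer:

```python
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 90 + [1] * 10   # 90 healthy patients, 10 cancer patients
y_pred = [0] * 100             # always predict "non-cancer"

print(accuracy_score(y_true, y_pred))  # 0.9  -> looks impressive
print(recall_score(y_true, y_pred))    # 0.0  -> misses every cancer patient
```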

Precision (Positive predictive value):

Precision = TP / (TP + FP)

For a good classifier, precision should ideally be 1. Precision equals 1 only when the numerator matches the denominator, that is, when True Positives (TP) equals TP plus False Positives (FP), which means FP is zero. As FP increases, the denominator grows beyond the numerator and precision declines, which is undesirable.

For the cancer example, precision = 45 / (45 + 18) = 0.714.
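
In code this is just one division, using the counts from the confusion matrix above (a minimal sketch):

```python
tp, fp = 45, 18
precision = tp / (tp + fp)
print(round(precision, 3))  # 0.714
```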

Now, let’s delve into another vital metric – recall. Also referred to as sensitivity or the true positive rate, recall is defined as follows:

Recall:

Recall = TP / (TP + FN)

For a reliable classifier, recall should ideally be 1. Recall equals 1 only when the numerator matches the denominator, that is, when True Positives (TP) equals TP plus False Negatives (FN), which means FN is zero. As FN increases, the denominator grows beyond the numerator and recall declines, which is undesirable.

Now, let’s apply this concept to the cancer example to calculate recall:

Recall = 45 / (45 + 12) = 0.789
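
And the corresponding sketch in code, again using the counts from the confusion matrix above:

```python
tp, fn = 45, 12
recall = tp / (tp + fn)
print(round(recall, 3))  # 0.789
```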

In an ideal classifier, both precision and recall should be 1, implying zero False Positives (FP) and False Negatives (FN). To achieve this comprehensive evaluation, we turn to the F1-score. This metric, considering both precision and recall, is defined as follows:

F1 Score:

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)

The F1 Score reaches 1 only when both precision and recall are perfect, and it is high only when both precision and recall are high. As the harmonic mean of precision and recall, the F1 Score is a more informative metric than accuracy, especially on imbalanced data.

In our cancer example, the F1 Score is calculated as follows:

F1 = 2 * ((0.714 * 0.789) / (0.714 + 0.789)) ≈ 0.75
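
As a quick sanity check, the same value can be computed in code, both from the formula and with scikit-learn's f1_score on the expanded 100-patient example (assuming scikit-learn is available):

```python
from sklearn.metrics import f1_score

precision = 45 / (45 + 18)
recall = 45 / (45 + 12)
f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 2))  # 0.75

# Cross-check on the expanded labels/predictions from the confusion matrix.
y_true = [1] * 45 + [0] * 25 + [0] * 18 + [1] * 12
y_pred = [1] * 45 + [0] * 25 + [1] * 18 + [0] * 12
print(f1_score(y_true, y_pred))  # 0.75
```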

For further exploration, I recommend an insightful article on common binary classification metrics by neptune.ai: neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc.