Statistical models such as the Perceptron, Logistic Regression, and the Support Vector Machine are designed to separate two classes at a time and do not natively support classification tasks with more than two classes.
So, to implement multiclass classification with these models, we usually split the multi-class dataset into multiple binary classification datasets and fit a binary classifier on each. This leads to two different meta-strategies: OVR and OVO.
One vs Rest (OVR)
The OVR strategy splits a multi-class classification problem into one binary classification problem per class.
For example, suppose we want to classify three labels: Red, Blue, and Green.
Step 1 : We create one copy of the original dataset per class and modify each copy.
Step 2 : In the first copy, we replace every label that is not Red with 0. In the second copy, we replace every label that is not Blue with 0, and in the third copy, we replace every label that is not Green with 0.
Step 3 : This gives us three different binary classification problems:
Red vs [Blue, Green]
Blue vs [Red, Green]
Green vs [Red, Blue]
Step 4 : In each problem we distinguish one class from the rest. In Red vs [Blue, Green], Red is the positive class (denoted by 1) and [Blue, Green] together form the negative class (denoted by 0). In the same way, Blue and Green each serve as the positive class (1) in their own problem, with the remaining classes labelled as negative (0).
Step 5 : After training these three models, we get three scores for each sample, one per class. We then take the argmax of these scores (the class index with the largest score) as the predicted class.
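The steps above can be sketched in a few lines of scikit-learn. The Red/Blue/Green labels are illustrative; this sketch stands in a synthetic 3-class dataset with integer labels 0, 1, 2.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy 3-class dataset standing in for the Red/Blue/Green example above.
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)
classes = np.unique(y)

# Steps 1-4: one binary problem per class — current class is 1, the rest 0.
models = [LogisticRegression(max_iter=1000).fit(X, (y == c).astype(int))
          for c in classes]

# Step 5: score every sample under all three models and take the argmax.
scores = np.column_stack([m.decision_function(X) for m in models])
y_pred = classes[np.argmax(scores, axis=1)]
print("number of models:", len(models))
print("training accuracy:", (y_pred == y).mean())
```

Note that each model sees the full dataset, just with relabelled targets; this is exactly what makes OVR sensitive to imbalance, as discussed later.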
One vs One (OVO)
In the OVO strategy, we split a multi-class classification problem into one binary classification problem per pair of classes.
Suppose the target variable has four categories: Red, Blue, Green, and Yellow.
Step 1 : We pick every possible pair of classes and build one classifier per pair.
Step 2 : The possible pairs are:
Red vs Blue
Red vs Green
Red vs Yellow
Blue vs Green
Blue vs Yellow
Green vs Yellow
Step 3 : We then fit 6 different binary classifiers, one per pair, training each classifier only on the samples belonging to its two classes.
Step 4 : In general, given n classes, we enumerate every possible pair of classes and train one binary classifier per pair.
Step 5 : To classify an unseen sample, we apply all the pairwise classifiers and let each one vote for a class. The class that receives the highest number of votes (or, when using scores, the argmax of the summed scores) is selected as the predicted label.
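The OVO voting scheme above can be sketched as follows, again on a synthetic dataset with integer labels standing in for Red/Blue/Green/Yellow:

```python
from itertools import combinations
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy 4-class dataset standing in for Red/Blue/Green/Yellow.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=4, random_state=0)
classes = np.unique(y)  # [0, 1, 2, 3]

# Steps 1-4: one classifier per pair, trained only on that pair's samples.
pair_models = {}
for a, b in combinations(classes, 2):
    mask = np.isin(y, [a, b])
    pair_models[(a, b)] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])

# Step 5: every pairwise model casts one vote; the most-voted class wins.
votes = np.zeros((len(X), len(classes)), dtype=int)
for model in pair_models.values():
    pred = model.predict(X)
    votes[np.arange(len(X)), pred] += 1
y_pred = np.argmax(votes, axis=1)
print("number of pairwise models:", len(pair_models))  # C(4, 2) = 6
print("training accuracy:", (y_pred == y).mean())
```

Unlike OVR, each pairwise model here trains on only the subset of samples from its two classes.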
How is OVR different from OVO?
Shape of decision boundary
The number of classifiers the model learns strongly correlates with the shape of the decision boundary it creates. In OVR, the decision boundaries are more complex because each classifier must separate one class from all the other classes combined, whereas in OVO each boundary only needs to separate one pair of classes, which is an easier problem.
OVR is more sensitive to class imbalance, because every binary problem pits a single class against the combined rest, whereas OVO, which compares classes pairwise, is less sensitive to imbalance.
Number of models in OVO and OVR
OVR requires one model per class; for example, 4 classes require 4 models. This can be an issue for large datasets, and a very large number of classes further increases the number of models and the total training time. In OVO, the number of models depends entirely on the number of pairs of classes; each model trains on only two classes' worth of data, so individual models train faster, although there are more of them. The number of binary classifiers is given by the combinations formula C(n, k) = n!/k!(n-k)!, where n is the number of classes and k = 2 since classes are taken in pairs, which simplifies to n(n-1)/2.
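The two model counts can be checked directly with Python's standard-library `math.comb`:

```python
from math import comb

# OVR trains n models; OVO trains C(n, 2) = n*(n-1)/2 models.
for n in (3, 4, 10):
    print(f"{n} classes -> OVR: {n} models, OVO: {comb(n, 2)} models")
# 3 classes -> OVR: 3 models, OVO: 3 models
# 4 classes -> OVR: 4 models, OVO: 6 models
# 10 classes -> OVR: 10 models, OVO: 45 models
```

OVO's quadratic growth is why it becomes expensive when the number of classes is large.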
Implementation in Python
Loading libraries and the data
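The post does not name a specific dataset, so this sketch assumes scikit-learn's built-in Iris data (3 classes) as a stand-in:

```python
# Assumed setup: Iris is used here purely as an illustrative 3-class dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

iris = load_iris()
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
print(iris.data.shape)    # (150, 4)
```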
Creating X and Y variable
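Continuing with the assumed Iris data, we separate the features (X) and labels (y) and hold out a stratified test set:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# return_X_y gives the feature matrix and target vector directly.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
```

`stratify=y` keeps the class proportions equal in both splits, which matters for the imbalance discussion above.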
Implementation of OVR Classifier in Logistic Regression
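A minimal sketch using scikit-learn's `OneVsRestClassifier` wrapper around Logistic Regression, on the assumed Iris split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# OneVsRestClassifier fits one binary Logistic Regression per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_train, y_train)
print("models trained:", len(ovr.estimators_))  # 3 classes -> 3 models
print("test accuracy:", ovr.score(X_test, y_test))
```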
Implementation of OVO Classifier in Logistic Regression
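The same idea with `OneVsOneClassifier`, which fits one Logistic Regression per pair of classes (C(3, 2) = 3 pairs for the assumed Iris data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# OneVsOneClassifier fits one binary model per pair of classes.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000))
ovo.fit(X_train, y_train)
print("models trained:", len(ovo.estimators_))  # C(3, 2) = 3 models
print("test accuracy:", ovo.score(X_test, y_test))
```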
Implementation of OVR classifier in SVC
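The same wrapper works with a Support Vector Classifier. Note that scikit-learn's `SVC` already handles multiclass problems internally (using a one-vs-one scheme), so wrapping it in `OneVsRestClassifier` is how you force the OVR strategy explicitly:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# One SVC per class, each separating that class from the rest.
ovr_svc = OneVsRestClassifier(SVC())
ovr_svc.fit(X_train, y_train)
print("models trained:", len(ovr_svc.estimators_))  # 3
print("test accuracy:", ovr_svc.score(X_test, y_test))
```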
Implementation of OVO Classifier in SVC
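And finally OVO with SVC. This is effectively what `SVC` does internally for multiclass targets, but `OneVsOneClassifier` makes the pairwise models explicit and inspectable:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.multiclass import OneVsOneClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# One SVC per pair of classes: C(3, 2) = 3 models for Iris.
ovo_svc = OneVsOneClassifier(SVC())
ovo_svc.fit(X_train, y_train)
print("models trained:", len(ovo_svc.estimators_))  # 3
print("test accuracy:", ovo_svc.score(X_test, y_test))
```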
Thanks for following this post until the end! We have covered OVR and OVO in detail, along with their implementation in Python.
Happy Machine Learning :)