Multi-class Classification — One-vs-All & One-vs-One

Feb 15, 2021

Statistical models such as Perceptron, Logistic Regression and Support Vector Machine are designed to classify two classes at a time and do not natively support classification tasks with more than two classes.

So, to implement multi-class classification with the above models, we usually split the multi-class classification dataset into multiple binary classification datasets and fit a binary classification model on each. This leads to two different meta-strategies: One vs Rest (OVR) and One vs One (OVO).

One vs Rest (OVR)

The OVR strategy splits a multi-class classification problem into one binary classification problem per class.

For example, suppose we want to classify Red, Blue and Green:

Step 1 : We create three copies of the original dataset, one per class, and modify their labels.

Step 2 : In the first copy, we replace every label not equal to Red with 0. In the second copy, we replace every label not equal to Blue with 0, and in the third copy, we replace every label not equal to Green with 0.

Step 3 : This gives us three different binary classification problems:

Red vs [Blue, Green]

Blue vs [Red, Green]

Green vs [Red, Blue]

Step 4 : In each problem, the named class is the positive class, denoted by 1, and the remaining classes together form the negative class, denoted by 0. For Red vs [Blue, Green], Red is labelled 1 and [Blue, Green] is labelled 0. In the same way, Blue and Green are each the positive class (1) in their own problem, and the rest of the classes are negative (0).

Step 5 : Once we have trained the 3 models, every sample gets three predictions, one per model. Each model outputs a score for how certain it is that the sample belongs to its positive class. We take the argmax of these scores (the class index with the largest score) and predict that class.
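The five OVR steps can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the toy 2-D points, the helper names (`fit_binary`, `score`, `fit_ovr`, `predict_ovr`) and the nearest-centroid scorer are all made up for this example. The centroid scorer merely stands in for a real binary model such as Logistic Regression or an SVM.

```python
import math

def centroid(points):
    """Mean point of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def fit_binary(X, y):
    """Toy binary 'model' (an assumption, not a real library API):
    stores the centroid of the positives (1) and of the negatives (0)."""
    pos = centroid([x for x, t in zip(X, y) if t == 1])
    neg = centroid([x for x, t in zip(X, y) if t == 0])
    return pos, neg

def score(model, x):
    """Higher score = more confident that x is the positive class."""
    pos, neg = model
    return math.dist(x, neg) - math.dist(x, pos)

def fit_ovr(X, labels):
    """Steps 1-4: one relabelled copy (1 vs 0) and one model per class."""
    models = {}
    for cls in sorted(set(labels)):
        binary = [1 if lab == cls else 0 for lab in labels]
        models[cls] = fit_binary(X, binary)
    return models

def predict_ovr(models, x):
    """Step 5: argmax over the per-class scores."""
    scores = {cls: score(model, x) for cls, model in models.items()}
    return max(scores, key=scores.get)

# Hypothetical toy data: three well-separated clusters.
X = [(0, 0), (1, 0), (0, 1),      # Red
     (10, 0), (9, 0), (10, 1),    # Blue
     (0, 10), (1, 10), (0, 9)]    # Green
labels = ["Red"] * 3 + ["Blue"] * 3 + ["Green"] * 3

ovr_models = fit_ovr(X, labels)
print(len(ovr_models))                  # one model per class
print(predict_ovr(ovr_models, (9, 1)))  # a point near the Blue cluster
```

Swapping the toy scorer for a real model only changes `fit_binary` and `score`; the relabelling loop and the argmax stay the same.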

One vs One (OVO)

In the OVO strategy, we split a multi-class classification problem into one binary classification problem per pair of classes.

Suppose the target variable has 4 categories: Red, Blue, Green and Yellow.

Step 1 : We list every possible pair of classes and build one classifier per pair.

Step 2 : The possible pairs of classes are:

Red vs Blue

Red vs Green

Red vs Yellow

Blue vs Green

Blue vs Yellow

Green vs Yellow

Step 3 : Now, we fit 6 different binary classifiers, one on each pair of classes. Each classifier is trained only on the samples that belong to its two classes.

Step 4 : In general, given n classes, we pick all possible pairs of classes and develop a binary classifier for each pair.

Step 5 : These classifiers are then applied to an unseen sample, and each one votes for one of its two classes. The class that gets the highest number of '+1' predictions (the argmax of the vote counts) is selected as the class label. If the classifiers output scores instead, the class with the largest summed score is selected.
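The OVO steps above can be sketched the same way. Again this is a minimal sketch, not a reference implementation: the toy points and the helper names (`fit_ovo`, `predict_ovo`) are assumptions for illustration, and each pairwise "classifier" is just a nearer-centroid rule standing in for a real binary model.

```python
import math
from collections import Counter
from itertools import combinations

def centroid(points):
    """Mean point of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def fit_ovo(X, labels):
    """Steps 1-4: one toy model per pair, fit only on that pair's samples."""
    models = {}
    for a, b in combinations(sorted(set(labels)), 2):
        models[(a, b)] = (
            centroid([x for x, lab in zip(X, labels) if lab == a]),
            centroid([x for x, lab in zip(X, labels) if lab == b]),
        )
    return models

def predict_ovo(models, x):
    """Step 5: every pairwise model casts one vote; most votes wins."""
    votes = Counter()
    for (a, b), (centroid_a, centroid_b) in models.items():
        nearer_a = math.dist(x, centroid_a) <= math.dist(x, centroid_b)
        votes[a if nearer_a else b] += 1
    return votes.most_common(1)[0][0]

# Hypothetical toy data: four well-separated clusters.
X = [(0, 0), (1, 0), (0, 1),        # Red
     (10, 0), (9, 0), (10, 1),      # Blue
     (0, 10), (1, 10), (0, 9),      # Green
     (10, 10), (9, 10), (10, 9)]    # Yellow
labels = ["Red"] * 3 + ["Blue"] * 3 + ["Green"] * 3 + ["Yellow"] * 3

ovo_models = fit_ovo(X, labels)
print(sorted(ovo_models))               # the 6 class pairs
print(predict_ovo(ovo_models, (9, 1)))  # a point near the Blue cluster
```

Ties between classes are possible with voting; this sketch simply returns whichever tied class `Counter` lists first, whereas a real implementation would usually break ties with the summed scores.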

How is OVR different from OVO?

Shape of decision boundary

The two strategies learn a different number of classifiers, and this strongly affects the decision boundaries they create. In OVR, each decision boundary has to separate one class from all the other classes combined, so its shape can be quite different and harder to fit. In OVO, each classifier only has to separate two classes, so it is easier to build a decision boundary for each pair.

Imbalanced Dataset

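OVR is more sensitive to the problems of an imbalanced dataset, because each classifier pits one class against all the others combined, so the negative class usually outnumbers the positive class. OVO is less sensitive to these problems, since each classifier only sees the samples of two classes.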
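A quick count makes the imbalance point concrete. The sketch below assumes a hypothetical, perfectly balanced dataset with 25 samples in each of 4 classes; the helper names (`ovr_balance`, `ovo_balance`) are made up for this example. It only compares class counts, and how much the imbalance actually hurts depends on the model and the data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical balanced dataset: 25 samples for each of 4 classes.
labels = ["Red"] * 25 + ["Blue"] * 25 + ["Green"] * 25 + ["Yellow"] * 25
counts = Counter(labels)
classes = sorted(counts)

def ovr_balance(counts, positive_class):
    """(positives, negatives) seen by the OVR model for one class."""
    positives = counts[positive_class]
    negatives = sum(counts.values()) - positives
    return positives, negatives

def ovo_balance(counts, class_a, class_b):
    """(samples of a, samples of b) seen by the OVO model for one pair."""
    return counts[class_a], counts[class_b]

for cls in classes:
    print("OVR", cls, "vs rest:", ovr_balance(counts, cls))
for a, b in combinations(classes, 2):
    print("OVO", a, "vs", b, ":", ovo_balance(counts, a, b))
```

Even though the original data is balanced, every OVR problem here ends up with 25 positives against 75 negatives, while every OVO problem stays at 25 against 25.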

Number of models in OVO and OVR

OVR requires one model per class; for example, 4 classes require 4 models. With a very large number of classes the number of models grows accordingly, and because every model is trained on the full dataset, training time can become an issue for large datasets. In OVO, the number of models depends on the number of class pairs. There are more models than in OVR, but each one is trained only on the samples of its two classes, so each individual model is typically faster to train. The number of binary classifiers is given by C(n, k) = n! / (k! (n - k)!), where n is the number of classes and k = 2 is the size of each pair, which simplifies to n(n - 1) / 2. For 4 classes this gives 6 models.
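To see how the two model counts grow, the formula can be evaluated directly with the standard library's `math.comb`. The function names `n_models_ovr` and `n_models_ovo` are just illustrative.

```python
import math

def n_models_ovr(n_classes):
    """OVR: one binary model per class."""
    return n_classes

def n_models_ovo(n_classes):
    """OVO: one binary model per pair, C(n, 2) = n * (n - 1) / 2."""
    return math.comb(n_classes, 2)

for n in (3, 4, 10, 100):
    print(f"{n} classes -> OVR: {n_models_ovr(n)}, OVO: {n_models_ovo(n)}")
```

The OVO count grows quadratically with the number of classes, which is why the number of classes matters more for OVO than for OVR when budgeting models.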

Implementation in Python

Loading libraries and the data