Supervised Learning | Getting started with Classification

Introduction to Supervised Learning:

Classification is, as the term implies, the task of “classifying things” into sub-categories, but done by a machine. If that doesn’t sound impressive, consider your computer’s ability to distinguish between you and a stranger, between a potato and a tomato, or between an A grade and an F. In Machine Learning and Statistics, classification is the problem of determining which of a set of categories (subpopulations) a new observation belongs to, based on a training set of data containing observations whose category membership is known.

Supervised Learning:

When the model is trained using a labelled dataset, it is referred to as supervised learning. A dataset that has both input and output parameters is referred to as a labelled dataset. Both the training and validation datasets are labelled in this sort of learning.

Example of Supervised Learning Algorithms:

  • Linear Regression
  • Nearest Neighbor
  • Gaussian Naive Bayes
  • Decision Trees
  • Support Vector Machine (SVM)
  • Random Forest
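As a minimal sketch of the supervised-learning idea, one of these algorithms (here Logistic Regression) can be fit on a labelled dataset in a few lines; the tiny dataset below is invented purely for illustration:

```python
# A minimal sketch of supervised learning: fit a model on a labelled
# dataset (inputs X, known outputs y), then predict labels for new inputs.
# The toy data below is invented for illustration only.
from sklearn.linear_model import LogisticRegression

X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]  # input parameter
y = [0, 0, 0, 1, 1, 1]                             # known output labels

clf = LogisticRegression()
clf.fit(X, y)  # training on the labelled dataset

# Predict labels for inputs the model has never seen
print(clf.predict([[2.5], [11.5]]))  # -> [0 1]
```

Both the training inputs and their labels are supplied to `fit`; that pairing of inputs with known outputs is exactly what makes the learning “supervised”.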

What is Classification?

It’s a data analysis task that entails developing a model that characterizes and differentiates data classes and concepts. Given a training set of data containing observations whose category membership is known, classification is the problem of determining which of a set of categories (subpopulations) a new observation belongs to.


For example, before proceeding with any project we must first determine its feasibility. In this situation, a classifier is needed to predict class labels such as ‘Safe’ and ‘Risky’, and the project is adopted and approved only if it is predicted ‘Safe’.
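A hedged sketch of that scenario, where the feature names (budget overrun %, team experience in years) and all the training rows are hypothetical:

```python
# Sketch: a classifier that predicts 'Safe' vs 'Risky' project labels.
# The two features (budget overrun %, team experience in years) and the
# labelled training examples are hypothetical, chosen for illustration.
from sklearn.tree import DecisionTreeClassifier

X_train = [[5, 8], [10, 6], [40, 2], [55, 1], [8, 7], [60, 3]]
y_train = ['Safe', 'Safe', 'Risky', 'Risky', 'Safe', 'Risky']

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Approve the project only if the predicted class label is 'Safe'
label = clf.predict([[12, 5]])[0]
print(label)                                   # -> Safe
print('Approve' if label == 'Safe' else 'Reject')
```

Note that scikit-learn classifiers accept string class labels directly, so the business-facing names ‘Safe’ and ‘Risky’ can be used as-is.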

Types of Classification

Classification is of two types:

  1. Binary Classification:

When we have to group data into two distinct classes, we use binary classification. For example, we must establish whether or not a person has a specific disease based on their current health status.

  2. Multiclass Classification:

There are more than two classes in a multiclass classification. For example, we must determine which type of flower our observation belongs to, based on data about different species of flowers.
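The flower example corresponds to the classic Iris dataset, which has three species and therefore three classes. A minimal multiclass sketch (the choice of a nearest-neighbour model here is illustrative):

```python
# Multiclass sketch: the Iris dataset has three flower species, so the
# classifier must choose among more than two classes.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(iris.data, iris.target)  # train on all 150 labelled flowers

# Predict the species for one new observation (four measurements in cm)
pred = clf.predict([[5.1, 3.5, 1.4, 0.2]])
print(iris.target_names[pred[0]])  # one of: setosa, versicolor, virginica
```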

Figure 1: Classification in binary and multiclass modes. The variables x1 and x2 are the ones used to predict the class.

How does classification work?

Assume we need to determine whether a patient has a specific ailment based on three characteristics known as features.

There are two possible outcomes:

  1. The patient is suffering from the ailment in question. In other words, a “Yes” or “True” result.
  2. The patient is free of sickness. “No” or “False” as a response.

This is a binary classification problem.

We have a training data set of observations, which consists of sample data with actual classification results. On this data set, we train a model called Classifier, which we then use to predict whether a certain patient would develop the disease or not.

As a result, the outcome now hinges on:

  1. How well these features “map” to the final result.
  2. The quality of our data set (quality here in the statistical and mathematical sense).
  3. How well our Classifier generalizes this feature-outcome relationship.
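The third point, generalization, can be estimated by holding out part of the labelled data and scoring the trained Classifier on it. A sketch under stated assumptions: the patient data here is synthetic (generated, not real), and the model choice is illustrative.

```python
# Sketch: estimate how well a Classifier generalizes by holding out part
# of the labelled data. The binary 'disease' dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# 200 'patients', 3 features each, binary outcome (disease: yes/no)
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)   # train on 75%
print('held-out accuracy:', clf.score(X_test, y_test))  # test on 25%
```

Accuracy on the held-out 25% approximates performance on genuinely new patients, which is what we actually care about.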

Following is the generalized block diagram of the classification task.


Figure 2: Generalized classification block diagram.

  1. X: pre-classified data, represented as an N*M matrix, where N is the number of observations and M is the number of features.
  2. y: an N-d vector holding the known class label for each of the N observations.
  3. Feature Extraction: extracting meaningful information from the input X using a sequence of transforms.
  4. ML Model: the machine learning model we train, called the “Classifier”.
  5. y’: the labels predicted by the Classifier.
  6. Quality Metric: a metric for assessing the model’s performance.
  7. ML Algorithm: the algorithm that updates the weights w’, so that the model “learns” iteratively.
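The block-diagram pieces map naturally onto scikit-learn objects. A sketch of that mapping, in which the particular transform (standard scaling) and model (an SGD-trained linear classifier) are my illustrative choices, not the only options:

```python
# Mapping the block diagram onto code: X is the N*M matrix, y the N-d
# label vector, feature extraction is a transform, the Classifier is the
# model, y_pred is y', and accuracy is the quality metric.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)          # X: N*M matrix, y: N-d vector

X_feat = StandardScaler().fit_transform(X)  # feature extraction step

clf = SGDClassifier(random_state=0)  # ML algorithm updates the weights w'
clf.fit(X_feat, y)                   # iterative "learning"

y_pred = clf.predict(X_feat)         # y': the predicted labels
print('quality metric (accuracy):', accuracy_score(y, y_pred))
```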

Types of Classifiers (algorithms)

Classifiers come in a variety of shapes and sizes. Here are a few examples:

  • Linear classifiers: Logistic Regression
  • Tree-based classifiers: Decision Tree Classifier
  • Support Vector Machines (SVMs)
  • Artificial Neural Networks
  • Bayesian Regression
  • Gaussian Naive Bayes classifiers
  • Stochastic Gradient Descent (SGD) Classifier
  • Ensemble methods: Random Forests, AdaBoost, Bagging Classifier, Voting Classifier, ExtraTrees Classifier

Practical Applications of Classification

  1. Google’s self-driving car detects and classifies obstacles using deep-learning-based classification techniques.
  2. One of the most common and well-known applications of classification algorithms is spam email screening.
  3. Classification is at the heart of detecting health problems, facial recognition, speech recognition, object detection, and sentiment analysis.


Let’s have a look at how classification works in practice. We’ll look at a few different Classifiers and do a quick analytical comparison of their performance on the Iris data set, which is a well-known, standard data set.

Requirements for running the given script

  1. Python 2.7
  2. SciPy and NumPy
  3. Matplotlib for data visualization
  4. Pandas for data I/O
  5. Scikit-learn, which provides all the classifiers
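With those installed, a quick comparison along the lines described above might look like the following sketch; the particular models and the 5-fold cross-validation setup are illustrative choices:

```python
# Sketch: compare a few Classifiers on the Iris dataset using 5-fold
# cross-validation; the model selection here is illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

models = {
    'Gaussian Naive Bayes': GaussianNB(),
    'Nearest Neighbors': KNeighborsClassifier(),
    'Decision Tree': DecisionTreeClassifier(random_state=0),
    'SVM': SVC(),
    'Random Forest': RandomForestClassifier(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(name + ': mean accuracy = %.3f' % scores.mean())
```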


Classification is a broad subject of study. Although it is only one part of Machine Learning as a whole, it is one of its most significant aspects.

That’s all I’ve got for now. We’ll explore how classification works in practice in the following post, and we’ll get our hands dirty with Python code.
