# Introduction to Dimensionality Reduction

## What is Dimensionality Reduction?

In machine learning classification problems, there are often too many factors on which the final classification is made. These factors are variables called features. The more features there are, the harder it becomes to visualize the training set and then work on it. Many of these features are correlated with one another and hence redundant. This is where dimensionality reduction techniques come in. Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into two parts: feature selection and feature extraction.

What is Predictive Modeling:

Predictive modelling is a probabilistic process for forecasting outcomes based on a set of predictors. These predictors are essentially the features that are taken into account when determining the final result, i.e. the model’s output. So why does dimensionality reduction matter in machine learning and predictive modelling?

An intuitive illustration of dimensionality reduction is a simple e-mail classification problem, in which we must determine whether an e-mail is spam or not. This can involve a large number of features, such as whether or not the e-mail has a generic subject line, the e-mail’s content, whether or not it uses a template, and so on. Some of these features, however, may overlap. Similarly, a classification problem that relies on both humidity and rainfall can often be collapsed into a single underlying feature, because the two are highly correlated. The number of features in such situations can therefore be reduced. A 3-D classification problem can be hard to visualize, whereas a 2-D problem can be mapped to a simple two-dimensional space, and a 1-D problem to a simple line. The diagram below illustrates this idea: a 3-D feature space is split into two 2-D feature spaces, and the number of features can be reduced even further if they are found to be correlated.
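The humidity/rainfall example can be sketched in a few lines of scikit-learn. The data here is synthetic (the generating formula and numbers are assumptions for illustration); the point is that two highly correlated features collapse into one component with almost no loss of variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
humidity = rng.uniform(40, 90, 200)                 # percent (synthetic)
rainfall = 2.0 * humidity + rng.normal(0, 5, 200)   # strongly correlated with humidity

X = np.column_stack([humidity, rainfall])

# Reduce the two correlated features to a single underlying feature
pca = PCA(n_components=1)
X_1d = pca.fit_transform(X)

# Almost all of the variance survives in the single combined feature
print(pca.explained_variance_ratio_[0])  # close to 1.0
```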

Components of Dimensionality Reduction:

There are two components of dimensionality reduction:

Feature selection:

To reduce the number of variables used to represent the problem, we try to find a subset of the original set of variables, or features. This is usually accomplished in one of three ways:
1. Filter
2. Wrapper
3. Embedded
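As a minimal sketch of the filter approach (the other two wrap or embed a learning model), scikit-learn's `SelectKBest` scores each feature against the target independently of any downstream model and keeps the top k:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Filter method: score each feature on its own (ANOVA F-test here)
# and keep the k best, without training any downstream model.
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)  # (150, 4) -> (150, 2)
```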

Feature extraction:

Feature extraction transforms data in a high-dimensional space into a lower-dimensional space, i.e. a space with fewer dimensions.

Methods of Dimensionality Reduction:

The following are some of the approaches used to reduce dimensionality:

• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• Generalized Discriminant Analysis (GDA)

Depending on the method, dimensionality reduction can be linear or non-linear. The principal linear approach, known as Principal Component Analysis (PCA), is discussed below.

Principal Component Analysis:

This technique was first proposed by Karl Pearson. It works on the premise that when data in a higher-dimensional space is mapped to a lower-dimensional space, the variance of the data in the lower-dimensional space should be as large as possible.

It involves the following steps:

• Construct the covariance matrix of the data.
• Compute the eigenvectors of this matrix.
• Use the eigenvectors corresponding to the largest eigenvalues to recover a large fraction of the variance of the original data.

As a result, we are left with a smaller number of eigenvectors, and some information may be lost in the process. However, the retained eigenvectors capture the most significant variances.
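The steps above can be sketched directly in NumPy. The mixing matrix below is an arbitrary assumption chosen just to give the synthetic data correlated features:

```python
import numpy as np

# Synthetic correlated data: 200 samples, 3 features (mixing matrix is arbitrary)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [0.5, 1.0, 0.0],
                                          [0.1, 0.2, 0.3]])

# Step 1: centre the data and construct its covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Step 2: eigen-decompose the (symmetric) covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 3: keep the eigenvectors with the largest eigenvalues and project
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]   # top-2 principal directions
X_reduced = Xc @ components          # data in the reduced space

retained = eigvals[order[:2]].sum() / eigvals.sum()
print(X_reduced.shape, f"variance retained: {retained:.2%}")
```

Here most of the variance survives the projection from 3-D to 2-D, which is exactly the trade-off the bullet list describes.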

Advantages of Dimensionality Reduction:

• It helps with data compression, reducing the storage space required.
• It cuts down on computation time.
• It also helps remove redundant features, if any.

Disadvantages of Dimensionality Reduction:

• Some amount of information may be lost in the process.
• PCA tends to capture only linear correlations between variables, which is sometimes undesirable.
• PCA fails when mean and covariance are not enough to characterize a dataset.
• We may not know in advance how many principal components to keep; in practice, rules of thumb are followed.
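One common rule of thumb for choosing the number of components is to keep enough of them to explain some fixed fraction of the variance, often around 95%. scikit-learn's `PCA` supports this directly when `n_components` is a float between 0 and 1:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

# Passing a float in (0, 1) asks PCA to keep just enough components
# to explain that fraction of the total variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape[1], "features ->", pca.n_components_, "components")
print("variance explained:", pca.explained_variance_ratio_.sum())
```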

### Introduction to Kernel PCA

Principal Component Analysis:

PCA is a technique for reducing the number of dimensions in data. It lets us reduce the data’s dimension without losing too much information. PCA reduces the dimension by finding the most significant orthogonal linear combinations, called principal components, of the original variables. The first principal component captures the majority of the variance in the data. The second principal component is orthogonal to the first and captures most of the remaining variance, and so on. There are as many principal components as there are original variables.
These principal components are uncorrelated, and they are ordered so that the first few explain the majority of the variance in the original data.
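Both of these properties can be checked numerically. The snippet below (a small verification sketch, using the Iris dataset purely as an example) confirms that the covariance matrix of the component scores is diagonal and that the variances come out in decreasing order:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
scores = PCA().fit_transform(X)  # keep all components

# The principal-component scores are uncorrelated: their covariance
# matrix is (numerically) diagonal...
cov = np.cov(scores, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))
print(np.allclose(off_diag, 0))          # True

# ...and the variances along the diagonal are in decreasing order.
variances = np.diag(cov)
print(np.all(np.diff(variances) <= 0))   # True
```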

Kernel PCA:

PCA is a linear technique, so it is best suited to datasets that are linearly separable, where it performs admirably. If we apply it to non-linear datasets, however, we may end up with a dimensionality reduction that is not optimal. Kernel PCA uses a kernel function to project the dataset into a higher-dimensional feature space, where it becomes linearly separable. This is similar in spirit to Support Vector Machines.
There are several kernels to choose from, including linear, polynomial, and Gaussian (RBF) kernels.
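In scikit-learn the kernel is just a constructor argument to `KernelPCA`, so trying out the different choices is straightforward (the two-moons data here is the same kind of dataset used below):

```python
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

# Each kernel implies a different implicit feature space;
# the interface is the same for all of them.
for kernel in ("linear", "poly", "rbf"):
    kpca = KernelPCA(n_components=2, kernel=kernel)
    X_k = kpca.fit_transform(X)
    print(kernel, X_k.shape)
```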

Code: Create a non-linear (two-moons) dataset and plot it.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

# Generate a non-linearly separable "two moons" dataset
X, y = make_moons(n_samples=500, noise=0.02, random_state=417)

plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()
```

Code: Let’s apply PCA on this dataset

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

plt.title("PCA")
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.xlabel("Component 1")
plt.ylabel("Component 2")
plt.show()
```
As can be seen, PCA was unable to distinguish between the two classes.

Code: Using an RBF kernel with a gamma value of 15 to perform kernel PCA on this dataset.

```python
from sklearn.decomposition import KernelPCA

# Project into the kernel feature space with an RBF kernel;
# keep two components for plotting
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
X_kpca = kpca.fit_transform(X)

plt.title("Kernel PCA")
plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=y)
plt.show()
```

The two classes are linearly separable in kernel space. Kernel PCA projects the dataset into a higher-dimensional space where it can be linearly separated using a kernel function.
Finally, we used scikit-learn to apply kernel PCA to a non-linear dataset.