Gaussian Naive Bayes

by Lucy X. Shi, on November 1, 2022.

Introduction

Gaussian Naive Bayes is a supervised learning algorithm based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable. It is typically used for classification.

Probabilistic Derivation

Naive Bayes

Bayes’ theorem states the following relationship between a class variable $y$ and a dependent feature vector $x_1$ through $x_n$:

$P(y \mid x_1, \dots, x_n) = \frac{P(y) P(x_1, \dots, x_n \mid y)} {P(x_1, \dots, x_n)}$

Using the naive conditional independence assumption that

$P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y),$

for all $i$, this relationship is simplified to

$P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)} {P(x_1, \dots, x_n)}$
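The step from the assumption to this factorized form can be spelled out with the chain rule of probability. The following is a short worked derivation in the same symbols. It relies on the fact that the assumption above also implies $P(x_i \mid y, x_1, \dots, x_{i-1}) = P(x_i \mid y)$, which follows by summing out the remaining features $x_{i+1}, \dots, x_n$:

$\begin{aligned}P(x_1, \dots, x_n \mid y) &= \prod_{i=1}^{n} P(x_i \mid y, x_1, \dots, x_{i-1})\\ &= \prod_{i=1}^{n} P(x_i \mid y)\end{aligned}$

Substituting this product for $P(x_1, \dots, x_n \mid y)$ in Bayes’ theorem gives the simplified relationship.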

Since $P(x_1, \dots, x_n)$ is constant given the input, we can use the following classification rule:

$\begin{aligned}P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)\\\Downarrow\\\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y),\end{aligned}$

and we can use the relative frequency of class $y$ in the training set to estimate $P(y)$.
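As a minimal sketch of this rule, the NumPy snippet below estimates $P(y)$ from label frequencies and then takes the argmax over classes. The function names `estimate_priors` and `predict`, the toy labels, and the likelihood values are all made up for illustration. How $P(x_i \mid y)$ is actually modeled is left outside the sketch. The product is evaluated as a sum of logarithms, an implementation choice that is not part of the derivation: the logarithm is monotonic, so the argmax is unchanged, and the sum avoids numerical underflow when $n$ is large.

```python
import numpy as np


def estimate_priors(y):
    """Estimate P(y) as the relative frequency of each class label in y."""
    classes, counts = np.unique(y, return_counts=True)
    return classes, counts / counts.sum()


def predict(classes, priors, likelihoods):
    """Apply y_hat = argmax_y P(y) * prod_i P(x_i | y), computed in log space.

    likelihoods[k, i] holds P(x_i | y = classes[k]) for a single input x.
    Where these values come from is an assumption of this sketch.
    """
    log_scores = np.log(priors) + np.log(likelihoods).sum(axis=1)
    return classes[np.argmax(log_scores)]


# Toy training labels: class 0 appears three times, class 1 once.
y_train = np.array([0, 0, 1, 0])
classes, priors = estimate_priors(y_train)

# Hypothetical per-feature likelihoods for one input with two features.
likelihoods = np.array([
    [0.2, 0.1],  # P(x_1 | y=0), P(x_2 | y=0)
    [0.6, 0.5],  # P(x_1 | y=1), P(x_2 | y=1)
])

y_hat = predict(classes, priors, likelihoods)
print(priors, y_hat)
```

With these numbers the unnormalized scores are $0.75 \times 0.2 \times 0.1 = 0.015$ for class 0 and $0.25 \times 0.6 \times 0.5 = 0.075$ for class 1, so the rule selects class 1 even though class 0 has the larger prior.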