First, we need to understand Bayesian probability. Bayes' rule is formulated as follows [1]:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

To explain this equation, consider the following:

P(A) = probability that a person likes red apples.

P(B) = probability that a person likes golden apples.

P(B|A) = probability that a person who likes red apples also likes golden apples.

Given the following values, we would like to calculate the probability that a person who likes golden apples also likes red apples, i.e., P(A|B):

P(A) = 40%

P(B) = 60%

P(B|A) = 50%

P(A|B) =?
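
Plugging these numbers into Bayes' rule gives P(A|B) = P(B|A) P(A) / P(B) = (0.5 × 0.4) / 0.6 ≈ 33%. A minimal Python sketch of the same arithmetic follows; the function name `posterior` is only illustrative and does not come from the original:

```python
def posterior(p_a, p_b, p_b_given_a):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# Values from the apple example.
p_a = 0.40          # P(A): likes red apples
p_b = 0.60          # P(B): likes golden apples
p_b_given_a = 0.50  # P(B|A): likes golden apples, given likes red apples

p_a_given_b = posterior(p_a, p_b, p_b_given_a)
print(f"P(A|B) = {p_a_given_b:.1%}")  # prints: P(A|B) = 33.3%
```

So roughly one in three people who like golden apples also like red apples.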

Let’s see how we can use Bayes' theorem for classification problems. Consider B to be the given data and A to be your conjecture, whose probability of being true given the data, P(A|B), you would like to calculate. If you calculate P(A_{i}|B) for each of several candidate conjectures A_{i}, the classification problem becomes finding the conjecture with the maximum P(A_{i}|B). The conjecture that attains this maximum is called the Maximum a Posteriori (MAP) estimate [2].
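
As a rough illustration of MAP selection, the sketch below scores a few conjectures and picks the one with the largest posterior. The conjecture names and all of the numbers are made up for illustration and do not come from the original. P(B) is obtained with the law of total probability, which assumes the listed conjectures are mutually exclusive and exhaustive.

```python
# Hypothetical priors P(A_i) and likelihoods P(B|A_i) for three conjectures.
priors = {"A_1": 0.5, "A_2": 0.3, "A_3": 0.2}
likelihoods = {"A_1": 0.10, "A_2": 0.40, "A_3": 0.30}

# P(B) via the law of total probability (assumes the conjectures are
# mutually exclusive and cover every possibility).
p_b = sum(likelihoods[a] * priors[a] for a in priors)

# Posterior P(A_i|B) for every conjecture, using Bayes' rule.
posteriors = {a: likelihoods[a] * priors[a] / p_b for a in priors}

# The MAP estimate is the conjecture with the largest posterior.
a_map = max(posteriors, key=posteriors.get)
print(a_map, round(posteriors[a_map], 3))
```

With these made-up numbers, `A_2` wins even though `A_1` has the largest prior, because the data are much more likely under `A_2`.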

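Since B is the given data, P(B) is the same for every conjecture, so it can be dropped from the maximization, and we have:

$$A_{\text{MAP}} = \arg\max_{i} P(A_i \mid B) = \arg\max_{i} P(B \mid A_i)\, P(A_i)$$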

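When the data B consists of many features, B = (b_{1}, ..., b_{n}), estimating the joint probability P(b_{1}, ..., b_{n}|A_{i}) becomes impractical. To overcome this problem, we make a naïve assumption: the features are independent of each other given the conjecture. With this assumption, we can calculate the likelihood as:

$$P(B \mid A_i) = \prod_{j=1}^{n} P(b_j \mid A_i)$$

where *n* is the total number of features.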
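
To make the naïve factorization concrete, here is a small sketch. It assumes the data B is a set of binary features b_j, each treated as independent given the conjecture A_i, so that P(B|A_i) becomes a product of per-feature probabilities. The conjectures (`spam`, `ham`), the feature names, and every probability are hypothetical. Logarithms are summed rather than multiplying raw probabilities, a common way to avoid numerical underflow when there are many features.

```python
import math

# Hypothetical priors P(A_i) for two conjectures.
priors = {"spam": 0.4, "ham": 0.6}

# Hypothetical per-feature likelihoods P(b_j is present | A_i).
feature_probs = {
    "spam": {"offer": 0.7, "meeting": 0.1, "free": 0.6},
    "ham": {"offer": 0.2, "meeting": 0.5, "free": 0.1},
}

def log_score(conjecture, observed):
    """log P(A_i) + sum_j log P(b_j | A_i), under the naive assumption."""
    score = math.log(priors[conjecture])
    for feature, present in observed.items():
        p = feature_probs[conjecture][feature]
        # Use P(present) or P(absent) = 1 - P(present) as appropriate.
        score += math.log(p if present else 1.0 - p)
    return score

def classify(observed):
    """Return the conjecture with the highest unnormalized log-posterior."""
    return max(priors, key=lambda a: log_score(a, observed))

observed = {"offer": True, "meeting": False, "free": True}
print(classify(observed))
```

For the observation shown, the unnormalized scores are 0.4 × 0.7 × 0.9 × 0.6 = 0.1512 for `spam` and 0.6 × 0.2 × 0.5 × 0.1 = 0.006 for `ham`, so the sketch picks `spam`.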

References:

- [1] Kay, S. (2006). *Intuitive probability and random processes using MATLAB®*. Springer Science & Business Media.
- [2] https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation