The Very Basics – The One with the Probabilities


On my new journey as a grad school student, the one thing I realized was how important it is to know some statistics before getting hands-on with all those interesting regressors and classifiers. A basic knowledge of statistics helps you understand the concepts and lets you see both the problem and the solution inside out.

To start with, there are two different concepts of probability:
1) The frequency with which a particular event happens in the long run – This is called statistical probability
2) The degree of belief that it is reasonable to place in a proposition, given the available evidence – This is called inductive probability

For example,

If I toss a coin, what is the probability that it will turn up heads? Most people will say “a half”, assuming the coin is “fair”, that is, heads and tails are equally likely. But what if we toss the coin lots of times? Will the proportion of heads still be a half? John Kerrich ran exactly this experiment during World War II, tossing a coin 10,000 times to study the long-run relative frequency of heads.

At the end of the experiment, the coin had landed heads 5,067 times and tails 4,933 times, a proportion of heads of 0.5067. Even though the proportion fluctuates a lot in the beginning, the graph settles down as the number of tosses grows: the fluctuations shrink and the value gets closer and closer to a half, i.e. equally likely. This limiting value is the “statistical” probability of heads, the same answer you gave to the previous question. In general, if an event A occurs n(A) times in n trials, the proportion of times the event has occurred is n(A)/n, and P(A), the probability of A, is the value this proportion approaches as n grows. This is an empirical or experimental approach to probability: from a finite number of trials we can only estimate the probability of the event, never know it exactly. The settling down of the proportion is what the law of large numbers describes, a topic Kahneman also writes about.
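To get a feel for this settling down, here is a minimal simulation sketch. It is not Kerrich's data: it assumes a perfectly fair coin driven by Python's pseudo-random generator, and the function name and seed are my own choices for illustration.

```python
import random

def running_proportion_of_heads(n_tosses, seed=0):
    """Simulate fair coin tosses; return the proportion of heads after each toss."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # a True result counts as 1 head
        proportions.append(heads / toss)
    return proportions

props = running_proportion_of_heads(10_000)
for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6} tosses: proportion of heads = {props[n - 1]:.4f}")
```

The early values can be far from 0.5, while the later ones should sit close to it, which is the same pattern the graph of the real experiment shows.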

Most of the events we deal with, for probabilities or for any analysis in general, are numerical. Numerical variables that take different values with different probabilities are known as random variables. A variable such as the number of students in each class, which results from counting and can only take whole-number values like 1, 2 or 25, is a discrete variable. A variable such as the weight of the students in the class, which results from measuring and can take any value within a range, is a continuous variable.


Frequency Distribution of SAT Scores

As you can see from the graph of the frequencies of the various scores, as the number of observations increases, the frequency distribution tends towards a probability distribution.

The graph of the weights of the trees is called a histogram: each bar represents the relative frequency of the observations that fall in that interval. With more observations and narrower intervals, the histogram starts to look like a smooth curve, which is called the probability density curve. When that curve has the symmetric bell shape seen in the plot, it is called the normal distribution.
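A rough sketch of how a histogram turns into a density: divide each bar's relative frequency by the bar width, so that the total area of the bars is 1. The data here is synthetic (drawn from a normal distribution with a made-up mean of 50 and standard deviation of 10), and `histogram_density` is a hypothetical helper written for this post, not a library function.

```python
import random

def histogram_density(data, n_bins):
    """Return (bin_width, densities), where each density is
    relative frequency / bin width, so the bar areas sum to 1."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # clamp so the maximum value lands in the last bin
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    n = len(data)
    densities = [c / (n * width) for c in counts]
    return width, densities

rng = random.Random(1)
weights = [rng.gauss(50, 10) for _ in range(5_000)]  # synthetic "weights"

for n_bins in (5, 20, 80):
    width, densities = histogram_density(weights, n_bins)
    area = sum(d * width for d in densities)
    print(f"{n_bins:>3} bins: bin width = {width:.2f}, total area = {area:.3f}")
```

However many bins you pick, the total area stays at 1; what changes is how finely the bars trace the underlying curve.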
A distribution has several properties, such as its center, its variability, and its shape.


Mean: The mean is simply the average of all the observations, i.e. the sum of ‘n’ observations divided by ‘n’. The important difference to keep in mind is that the mean of a frequency distribution (the sample) is denoted x̄, while the mean of the whole population or probability distribution is denoted μ.

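Median: The observation in the middle when the observations are arranged in sorted order. When the number of observations is even, it is the average of the two middle observations.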

Mode: The value of the most frequent observation. The mode of the discrete probability distribution is the value which has the highest probability of occurring and the mode of the continuous probability distribution is the point at which the density function attains the maximum value.
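As a quick illustration, the three measures of center can be computed with Python's standard `statistics` module. The scores below are a made-up sample, not data from this post.

```python
import statistics

# Hypothetical sample of quiz scores, invented for illustration.
scores = [2, 3, 3, 5, 7, 10]

sample_mean = statistics.mean(scores)      # sum of the observations / n
sample_median = statistics.median(scores)  # even n: average of the two middle values
sample_mode = statistics.mode(scores)      # most frequent value

print("mean:", sample_mean)
print("median:", sample_median)
print("mode:", sample_mode)
```

Here the mean (5) is pulled above the median (4) by the single large score of 10, which is a small preview of skewness.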



Dispersion: Another feature of a random variable is its variability. The most obvious measure is the mean deviation, i.e. the average of the absolute deviations from the mean. Another measure of dispersion is the interquartile range, which is the distance between the lower and the upper quartile. The lower and the upper quartiles are the points at which the cumulative frequency distribution reaches ¼ and ¾ respectively. The third and most used measure of dispersion is the standard deviation, which is the square root of the average of the squared deviations from the mean.
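Here is a small sketch of the three dispersion measures on a made-up data set. I am assuming the "average of squared deviations" means dividing by n, so it uses `statistics.pstdev` (the population version). The quartiles come from `statistics.quantiles`, and different quartile conventions can give slightly different values on small samples.

```python
import statistics

# Hypothetical observations, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)

# Mean deviation: average of the absolute deviations from the mean.
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)

# Interquartile range: upper quartile minus lower quartile.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Standard deviation: square root of the average squared deviation (divide by n).
std_dev = statistics.pstdev(data)

print("mean deviation:", mean_abs_dev)
print("interquartile range:", iqr)
print("standard deviation:", std_dev)
```

If you are working with a sample rather than a whole population, `statistics.stdev` (which divides by n - 1) is the usual alternative.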

Having covered location and variability, we now focus on the shape of the distribution.


Skewness: It is a measure of the lack of symmetry. A distribution is symmetrical if it looks the same on both sides of the center. Distributions whose right tail is longer than the left are called skewed to the right, and distributions whose left tail is longer than the right are called skewed to the left.


Kurtosis: It is a measure of the peak in the distribution. A normal distribution is mesokurtic. A distribution with a higher peak and heavier tails than the normal is leptokurtic, and a distribution with a lower peak and lighter tails than the normal is platykurtic.

A normal distribution is a proper bell curve with ‘0’ skewness and ‘0’ excess kurtosis (its kurtosis is 3, and excess kurtosis is measured relative to that value).
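To make the two shape measures concrete, here is a sketch that computes them from central moments: skewness as the third moment divided by the standard deviation cubed, and excess kurtosis as the fourth moment divided by the variance squared, minus 3. These are the simple moment-based versions (statistical packages often apply small-sample corrections), and the function names and synthetic samples are my own.

```python
import random

def central_moment(data, k):
    """Average of (x - mean)**k over the data."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Third central moment divided by the standard deviation cubed."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth central moment over the variance squared, minus 3 (the normal value)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0

rng = random.Random(42)
normal_sample = [rng.gauss(0, 1) for _ in range(50_000)]       # symmetric bell curve
right_skewed = [rng.expovariate(1.0) for _ in range(50_000)]   # long right tail

for name, sample in (("normal", normal_sample), ("right-skewed", right_skewed)):
    print(f"{name:>12}: skewness = {skewness(sample):.3f}, "
          f"excess kurtosis = {excess_kurtosis(sample):.3f}")
```

The normal sample should give values near zero for both measures, while the exponential sample should show clearly positive skewness (a long right tail) and positive excess kurtosis (leptokurtic).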