Normal! Normal! – The One with all the Distributions

From my previous post, you will have picked up the basics of the probability of an event: divide the frequency of the event by the total number of events, and you get its probability. Let us now look at different kinds of probability distributions, starting with the binomial. What examples of binomial probabilities come to mind? Think of any event that has a dichotomous outcome. Yes: tossing a coin ‘n’ times, or asking 100 people whether they vote or not. As I mentioned, any event with a binary outcome, such as Head or Tail, or Yes or No, is binomial in nature.

If ‘p’ is the probability of success on a single trial and ‘q’ = 1 - p is the probability of failure, then in a binomial experiment of ‘n’ trials the expected number of successes, i.e. the mean of the binomial distribution, is np.
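To see the mean np fall out of the numbers, here is a minimal sketch. The standard binomial formula C(n, k) p^k q^(n-k) is not stated in this post, and the coin-toss values (n = 10, p = 0.5) are assumptions chosen only for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    q = 1.0 - p  # probability of failure on a single trial
    return comb(n, k) * p**k * q ** (n - k)


# Hypothetical experiment: 10 tosses of a fair coin.
n, p = 10, 0.5

# Probability of each possible count of heads, from 0 to n.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)  # all outcomes together should account for probability 1
mean = sum(k * prob for k, prob in enumerate(pmf))  # expected number of successes

print(f"total probability = {total:.4f}")
print(f"mean from the distribution = {mean:.4f}, n*p = {n * p:.4f}")
```

The mean computed by weighting each count by its probability should agree with np, which is the shortcut quoted above.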

Points to consider:

  • Binomial outcomes are mutually exclusive
  • Variables are represented as “counts” of success or failure
  • The type of variable is discrete
  • The graph resembles a histogram

Now that we know how to identify the distribution for a discrete random variable, we can move on to probabilities and distributions for a continuous random variable, starting with the normal. We say a distribution is normal if the values fall along a smooth, continuous, bell-shaped curve that is symmetric about the mean, with no skewness and no excess kurtosis (that is, tails neither heavier nor lighter than the bell curve's).

Source: MIT EDU
Normal Distribution

I am sure you have seen a curve like this before, with all the z scores and t scores. Before trying to understand the intricacies of normal data, let us first understand what the word “normal” means in this context. Yes, we all know that the curve should be symmetrical, that it should have a bell shape, and so on; we have all learned that from various sources. But what does it actually mean?

We have looked at the frequency of events in both the discrete and the continuous sense. Data comes from many sources, from natural measurements like height and weight to man-made ones like financial data. Normality is when the average of the data, i.e. the mean, is the most frequently occurring value, other values tend to lie close to the mean, and values far from the mean occur less and less frequently. In short, most of the data is centered around the mean. With the mean at the center, a smaller standard deviation results in a taller curve with narrow tails, and a larger standard deviation results in a flatter curve with wider tails. Hence the standard deviation determines how spread out the curve is.
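The effect of the standard deviation on the shape can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the two standard deviations (0.5 and 2.0) and the tail point (3.0) are assumed values picked only to make the contrast visible.

```python
from statistics import NormalDist

mean = 0.0

# Two normal curves with the same mean but different spreads (assumed values).
narrow = NormalDist(mu=mean, sigma=0.5)
wide = NormalDist(mu=mean, sigma=2.0)

# Height of each curve at its center.
peak_narrow = narrow.pdf(mean)
peak_wide = wide.pdf(mean)

# Height of each curve far out in the tail, at x = 3.
tail_narrow = narrow.pdf(3.0)
tail_wide = wide.pdf(3.0)

print(f"peak:  sigma=0.5 -> {peak_narrow:.4f}, sigma=2.0 -> {peak_wide:.4f}")
print(f"x = 3: sigma=0.5 -> {tail_narrow:.2e}, sigma=2.0 -> {tail_wide:.2e}")
```

The curve with the smaller standard deviation is taller at the mean and almost flat at x = 3, while the curve with the larger standard deviation is lower at the mean but still has noticeable height in the tail.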

One popular normal distribution is the Z distribution (the standard normal), which has a mean of zero and a standard deviation of 1; as with any probability distribution, the area under the curve adds up to 1. A value on the Z-distribution signifies the number of standard deviations a data point lies above or below the mean; these values are called z scores. For example, z = 1 on the Z-distribution represents a value that is one standard deviation above the mean. Similarly, z = -1 represents a value that is one standard deviation below the mean.
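Converting a raw measurement into a z score is a one-line calculation: subtract the mean and divide by the standard deviation. The helper name `z_score` and the height figures below (mean 170 cm, standard deviation 10 cm) are hypothetical, used only to illustrate the idea.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# Hypothetical data: heights with a mean of 170 cm and a standard deviation of 10 cm.
mean_height, sd_height = 170.0, 10.0

print(z_score(180.0, mean_height, sd_height))  # one standard deviation above the mean
print(z_score(160.0, mean_height, sd_height))  # one standard deviation below the mean
print(z_score(170.0, mean_height, sd_height))  # exactly at the mean
```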

So far, we have discussed only the probability of a single event. More often, though, we need the probability of a whole range of outcomes, for example every value at or below a given point. This is called cumulative probability. Keep in mind that the observations need to be independent: the outcome of one should not influence another. To find the probability for a range, you first need to identify the z score and then look it up in the Z table for the matching probability [Refer: Z Normal table ]. A z score of -1.0 gives a cumulative probability of 0.1587, and a z score of 0 gives a cumulative probability of 0.50. The probability between two z scores is the difference between the higher and the lower cumulative probabilities. For our example, the probability of Z between -1 and 0 is 0.50 - 0.1587 = 0.3413.
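If a Z table is not at hand, the same lookup can be approximated in code. This is a minimal sketch that assumes the standard identity relating the standard normal cumulative probability to the error function, Φ(z) = (1 + erf(z / √2)) / 2; the function name `standard_normal_cdf` is made up for this example.

```python
from math import erf, sqrt


def standard_normal_cdf(z: float) -> float:
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


p_low = standard_normal_cdf(-1.0)  # area to the left of z = -1
p_high = standard_normal_cdf(0.0)  # area to the left of z = 0

# Probability between the two z scores: higher cumulative minus lower cumulative.
between = p_high - p_low

print(f"P(Z <= -1)      = {p_low:.4f}")
print(f"P(Z <= 0)       = {p_high:.4f}")
print(f"P(-1 <= Z <= 0) = {between:.4f}")
```

Rounded to four decimals, these come out to roughly 0.1587, 0.5000 and 0.3413, the kind of values a printed Z table lists.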

Normal Distribution with Z scores

You will come across this many times: a range of values is often written as P (-1 <= Z <= 1), or described as lying between -1σ and 1σ.

P (-1 <= Z <= 1) = 0.3413 + 0.3413 = 0.6826, or about 68% probability: the sum of the area from -1 to 0 (calculated above as 0.3413) and the equal area from 0 to 1.
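The same range can be computed directly as a difference of two cumulative probabilities. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper `prob_between` is a name invented for this example.

```python
from statistics import NormalDist

# The Z distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0.0, sigma=1.0)


def prob_between(low: float, high: float) -> float:
    """P(low <= Z <= high), as the difference of two cumulative probabilities."""
    return Z.cdf(high) - Z.cdf(low)


left_half = prob_between(-1.0, 0.0)   # area from -1 to 0
right_half = prob_between(0.0, 1.0)   # area from 0 to 1, equal by symmetry
within_one = prob_between(-1.0, 1.0)  # area from -1 to 1

print(f"P(-1 <= Z <= 0) = {left_half:.4f}")
print(f"P(0 <= Z <= 1)  = {right_half:.4f}")
print(f"P(-1 <= Z <= 1) = {within_one:.4f}")
```

The unrounded result is about 0.6827; adding the two table values of 0.3413 gives 0.6826 only because each half was rounded first.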

Points to consider:

  • Normal outcomes are mutually exclusive
  • Variables are measurements of an event
  • The type of variable is continuous
  • The graph resembles a bell curve
  • Values are converted to z scores, and the normal Z table is used to find areas