Normal Distribution

There is a common misconception that data is typically normally distributed. To see why this is not true, suppose the heights of males are normally distributed and the heights of females are normally distributed, with different means. Then the heights of people in general follow a mixture of the two distributions, and a mixture of two different normal distributions is not itself normal. More generally, whenever a dataset consists of subtypes with different distributions, such as larger male heights and smaller female heights, the subtypes and the combined data cannot all be normally distributed. So normality cannot be the typical case: if it held for every subtype, it would fail for the data as a whole.

What is true is a statement about averages. If we take N independent samples from a distribution with finite variance and average them, then the distribution of this average over repeated sampling gets closer to normal as N becomes large. For finite N it is generally not exactly normal, unless the underlying distribution is itself normal. This is the content of the central limit theorem, which is a statement about a limit: given any positive error, as small as we like, the difference between the distribution of the standardized average and a normal distribution can be made smaller than this error by choosing N large enough.
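
Both claims can be checked numerically. The following is a minimal Python sketch, not a definitive experiment. The height parameters (means of 176 cm and 162 cm, standard deviation 7 cm, equal group sizes) and the choice of an exponential distribution as the non-normal source are assumptions made only for illustration. It uses two summary statistics that are both 0 for a normal distribution: excess kurtosis for the mixed heights, and skewness for the averages.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness; 0 for any symmetric distribution, including the normal."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])


def excess_kurtosis(xs):
    """Sample kurtosis minus 3; 0 for a normal distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 4 for x in xs]) - 3.0


def sample_height(rng):
    """Draw one height from a 50/50 mixture of two normal subtypes.

    The parameters are hypothetical, chosen only to illustrate the argument.
    """
    if rng.random() < 0.5:
        return rng.gauss(176.0, 7.0)  # assumed "male" subtype
    return rng.gauss(162.0, 7.0)      # assumed "female" subtype


def mean_of_exponentials(rng, n):
    """Average n independent draws from a skewed (exponential) distribution."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])


rng = random.Random(0)  # fixed seed so that runs are repeatable

# 1. Each subtype is normal, but the combined data is not.
heights = [sample_height(rng) for _ in range(100_000)]
mixture_kurtosis = excess_kurtosis(heights)
print(f"excess kurtosis of mixed heights: {mixture_kurtosis:.3f}")

# 2. Averages of N samples approach normality as N grows,
#    but stay measurably skewed for every finite N.
skew_by_n = {}
for n in (1, 4, 16, 64):
    means = [mean_of_exponentials(rng, n) for _ in range(20_000)]
    skew_by_n[n] = skewness(means)
    print(f"N = {n:2d}: skewness of the average = {skew_by_n[n]:.3f}")
```

With these assumed parameters the mixture has a theoretical excess kurtosis of -0.5 rather than 0, and the average of N exponential samples has a theoretical skewness of 2/sqrt(N), which shrinks toward 0 as N grows but never reaches it. The printed estimates should land near those values, up to sampling noise.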

For a strange example of data that is not normally distributed, see Benford's Law.