This data, or some very like it, has been posted here before.
It is a wonderful example of how institutions of learning have failed in teaching statistics and experimental methods, causing "researchers" to publish misleading and incorrect information. Oh, hell, the teachers and/or professors have probably deluded themselves that what they have done is right, but it isn't.
The giveaway to how this data was (incorrectly) obtained is seen by clicking the link for the original graph at the bottom of the page. Notice that the curves are perfectly smooth and absolutely perfectly symmetrical. That is, the lower left part of each curve below the 50th percentile is an exact mirror image of the upper right part above the 50th percentile. If you were to replot this data as the probability vs. length, each line on the plot would form a perfect "bell curve," or Gaussian Distribution, aka Normal Distribution. And that is very suspicious, as I shall get to in a moment.
The full explanation follows, but by inspection of the curve, and a knowledge of what real-world data of any human measurement looks like, I can tell you each curve was artificially reconstructed from just two numbers: a Mean and a Standard Deviation. Regardless of what some ignorant teacher may have told you in school, such a reconstructed curve does not, and cannot, accurately represent the actual data at the extremes, except in very limited circumstances.
The top and bottom 5% or so of the curve is therefore expected to be completely unreliable, because it does not reflect actual measured data.
For those who have the patience to read, here is a simplified explanation, without using much math...
They may have taught you in school that random distributions accurately follow such a normal distribution. But in fact this is not in general true, and is almost certainly not true for this biological case. (If you believe this to be true, you need to spend more time measuring real things.) The normal distribution is simply a mathematical construction which is easy to manipulate mathematically. Under certain conditions, a measurement which is expected to be affected by the combination of many independent, unrelated, random processes will indeed approach a normal distribution. This is a good assumption to make when trying to estimate the combined effect of, say, random measurement errors. But, in systems where there are non-random causes, or there are factors which make certain variations physically impossible, there is no reason to assume that the distribution will be close to normal. Human mating is a selective process, and as such the results should not be completely random. Further, there are some biological variations (such as mutations) which sometimes may accentuate a particular characteristic, but at other times may prove fatal or otherwise unable to propagate. Again, not complete randomness.
Nevertheless, in life sciences, sociology, and anthropometrics (the study of the human body and its dimensions) it has become very customary to simply always assume a normal distribution. One reason for this is that the true distribution may never be known, because determining it may require measuring too many people, or may require measuring people that are not accessible to the study. For example, a great deal of available anthropometric data (other than penis size) comes from measurements made by armies of their enlisted men and women. But clearly people at the extremes of size (very short or extremely tall, very fat, etc.) never actually make it into the army, having been rejected at the recruitment station, so they never get measured and included in the data. It turns out that it is still possible to obtain useful data from the population that remains, but to do so you have to assume a distribution. And it is the normal distribution that is almost always assumed, even though it seldom is completely accurate in describing the data.
If you assume a normal distribution, you only need two numbers to describe the population: the Mean and the Standard Deviation (SD). The Mean is just the average of all the data points, and represents the peak of the bell curve. The SD is calculated from the squares of the difference of each individual data value from the mean, and determines how wide the bell spreads. However, the bell curve obtained in this way is always perfectly symmetrical around the mean, and is really always the same basic shape, except being stretched or compressed in width according to the SD.
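For anyone who wants to see that calculation spelled out, here is a minimal Python sketch. The function name `mean_and_sd` and the handful of values are made up for illustration and do not come from any study. It computes the two numbers and then shows that the curve rebuilt from them has the same height at equal distances either side of the mean, whatever the data looked like.

```python
import math
from statistics import NormalDist

def mean_and_sd(values, sample=True):
    """Return (mean, SD). sample=True divides by n-1, otherwise by n."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(v - mean) ** 2 for v in values]
    divisor = n - 1 if sample else n
    return mean, math.sqrt(sum(squared_diffs) / divisor)

# Hypothetical, deliberately lopsided values (illustration only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean, sd = mean_and_sd(data, sample=False)

# The curve rebuilt from (mean, sd) has the same height at equal
# distances either side of the mean, even though the data does not.
curve = NormalDist(mean, sd)
left = curve.pdf(mean - 1.5 * sd)
right = curve.pdf(mean + 1.5 * sd)
print(mean, sd, left, right)
```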
It is very important to understand that in general none of the actual real data points lie exactly on the curve. The curve is only a sort of best fit through the data, as defined by the Mean and SD, but does not exactly describe the data. Also note that the curve extends, or extrapolates, beyond the range of available data. When the data does not have true normal distribution, the normal best fit obtained by calculating the Mean and SD tends to be better around the more common values (i.e. values near the Mean), and becomes increasingly poor as you move away from the mean. As you get still further from the mean, into the extreme cases, the actual data often has very little to do with the curve, because the actual factors which determine these extreme cases are not the multiple random factors that lead to a Normal distribution. But since there are very few data points out there, they have little influence on the shape of the curve. Although there are statistical tests to determine how well the Normal distribution actually fits the real data, these tend to be very insensitive to extreme values, and mostly focus on the central values, nearer the Mean.
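To make the point about the extremes concrete, here is a rough Python sketch on simulated data. Everything in it is an assumption for illustration: the "measurements" are drawn from a lognormal distribution (always positive, long right tail) with a fixed seed, and are not real anthropometric data. It fits a Normal curve from the Mean and SD, then compares what the curve predicts beyond 3 SD with what is actually in the data.

```python
import random
from statistics import NormalDist, fmean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical skewed "measurements": always positive, long right tail.
data = [rng.lognormvariate(0.0, 0.5) for _ in range(100_000)]

mean, sd = fmean(data), pstdev(data)
fitted = NormalDist(mean, sd)

def fraction(predicate):
    """Share of the actual data points satisfying the predicate."""
    return sum(1 for x in data if predicate(x)) / len(data)

low_cut, high_cut = mean - 3 * sd, mean + 3 * sd
actual_low = fraction(lambda x: x < low_cut)
actual_high = fraction(lambda x: x > high_cut)
predicted_tail = fitted.cdf(low_cut)  # same on both sides by symmetry

print("predicted in each 3 SD tail:", predicted_tail)
print("actual below -3 SD:", actual_low)
print("actual above +3 SD:", actual_high)
```

In this made-up case the fitted curve predicts the same small share in both tails, while the data has nothing at all in the lower one (the values cannot go negative) and many more points than predicted in the upper one.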
Despite the fact that the Normal distribution may be a poor fit to actual human data, and that the researchers may only have actually measured some small central part of the population, the data is still quite useful for most purposes. For example, if you are a clothing manufacturer, you do not care about how tall or large the top 2% of the population is, because your reward for the great expense of manufacturing these special additional sizes would be at most a 2% revenue increase, and you would rather walk away from money-losing business. You do, however, care that your standard range of sizes fits perhaps 95% of the population, because that is where you make money. And in a Normal distribution, 95% of the population fits into just plus/minus 2 SD around the mean. So long as the Normal distribution does a reasonable job in predicting approximately how many people are each clothing size over this range, it is commercially useful in allowing the manufacturer to plan how many of each size to produce, and in telling armies how many uniforms and boots of each size to buy. Nobody really cares about whether the curve fits the real data out at 3 SD, because there are so few people there (< 1%) that it is better to just ignore them, because you aren't going to make shoes in their size anyway.
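The 2 SD and 3 SD figures can be checked with a few lines of Python. This is only a sketch using the standard library's `NormalDist`; the helper name `share_within` is my own.

```python
from statistics import NormalDist

def share_within(k_sd):
    """Fraction of a Normal population within +/- k_sd SDs of the mean."""
    z = NormalDist()  # standard Normal: mean 0, SD 1
    return z.cdf(k_sd) - z.cdf(-k_sd)

for k in (1, 2, 3):
    inside = share_within(k)
    print(f"+/- {k} SD: {inside:.2%} inside, {1 - inside:.2%} outside")
```

For a true Normal distribution this gives a little over 95% inside 2 SD and roughly 0.3% outside 3 SD, counting both tails together.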
So, the Normal distribution has stuck in anthropometric studies, because it might provide a reasonable fit to actual data over the range people care about (the middle 95%, or +/- 2 SD), and no one who uses it really cares about the others. And, by assuming this distribution, instead of having to consult big databases of actual data (a real pain in the ass in the pre-computer era) you only need to know 2 numbers, Mean and SD, and will have a useful, although imperfect, understanding of the part of the population you actually care about. Hence, it has become common to simply discard the actual data after the Mean and SD are calculated, and to reconstruct the percentiles from the Mean and SD, not the actual data. In some circles, this is considered a means of improving the quality of the study results, by "filtering out" those pesky real-world variations.
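Here is a minimal sketch of what "reconstructing the percentiles from the Mean and SD" amounts to, in Python. The function name `reconstructed_percentiles` and the choice of percentiles are my own, and I am borrowing the women's height figures quoted later in this post as the inputs; this is not the code any particular study used.

```python
from statistics import NormalDist

def reconstructed_percentiles(mean, sd, percentiles=(5, 25, 50, 75, 95)):
    """Percentile values implied by a Normal curve with this mean and SD."""
    curve = NormalDist(mean, sd)
    return {p: curve.inv_cdf(p / 100) for p in percentiles}

# Women's height figures quoted later in this post (inches).
table = reconstructed_percentiles(63.6, 2.5)
for p, value in table.items():
    print(f"{p:>2}th percentile: {value:.1f} in")
```

Every number in that table comes from the two inputs alone. No measured value is involved, which is why the 5th and 95th percentiles land the same distance from the mean.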
But, data distilled down to those two numbers makes increasingly bad predictions as you move to the more extreme data. If you assume that penis size may be due to non-random factors, such as genetic determination and selective mating, then it is only to be expected that the true values will deviate strongly from those predicted by a normal distribution given by a Mean and SD as you move toward the extremes. Really, the line graph should not extend below the 5th and above the 95th percentile (or thereabouts), so as not to convey artificial and misleading data.
By way of another example, consider human height.
Click here for an example of trying to apply the Normal distribution to the height of women based on the Mean and SD (documented elsewhere as 63.6 and 2.5 inches, respectively), and coming up with impossibly small probabilities for tall women who do in fact exist. Also click the arrow for the author's previous post, where he argues that the Normal distribution is good at approximating human heights; it just breaks apart at the extremes, and way underestimates the probability.
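You can reproduce the flavor of that calculation yourself. The sketch below is in Python and uses the Mean and SD quoted above. The three height thresholds (6'0", 6'3", 6'6") are my own picks for illustration and are not taken from the linked post. It computes the share of women a Normal curve says should be at least that tall.

```python
import math

MEAN_IN, SD_IN = 63.6, 2.5  # figures quoted above (inches)

def share_at_least(height_in):
    """Share of a Normal(MEAN_IN, SD_IN) population at or above height_in."""
    z = (height_in - MEAN_IN) / SD_IN
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail, accurate for big z

# Hypothetical thresholds chosen for illustration: 6'0", 6'3", 6'6".
for height_in in (72, 75, 78):
    share = share_at_least(height_in)
    z = (height_in - MEAN_IN) / SD_IN
    print(f"{height_in} in: z = {z:.2f}, about 1 in {1 / share:,.0f}")
```

The predicted share shrinks extremely fast as you go up, far faster than anyone who has stood courtside at a women's basketball game would believe.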