{"id":311,"date":"2019-06-12T19:46:57","date_gmt":"2019-06-12T19:46:57","guid":{"rendered":"https:\/\/orange-attractor.eu\/?p=311"},"modified":"2025-07-20T12:05:38","modified_gmt":"2025-07-20T11:05:38","slug":"biased-and-unbiased-estimators","status":"publish","type":"post","link":"https:\/\/orange-attractor.eu\/?p=311","title":{"rendered":"Biased and unbiased estimators"},"content":{"rendered":"<p><!--\n\n\n<div class=\"translation-abstract-button\">Streszczenie po polsku<\/div>\n\n\n\n\n<div class=\"translation-abstract hidden-section\">\nW artykule por\u00f3wnujemy wyniki jakie daj\u0105 dwa estymatory odchylenia standardowego: obci\u0105\u017cony oraz nieobci\u0105\u017cony. Symulacja oparta jest na populacji liczb wylosowanych z rozk\u0142adem normalnym.  Dla ma\u0142ych warto\u015bci rozmiaru pr\u00f3bki (rz\u0119du 5 - 50) estymator nieobci\u0105\u017cony daje wyniki bli\u017csze rzeczywisto\u015bci. Dla wi\u0119kszych pr\u00f3bek, r\u00f3\u017cnica pomi\u0119dzy oboma estymatorami jest pomijalna. Referencje:\n\n\n<ul>\n \t\n\n<li>Wykresy dla ma\u0142ych pr\u00f3bek: <a href=\"https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel1.png\">estimators-small.png<\/a><\/li>\n\n\n \t\n\n<li>Wykresy dla du\u017cych pr\u00f3bek: <a href=\"https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel2.png\">estimators-big.png<\/a><\/li>\n\n\n \t\n\n<li>Kod \u017ar\u00f3d\u0142owy: <a href=\"https:\/\/github.com\/claperius\/orange-attractor\/tree\/master\/estimators\">stdev_simul.py<\/a><\/li>\n\n\n<\/ul>\n\n\n<\/div>\n\n\n--><\/p>\n<p>When we want to know the standard deviation of a large population, we usually take a sample from it and then calculate the value of an estimator. However, it is not always clear which estimator we should use. People sometimes argue whether the biased or the unbiased standard deviation estimator is better. 
Below we explore this question and present the results of a numerical simulation.<\/p>\n<p><!--more--><\/p>\n<h2>Description<\/h2>\n<p>While estimating the expected value is usually uncontroversial, for the standard deviation there are two competing estimators: the biased \\(\\sigma_n\\) and the so-called unbiased \\(\\sigma_{n-1}\\) (strictly speaking, only \\(\\sigma_{n-1}^2\\) is an unbiased estimator of the variance; \\(\\sigma_{n-1}\\) itself still slightly underestimates \\(\\sigma\\)). They are defined as follows:<\/p>\n<p>$$ \\sigma_n = \\sqrt{ \\frac{1}{n} \\sum_{i=1}^n (x_i - \\bar{x})^2} $$<\/p>\n<p>and:<\/p>\n<p>$$ \\sigma_{n-1} = \\sqrt{ \\frac{1}{n-1} \\sum_{i=1}^n (x_i - \\bar{x})^2} $$<\/p>\n<p>A detailed description of estimator bias can be found in the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Bias_of_an_estimator\">Wikipedia article<\/a>.<\/p>\n<p>For our experiment we generated a population of \\(10^6\\) numbers drawn from the normal distribution (\\(\\mu=0, \\sigma=1\\)). Then, for successive sample sizes starting from \\(n=5\\), we randomly chose \\(n\\) numbers from the population twice (each draw is uniform over the population). The first set is used for calculating \\(\\sigma_n\\), the second for \\(\\sigma_{n-1}\\). 
For each sample size the procedure is repeated \\(r=500\\) times, and the final result for a given sample size and estimator is the average over all \\(r\\) values obtained.<\/p>\n<h2>Results<\/h2>\n<p>The figure below contains two charts presenting the results of the simulation.<\/p>\n<div class=\"image-media-standard\"><a href=\"https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-358\" src=\"https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel1-1024x768.png\" alt=\"\" width=\"640\" height=\"480\" srcset=\"https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel1-1024x768.png 1024w, https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel1-300x225.png 300w, https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel1-768x576.png 768w, https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel1-220x165.png 220w, https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel1.png 1600w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/div>\n<p>In the upper chart, each point represents the average result obtained for a given estimator &#8211; biased marked as &#8220;+&#8221;, unbiased as &#8220;x&#8221;. The biased and unbiased estimators were calculated on different samples. The blue dashed line represents the theoretical value for the given distribution. The lower chart shows the ratio between the biased and unbiased values together with its asymptotic theoretical value (dashed line). 
For all sample sizes in the presented range, the unbiased estimator gives a better estimate of the true standard deviation.<\/p>\n<p>In contrast, the results for larger sample sizes look different &#8211; see the charts below:<\/p>\n<div class=\"image-media-standard\"><a href=\"https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-366\" src=\"https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel2-1024x768.png\" alt=\"\" width=\"640\" height=\"480\" srcset=\"https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel2-1024x768.png 1024w, https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel2-300x225.png 300w, https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel2-768x576.png 768w, https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel2-220x165.png 220w, https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/06\/est-rel2.png 1600w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/div>\n<p>As the charts show, for larger sample sizes the obtained value no longer appears to depend on the estimator type.<\/p>\n<p>Although the results presented here are based on a normally distributed population, we have also tested uniform and Poisson distributions. They give similar results.<\/p>\n<h2>Summary<\/h2>\n<p>For small sample sizes the estimator \\(\\sigma_{n-1}\\) gives clearly better results, closer to the theoretical population value. As \\(n\\) grows, the normalizing factors of the two estimators become nearly equal:<\/p>\n<p>$$ \\sqrt{\\frac{1}{n}} \\approx \\sqrt{\\frac{1}{n-1}} \\quad (\\text{for sufficiently large } n),$$<br \/>\nthus the difference between the estimators becomes negligible.<\/p>\n<p>The presented results were computed using <a href=\"https:\/\/www.numpy.org\/\">numpy<\/a>. 
For pseudo-random number generation we used the Mersenne Twister algorithm included in the library. Details of the implementation can be found in <a href=\"https:\/\/github.com\/mikolaj1024\/orange-attractor\/tree\/master\/estimators\">stdev_simul.py<\/a>.<\/p>\n<p><i>Miko\u0142aj Sitarz, 2019<\/i><br \/>\n<a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-240\" src=\"https:\/\/orange-attractor.eu\/wp-content\/uploads\/2019\/04\/cc-by-sa-nc.png\" alt=\"\" width=\"88\" height=\"31\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>When we want to know the standard deviation of a large population, we usually take a sample from it and then calculate the value of an estimator. However, it is not always clear which estimator we should use. People sometimes argue whether the biased or the unbiased standard deviation estimator is better. Below we explore this question and present&hellip; <a class=\"read-more\" href=\"https:\/\/orange-attractor.eu\/?p=311\">Read 
More<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[10],"class_list":["post-311","post","type-post","status-publish","format-standard","hentry","category-statistics","tag-bessels-correction"],"_links":{"self":[{"href":"https:\/\/orange-attractor.eu\/index.php?rest_route=\/wp\/v2\/posts\/311"}],"collection":[{"href":"https:\/\/orange-attractor.eu\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/orange-attractor.eu\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/orange-attractor.eu\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/orange-attractor.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=311"}],"version-history":[{"count":71,"href":"https:\/\/orange-attractor.eu\/index.php?rest_route=\/wp\/v2\/posts\/311\/revisions"}],"predecessor-version":[{"id":759,"href":"https:\/\/orange-attractor.eu\/index.php?rest_route=\/wp\/v2\/posts\/311\/revisions\/759"}],"wp:attachment":[{"href":"https:\/\/orange-attractor.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=311"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/orange-attractor.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=311"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/orange-attractor.eu\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=311"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}