{"id":134,"date":"2014-09-02T19:07:34","date_gmt":"2014-09-02T19:07:34","guid":{"rendered":"https:\/\/existencia.org\/pro\/?p=134"},"modified":"2014-09-02T19:11:58","modified_gmt":"2014-09-02T19:11:58","slug":"simple-robust-standard-errors-library-for-r","status":"publish","type":"post","link":"https:\/\/existencia.org\/pro\/?p=134","title":{"rendered":"Simple Robust Standard Errors Library for R"},"content":{"rendered":"<p>R doesn&#8217;t have a built-in method for calculating heteroskedastically-robust and clustered standard errors. Since this is a common need for us, I&#8217;ve put together a little library to provide these functions.<\/p>\n<p>Download the <a href=\"https:\/\/raw.githubusercontent.com\/jrising\/open-estimate\/master\/simple\/tools_reg.R\" download=\"tools_reg.R\">library<\/a>.<\/p>\n<p>The file contains three functions:<\/p>\n<ul>\n<li><code>get.coef.robust<\/code>: Calculate heteroskedastically-robust standard errors<\/li>\n<li><code>get.coef.clust<\/code>: Calculate clustered standard errors<\/li>\n<li><code>get.conf.clust<\/code>: Calculate confidence intervals for clustered standard errors<\/li>\n<\/ul>\n<p>The arguments for the functions are below, and then an example is at the end.<\/p>\n<p><b>get.coef.robust(mod, var=c(), estimator=&#8221;HC&#8221;)<\/b><br \/>\n<i>Calculate heteroskedastically-robust standard errors<\/i><\/p>\n<pre>mod: the result of a call to lm()\r\n     e.g., lm(satell ~ width + factor(color))\r\nvar: a list of indexes to extract only certain coefficients (default: all)\r\n     e.g., 1:2 [to drop FE from above]\r\nestimator: an estimator type passed to vcovHC (default: White's estimator)\r\nReturns an object of type coeftest, with estimate, std. err, t and p values\r\n<\/pre>\n<p><b>get.coef.clust(mod, cluster, var=c())<\/b><br \/>\n<i>Calculate clustered standard errors<\/i><\/p>\n<pre>mod: the result of a call to lm()\r\n     e.g., lm(satell ~ width + factor(color))\r\ncluster: a cluster variable, with a length equal to the observation count\r\n     e.g., color\r\nvar: a list of indexes to extract only certain coefficients (default: all)\r\n     e.g., 1:2 [to drop FE from above]\r\nReturns an object of type coeftest, with estimate, std. err, t and p values\r\n<\/pre>\n<p><b>get.conf.clust(mod, cluster, xx, alpha, var=c())<\/b><br \/>\n<i>Calculate confidence intervals for clustered standard errors<\/i><\/p>\n<pre>mod: the result of a call to lm()\r\n     e.g., lm(satell ~ width + factor(color))\r\ncluster: a cluster variable, with a length equal to the observation count\r\n     e.g., color\r\nxx: new values for each of the variables (only needs to be as long as 'var')\r\n     e.g., seq(0, 50, length.out=100)\r\nalpha: the level of confidence, as an error rate\r\n     e.g., .05 [for 95% confidence intervals]\r\nvar: a list of indexes to extract only certain coefficients (default: all)\r\n     e.g., 1:2 [to drop FE from above]\r\nReturns a data.frame of yhat, lo, and hi\r\n<\/pre>\n<h3>Example in Stata and R<\/h3>\n<p>The following example compare the results from Stata and R for an example from section 3.3.2 of &#8220;An Introduction to Categorical Data Analysis&#8221; by Alan Agresti.  The data describes 173 female horseshoe crabs, and the number of male &#8220;satellites&#8221; that live near them.<\/p>\n<p>Download the <a href=\"https:\/\/raw.githubusercontent.com\/jrising\/open-estimate\/master\/simple\/data.csv\" download=\"data.csv\">data as a CSV file<\/a>.  The columns are the crabs color (2-5), spine condition (1-3), carapace width (cm), number of satellite crabs, and weight (kg).<\/p>\n<p>The code below calculates a simple model with crab color fixed-effects, relating width (which is thought to be an explanatory variable) and number of satellites.  The first model is a straight OLS regression; then we add robust errors, and finally crab color clustered errors.<\/p>\n<p>The analysis in <b>Stata<\/b>.<\/p>\n<p>First, let&#8217;s do the analysis in Stata, to make sure that we can reproduce it in R.<\/p>\n<pre class=\"brush: bash; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\nimport delim data.csv\r\n\r\nxi: reg satell width i.color\r\nxi: reg satell width i.color, vce(hc2)\r\nxi: reg satell width i.color, vce(cluster color)\r\n<\/pre>\n<p>Here are excerpts of the results.<\/p>\n<p>The OLS model:<\/p>\n<pre>\r\n------------------------------------------------------------------------------\r\n      satell |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]\r\n-------------+----------------------------------------------------------------\r\n       width |   .4633951   .1117381     4.15   0.000     .2428033    .6839868\r\n       _cons |  -8.412887   3.133074    -2.69   0.008    -14.59816   -2.227618\r\n------------------------------------------------------------------------------\r\n<\/pre>\n<p>The robust errors:<\/p>\n<pre>\r\n------------------------------------------------------------------------------\r\n             |             Robust HC2\r\n      satell |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]\r\n-------------+----------------------------------------------------------------\r\n       width |   .4633951   .1028225     4.51   0.000     .2604045    .6663856\r\n       _cons |  -8.412887   2.939015    -2.86   0.005    -14.21505   -2.610726\r\n------------------------------------------------------------------------------\r\n<\/pre>\n<p>Note that we&#8217;ve used HC2 robust errors, to provide a direct comparison with the R results.<\/p>\n<p>The clustered errors.<\/p>\n<pre>\r\n------------------------------------------------------------------------------\r\n             |               Robust\r\n      satell |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]\r\n-------------+----------------------------------------------------------------\r\n       width |   .4633951   .0466564     9.93   0.002     .3149135    .6118767\r\n       _cons |  -8.412887   1.258169    -6.69   0.007    -12.41694   -4.408833\r\n------------------------------------------------------------------------------\r\n<\/pre>\n<p>The robust and clustered errors are a little tighter for this example.<\/p>\n<p>The analysis in <b>R<\/b>.<\/p>\n<p>Here&#8217;s the equivalent code in R.<\/p>\n<pre class=\"brush: bash; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\nsource(&quot;tools_reg.R&quot;)\r\ntbl &lt;- read.csv(&quot;data.csv&quot;)\r\n\r\nmod &lt;- lm(satell ~ width + factor(color), data=tbl)\r\n\r\nsummary(mod)\r\nget.coef.robust(mod, estimator=&quot;HC2&quot;)\r\nget.coef.clust(mod, tbl$color)\r\n<\/pre>\n<p>And excerpts of the results.  The OLS model:<\/p>\n<pre>\r\n               Estimate Std. Error t value Pr(>|t|)    \r\n(Intercept)     -8.4129     3.1331  -2.685  0.00798 ** \r\nwidth            0.4634     0.1117   4.147 5.33e-05 ***\r\n<\/pre>\n<p>The coefficient estimates will be the same for all three models.  Stata gives us <code>-8.412887 + .4633951 w<\/code> and R gives us <code>-8.4129 + 0.4634 w<\/code>.  These are identical, but with different reporting precision, as are the standard errors.<\/p>\n<p>The robust errors:<\/p>\n<pre>\r\n(Intercept)    -8.41289    2.93902 -2.8625  0.004739 ** \r\nwidth           0.46340    0.10282  4.5067 1.229e-05 ***\r\n<\/pre>\n<p>Again, we use HC2 robust errors.  Stata provides 2.939015 and .1028225 for the errors, matching these exactly.<\/p>\n<p>The clustered errors:<\/p>\n<pre>\r\n                Estimate Std. Error  t value  Pr(>|t|)    \r\n(Intercept)    -8.412887   1.258169  -6.6866 3.235e-10 ***\r\nwidth           0.463395   0.046656   9.9321 < 2.2e-16 ***\r\n<\/pre>\n<p>Stata gives the exact same values as R.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>R doesn&#8217;t have a built-in method for calculating heteroskedastically-robust and clustered standard errors. Since this is a common need for us, I&#8217;ve put together a little library to provide these functions. Download the library. The file contains three functions: get.coef.robust: Calculate heteroskedastically-robust standard errors get.coef.clust: Calculate clustered standard errors get.conf.clust: Calculate confidence intervals for clustered [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3],"tags":[],"class_list":["post-134","post","type-post","status-publish","format-standard","hentry","category-research"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p4Zh9E-2a","_links":{"self":[{"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/posts\/134","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=134"}],"version-history":[{"count":0,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/posts\/134\/revisions"}],"wp:attachment":[{"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=134"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=134"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=134"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}