{"id":170,"date":"2015-05-14T04:09:20","date_gmt":"2015-05-14T04:09:20","guid":{"rendered":"https:\/\/existencia.org\/pro\/?p=170"},"modified":"2015-05-14T04:09:20","modified_gmt":"2015-05-14T04:09:20","slug":"google-scholar-alerts-to-rss-a-punctuated-equilibrium","status":"publish","type":"post","link":"https:\/\/existencia.org\/pro\/?p=170","title":{"rendered":"Google Scholar Alerts to RSS: A punctuated equilibrium"},"content":{"rendered":"<p>If you&#8217;re like me, you have a pile of <a href=\"https:\/\/scholar.google.com\/scholar_alerts?view_op=list_alerts&#038;hl=en\">Google Scholar Alerts<\/a> that you never manage to read.  It&#8217;s a reflection of a more general problem: how do you find good articles, when there are so many articles to sift through? <\/p>\n<p>I&#8217;ve recently started using <a href=\"https:\/\/github.com\/connerbw\/sux0r\">Sux0r<\/a>, a Bayesian filtering RSS feed reader.  However, Google Scholar sends alerts to one&#8217;s email, and we&#8217;ll want to extract each paper as a separate RSS item.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/existencia.org\/pro\/wp-content\/uploads\/2015\/05\/alertemail.png\" alt=\"alertemail\" width=\"600\" height=\"302\" class=\"aligncenter size-full wp-image-174\" srcset=\"https:\/\/existencia.org\/pro\/wp-content\/uploads\/2015\/05\/alertemail.png 732w, https:\/\/existencia.org\/pro\/wp-content\/uploads\/2015\/05\/alertemail-300x151.png 300w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<p>Here&#8217;s my process, and the steps for doing it yourself:<\/p>\n<p><b>Google Scholar Alerts &rarr; IFTTT &rarr; Blogger &rarr; Perl &rarr; DreamHost &rarr; RSS &rarr; Bayesian Reader<\/b><\/p>\n<ol>\n<li>Create a Blogger blog that you will just use for Google Scholar Alerts: Go to the <a href=\"https:\/\/www.blogger.com\/home\">Blogger Home Page<\/a> and follow the steps under &#8220;New Blog&#8221;.<\/li>\n<li>Sign up for <a href=\"https:\/\/ifttt.com\/\">IFTTT<\/a> (if you 
don&#8217;t already have an account), and create a new recipe to post emails from <tt>scholaralerts-noreply@google.com<\/tt> to your new blog.  The channel for the trigger is your email system (Gmail for me); the trigger is &#8220;New email in inbox from&#8230;&#8221;; the channel for the action is Blogger; and the title and labels can be whatever you want as long as the body is &#8220;{{BodyPlain}}&#8221; (which includes HTML).\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/existencia.org\/pro\/wp-content\/uploads\/2015\/05\/ifttttrigger-300x128.png\" alt=\"ifttttrigger\" width=\"300\" height=\"128\" class=\"aligncenter size-medium wp-image-175\" srcset=\"https:\/\/existencia.org\/pro\/wp-content\/uploads\/2015\/05\/ifttttrigger-300x128.png 300w, https:\/\/existencia.org\/pro\/wp-content\/uploads\/2015\/05\/ifttttrigger.png 636w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/>\n<\/li>\n<li>Modify the Perl code below, pointing it to the front page of your new Blogger blog.  
It will return an RSS feed when called at the command line (<tt>perl scholar.pl<\/tt>).\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/existencia.org\/pro\/wp-content\/uploads\/2015\/05\/rssfeed-1024x300.png\" alt=\"rssfeed\" width=\"600\" height=\"176\" class=\"aligncenter size-large wp-image-176\" srcset=\"https:\/\/existencia.org\/pro\/wp-content\/uploads\/2015\/05\/rssfeed-1024x300.png 1024w, https:\/\/existencia.org\/pro\/wp-content\/uploads\/2015\/05\/rssfeed-300x88.png 300w, https:\/\/existencia.org\/pro\/wp-content\/uploads\/2015\/05\/rssfeed.png 1182w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/>\n<\/li>\n<li>Upload the Perl script to your favorite server (mine, <a href=\"https:\/\/existencia.org\/\">https:\/\/existencia.org\/<\/a>, is powered by <a href=\"http:\/\/www.dreamhost.com\/\">DreamHost<\/a>).<\/li>\n<li>Point your favorite RSS reader to the URL of the Perl script as an RSS feed, and wait as the Google Scholar Alerts come streaming in!<\/li>\n<\/ol>\n<p>Here is the code for the Alert-Blogger-to-RSS Perl script.  
All you need to do is fill in the <tt>$url<\/tt> line below.<\/p>\n<pre class=\"brush: perl; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\n#!\/usr\/bin\/perl -w\r\nuse strict;\r\nuse CGI qw(:standard);\r\n\r\nuse XML::RSS; # Library for RSS generation\r\nuse LWP::Simple; # Library for web access\r\n\r\n# Download the first page from the blog\r\nmy $url = &quot;http:\/\/mygooglealerts.blogspot.com\/&quot;; ### &lt;-- FILL IN HERE!\r\nmy $input = get($url);\r\ndie &quot;Could not download $url\\n&quot; unless defined $input; # get() returns undef on failure\r\nmy @lines = split \/\\n\/, $input;\r\n\r\n# Set up the RSS feed we will fill\r\nmy $rss = XML::RSS-&gt;new(version =&gt; '2.0');\r\n$rss-&gt;channel(title =&gt; &quot;Google Scholar Alerts&quot;,\r\n              link =&gt; $url,\r\n              description =&gt; &quot;Papers extracted from Google Scholar Alert emails&quot;);\r\n\r\n# Iterate through the lines of HTML\r\nmy $ii = 0;\r\nwhile ($ii &lt; $#lines) {\r\n    my $line = $lines&#x5B;$ii];\r\n    # Look for a &lt;h3&gt; starting the entry\r\n    if ($line !~ \/^&lt;h3 style=&quot;font-weight:normal\/) {\r\n        $ii++;\r\n        next;\r\n    }\r\n\r\n    # Extract the title and link, skipping entries that do not match\r\n    my ($link, $title) = $line =~ \/&lt;a href=&quot;(&#x5B;^&quot;]+)&quot;&gt;&lt;font .*?&gt;(.+)&lt;\\\/font&gt;\/;\r\n    if (!defined $title) {\r\n        $ii++;\r\n        next;\r\n    }\r\n\r\n    # Extract the authors and publication information\r\n    # (captures are taken from the match itself, so a failed match cannot reuse old values)\r\n    my $line2 = $lines&#x5B;$ii+1];\r\n    my ($authors, $journal, $year) = $line2 =~ \/&lt;div&gt;&lt;font .+?&gt;(&#x5B;^&lt;]+?) - (.*?, )?(\\d{4})\/;\r\n    $authors = '' unless defined $authors;\r\n    $journal = '' unless defined $journal;\r\n    $year = '' unless defined $year;\r\n\r\n    # Extract the snippets\r\n    my $line3 = $lines&#x5B;$ii+2];\r\n    my ($content) = $line3 =~ \/&lt;div&gt;&lt;font .+?&gt;(.+?)&lt;br \\\/&gt;\/;\r\n    $content = '' unless defined $content;\r\n    for ($ii = $ii + 3; $ii &lt; @lines; $ii++) {\r\n        my $linen = $lines&#x5B;$ii];\r\n        # Are we done, or is there another line of snippets?\r\n        if ($linen =~ \/^(.+?)&lt;\\\/font&gt;&lt;\\\/div&gt;\/) {\r\n            $content = $content . '&lt;br \/&gt;' . 
$1;\r\n            last;\r\n        } elsif ($linen =~ \/^(.+?)&lt;br \\\/&gt;\/) {\r\n            # Only append when the match succeeds, so a stale $1 is never reused\r\n            $content = $content . '&lt;br \/&gt;' . $1;\r\n        }\r\n    }\r\n    $ii++;\r\n\r\n    # Use the title and publication for the RSS entry title\r\n    my $longtitle = &quot;$title ($authors, $journal $year)&quot;;\r\n\r\n    # Add it to the RSS feed\r\n    $rss-&gt;add_item(title =&gt; $longtitle,\r\n                   link =&gt; $link,\r\n                   description =&gt; $content);\r\n\r\n    $ii++;\r\n}\r\n\r\n# Write out the RSS feed (the RSS media type is application\/rss+xml)\r\nprint header('application\/rss+xml');\r\nprint $rss-&gt;as_string;\r\n<\/pre>\n<p>In Sux0r, here are a couple of items from the final result:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/existencia.org\/pro\/wp-content\/uploads\/2015\/05\/sux0rfeed.png\" alt=\"sux0rfeed\" width=\"600\" height=\"346\" class=\"aligncenter size-full wp-image-177\" srcset=\"https:\/\/existencia.org\/pro\/wp-content\/uploads\/2015\/05\/sux0rfeed.png 756w, https:\/\/existencia.org\/pro\/wp-content\/uploads\/2015\/05\/sux0rfeed-300x173.png 300w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you&#8217;re like me, you have a pile of Google Scholar Alerts that you never manage to read. It&#8217;s a reflection of a more general problem: how do you find good articles, when there are so many articles to sift through? I&#8217;ve recently started using Sux0r, a Bayesian filtering RSS feed reader. However, Google Scholar 
However, Google Scholar [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3,4],"tags":[],"class_list":["post-170","post","type-post","status-publish","format-standard","hentry","category-research","category-software"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p4Zh9E-2K","_links":{"self":[{"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/posts\/170","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=170"}],"version-history":[{"count":0,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/posts\/170\/revisions"}],"wp:attachment":[{"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=170"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=170"},{"taxonomy":"post_tag","embeddable":true,
"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=170"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}