{"id":168,"date":"2015-04-22T12:59:43","date_gmt":"2015-04-22T12:59:43","guid":{"rendered":"https:\/\/existencia.org\/pro\/?p=168"},"modified":"2015-04-22T12:59:43","modified_gmt":"2015-04-22T12:59:43","slug":"scripts-for-twitter-data","status":"publish","type":"post","link":"https:\/\/existencia.org\/pro\/?p=168","title":{"rendered":"Scripts for Twitter Data"},"content":{"rendered":"<p>Twitter data&#8211; the endless stream of tweets, the user network, and the rise and fall of hashtags&#8211; offers a flood of insight into the minute-by-minute state of the society.  Or at least one self-selecting part of it.  A lot of people want to use it for research, and it turns out to be pretty easy to do so.<\/p>\n<p>You can either purchase twitter data, or collect it in real-time.  If you purchase twitter data, it&#8217;s all organized for you and available historically, but it basically isn&#8217;t anything that you can&#8217;t get yourself by monitoring twitter in real-time.  I&#8217;ve used <a href=\"https:\/\/gnip.com\/\">GNIP<\/a>, where the going rate was about $500 per million tweets in 2013.<\/p>\n<p>There are two main ways to collect data directly from twitter: &#8220;queries&#8221; and the &#8220;stream&#8221;. Queries let you get up to 1000 tweets at any point in time&#8211; whichever tweets most recently matched your search criteria. The stream continuously gives you a fraction of a percent of all tweets, selected by your filtering criteria, which adds up very quickly.<\/p>\n<p>Scripts for both options are below, but you need to decide on the search\/streaming criteria. Typically, these are search terms and geographical constraints. See <a href=\"https:\/\/dev.twitter.com\/docs\/using-search\">Twitter&#8217;s API documentation<\/a> to decide on your search options.<\/p>\n<p>Twitter uses an authentication system to identify both the individual collecting the data, and what tool is helping them do it.  
It is easy to register a new tool, whereby you pretend that you&#8217;re a startup with a great new app.  Here are the steps:<\/p>\n<ol>\n<li>Install python&#8217;s twitter package, using &#8220;easy_install twitter&#8221; or &#8220;pip install twitter&#8221;.<\/li>\n<li>Create an app at <a href=\"https:\/\/apps.twitter.com\/\">https:\/\/apps.twitter.com\/<\/a>. Leave the callback URL blank, but fill in the rest.<\/li>\n<li>Set the CONSUMER_KEY and CONSUMER_SECRET in the code below to the values you get on the keys and access tokens tab of your app.<\/li>\n<li>Fill in the name of the application.<\/li>\n<li>Fill in any search terms or structured searches you like.<\/li>\n<li>If you&#8217;re using the downloaded scripts, which output data to a CSV file, change where the file is written, to some directory (where it says &#8220;twitter\/us_&#8221;).<\/li>\n<li>Run the script from your computer&#8217;s terminal (i.e., <tt>python search.py<\/tt>)<\/li>\n<li>The script will pop up a browser for you to log into twitter and accept permissions from your app.<\/li>\n<li>Get data.<\/li>\n<\/ol>\n<p>Here is what a simple script looks like:<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\nimport os, twitter\r\n\r\nAPP_NAME = &quot;Your app name&quot;\r\nCONSUMER_KEY = 'Your consumer key'\r\nCONSUMER_SECRET = 'Your consumer secret'\r\n\r\n# Do we already have a token saved?\r\nMY_TWITTER_CREDS = os.path.expanduser('~\/.class_credentials')\r\nif not os.path.exists(MY_TWITTER_CREDS):\r\n    # This will ask you to accept the permissions and save the token\r\n    twitter.oauth_dance(APP_NAME, CONSUMER_KEY, CONSUMER_SECRET,\r\n                        MY_TWITTER_CREDS)\r\n\r\n# Read the token\r\noauth_token, oauth_secret = twitter.read_token_file(MY_TWITTER_CREDS)\r\n\r\n# Open up an API object, with the OAuth token\r\napi = twitter.Twitter(api_version=&quot;1.1&quot;, auth=twitter.OAuth(oauth_token, oauth_secret, CONSUMER_KEY, 
CONSUMER_SECRET))\r\n\r\n# Perform our query\r\ntweets = api.search.tweets(q=&quot;risky business&quot;)\r\n\r\n# Print the first result that has text\r\nfor tweet in tweets&#x5B;'statuses']:\r\n    if 'text' not in tweet:\r\n        continue\r\n\r\n    print(tweet)\r\n    break\r\n<\/pre>\n<p>For automating twitter collection, I&#8217;ve put together scripts for queries (<tt>search.py<\/tt>), streaming (<tt>filter.py<\/tt>), and bash scripts that run them repeatedly (<tt>repsearch.sh<\/tt> and <tt>repfilter.sh<\/tt>).  <a href=\"https:\/\/existencia.org\/files\/tools\/twitterscripts.zip\">Download the scripts<\/a>.<\/p>\n<p>To use the repetition scripts, make them executable by running &#8220;<tt>chmod a+x repsearch.sh repfilter.sh<\/tt>&#8221;. Then run them by typing <tt>.\/repfilter.sh<\/tt> or <tt>.\/repsearch.sh<\/tt>.  Note that these will create many, many files over time, which you&#8217;ll have to merge together.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Twitter data&#8211; the endless stream of tweets, the user network, and the rise and fall of hashtags&#8211; offers a flood of insight into the minute-by-minute state of the society. Or at least one self-selecting part of it. 
A lot of people want to use it for research, and it turns out to be pretty easy [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3,4],"tags":[],"class_list":["post-168","post","type-post","status-publish","format-standard","hentry","category-research","category-software"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p4Zh9E-2I","_links":{"self":[{"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/posts\/168","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=168"}],"version-history":[{"count":0,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=\/wp\/v2\/posts\/168\/revisions"}],"wp:attachment":[{"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=168"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=%2Fwp%2Fv2%2Fcategor
ies&post=168"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/existencia.org\/pro\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=168"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}