Saturday, February 8, 2014

Rahul Gandhi's interview by Arnab Goswami



The day after the famous interview aired, its complete transcript was available online. On Facebook, I saw that one of my college juniors had written a simple shell script to count the number of occurrences of certain words. That by itself was quite insightful, and I knew I could take it to the next level without much effort by using R and a few of its packages.

I've uploaded the R script to GitHub. The basic flow of the script is as follows:

  1. Separate Rahul's and Arnab's parts of the conversation into 2 buckets
  2. Remove extra spaces
  3. Remove punctuation
  4. Convert the text to lower case
  5. Remove the stop words
  6. Convert the text to a term document matrix
  7. Rank the words based on their occurrences
  8. Generate the word cloud and list each speaker's top 5 words
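
To make the flow above concrete, here is a minimal sketch of steps 1 to 7 plus the top-words part of step 8. It is written in Python with only the standard library, not R, so it is not the script from GitHub. Everything specific in it is an assumption for illustration: the `Speaker: text` transcript format, the placeholder speaker names, the made-up sample lines, and the tiny stop-word list. Drawing the actual word cloud is left out because it needs a plotting package.

```python
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real run would use a fuller list).
STOP_WORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "it", "this", "that"}


def split_by_speaker(transcript, speakers):
    """Step 1: put each line into the bucket of the speaker named before the first colon.

    Lines without a known speaker label are treated as a continuation of the
    previous speaker's turn. The 'Speaker: text' layout is an assumption.
    """
    buckets = {name: [] for name in speakers}
    current = None
    for line in transcript.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() in buckets:
            current = label.strip()
            line = rest
        if current is not None:
            buckets[current].append(line)
    return {name: " ".join(parts) for name, parts in buckets.items()}


def clean(text):
    """Steps 2 to 5: strip punctuation, lower-case, collapse whitespace, drop stop words."""
    text = text.translate(str.maketrans("", "", string.punctuation))
    words = text.lower().split()  # split() with no argument also removes extra spaces
    return [w for w in words if w not in STOP_WORDS]


def term_document_matrix(buckets):
    """Step 6: map each term to its count in every speaker's bucket."""
    counts = {name: Counter(clean(text)) for name, text in buckets.items()}
    terms = sorted(set().union(*counts.values()))
    return {term: {name: counts[name][term] for name in buckets} for term in terms}


def top_words(text, n=5):
    """Steps 7 and 8: rank words by number of occurrences and keep the top n."""
    return Counter(clean(text)).most_common(n)


# Made-up sample with placeholder speakers; these are not lines from the interview.
sample = """SpeakerA: Tell me about the system.   The system, again?
SpeakerB: Change is needed. Change the system, change it now.
SpeakerA: Is change the answer?"""

buckets = split_by_speaker(sample, ["SpeakerA", "SpeakerB"])
for name, text in buckets.items():
    print(name, top_words(text))
```

The per-speaker counts are the columns of the term document matrix, so ranking a column gives that speaker's most frequent words, and the same counts could be fed to a word cloud library as word weights.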

Here is the word cloud.

