Mining Made Easy: Rahul Gandhi's interview by Arnab Goswami

The day after the famous interview aired, the complete transcript was available to read online. On Facebook, I saw that one of my college juniors had written a simple shell script to count the number of occurrences of certain words. This by itself was quite insightful, and I knew I could take it to the next level without much effort by using R and a few of its packages.

I've uploaded the R script to GitHub. The basic flow of the script is as follows:

Separate Rahul's and Arnab's parts of the conversation into 2 buckets
Remove extra spaces
Remove punctuation
Convert the text to lower case
Remove the stop words
Convert the text to a term document matrix
Rank the words based on their occurrences
Generate a word cloud for each speaker, along with their top 5 words
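
The steps above can be sketched in a few lines of code. The block below is not the R script from GitHub; it is a minimal Python stand-in that uses only the standard library, and it assumes a transcript where every line looks like "Speaker: text". The speaker labels, the sample lines, the short stop-word list, and the function names are all made up for illustration. Drawing the actual word cloud is left out, since that needs a plotting package.

```python
import re
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "what", "about"}

# Invented sample in an assumed "Speaker: text" format, not the real transcript.
SAMPLE = """Host: What   about the economy?  And the economy of jobs?
Guest: Jobs, jobs and growth. Growth is the key to jobs!"""


def split_by_speaker(transcript):
    """Step 1: put each speaker's lines into a separate bucket."""
    buckets = {}
    for line in transcript.splitlines():
        speaker, sep, text = line.partition(":")
        if sep:  # skip lines that carry no speaker label
            buckets.setdefault(speaker.strip(), []).append(text)
    return {name: " ".join(parts) for name, parts in buckets.items()}


def clean(text):
    """Steps 2 to 5: spaces, punctuation, lower case, stop words."""
    text = re.sub(r"\s+", " ", text).strip()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = text.lower()
    return [word for word in text.split() if word not in STOP_WORDS]


def term_document_matrix(buckets):
    """Step 6: one row per term, one count column per speaker."""
    counts = {name: Counter(clean(text)) for name, text in buckets.items()}
    terms = sorted(set().union(*counts.values()))
    return {term: [counts[name][term] for name in buckets] for term in terms}


def top_words(words, n=5):
    """Steps 7 and 8 (without the cloud): rank words by occurrence."""
    return Counter(words).most_common(n)


if __name__ == "__main__":
    buckets = split_by_speaker(SAMPLE)
    for name, text in buckets.items():
        print(name, top_words(clean(text)))
```

The ranked (word, count) pairs are the same numbers a word cloud would use to size each word, so they are a reasonable way to check the cleaning steps before plotting anything.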

Here is the word cloud.

Saturday, February 8, 2014
