Last week I learned how to make a word cloud using R.

Here I will walk through the step-by-step procedure to create a word comparison cloud.

NOTE: You need your Twitter API key, API secret, access token and access token secret before starting.

The following R packages are used in this project, so install and explore them.

1) twitteR

2) ROAuth

3) RCurl

4) stringr

5) RJSONIO

6) wordcloud

7) tm

Let me first outline the smaller steps involved. First, extract the data (tweets) from Twitter and get the text from the extracted tweets. Next, clean the tweets, which means removing extra spaces, punctuation and unnecessary numbers, and join the texts into a single vector. Then remove stop words, create a corpus, create a term-document matrix, and finally draw the word comparison cloud.
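As a rough sketch of the cleaning and joining steps, here is a small example that needs no Twitter access. The two sample tweets and the helper name clean_sample are made up for illustration, and only base R is used.

```r
# Hypothetical sample tweets standing in for text pulled from Twitter
sample_tweets <- c("RT @someone: Great offers in 2015!!",
                   "Network   is slow again :(")

# Minimal cleaning helper (illustrative name, not from any package)
clean_sample <- function(x) {
  x <- tolower(x)                     # lower-case everything
  x <- gsub("\\brt\\b", "", x)        # drop the retweet marker as a whole word
  x <- gsub("@\\w+", "", x)           # drop @mentions
  x <- gsub("[[:punct:]]", "", x)     # drop punctuation
  x <- gsub("[[:digit:]]", "", x)     # drop digits
  x <- gsub("[[:space:]]+", " ", x)   # collapse runs of whitespace
  trimws(x)                           # trim leading and trailing spaces
}

cleaned <- clean_sample(sample_tweets)

# Join all tweets of one account into a single string
joined <- paste(cleaned, collapse = " ")
print(joined)
```

Each account ends up as one long string, which is the shape the later corpus step expects.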

Here I took tweets from TATA DOCOMO and IDEA CELLULAR and built a word comparison cloud.

R Code:

#Collecting tweets from mobile companies
library(twitteR)
library(ROAuth)
library(RCurl)
library(stringr)
library(RJSONIO)
library(wordcloud)
library(tm)

# Declare Twitter API credentials (replace the placeholders with your own keys)

api_key <- "YOUR_API_KEY"

api_secret <- "YOUR_API_SECRET"

token <- "YOUR_ACCESS_TOKEN"

token_secret <- "YOUR_ACCESS_TOKEN_SECRET"

# Create Twitter Connection
setup_twitter_oauth(api_key, api_secret, token, token_secret)

# Idea Cellular tweets
idea_tweets = userTimeline("ideacellular", n=500)

# Tata Docomo tweets
tata_tweets = userTimeline("TataDocomo", n=500)

# get text
tata_txt = sapply(tata_tweets, function(x) x$getText())
idea_txt = sapply(idea_tweets, function(x) x$getText())

## clean text

clean.text = function(x)
{
  # convert to lower case
  x = tolower(x)
  # remove the retweet marker "rt" (whole word only, so other words stay intact)
  x = gsub("\\brt\\b", "", x)
  # remove @mentions
  x = gsub("@\\w+", "", x)
  # remove punctuation
  x = gsub("[[:punct:]]", "", x)
  # remove numbers
  x = gsub("[[:digit:]]", "", x)
  # remove links (punctuation is already gone, so a URL is one run starting with "http")
  x = gsub("http\\w+", "", x)
  # collapse runs of spaces and tabs into a single space
  x = gsub("[ \t]+", " ", x)
  # remove blank spaces at the beginning
  x = gsub("^ ", "", x)
  # remove blank spaces at the end
  x = gsub(" $", "", x)
  return(x)
}

##apply function clean.text

# clean texts

tata_clean = clean.text(tata_txt)
idea_clean = clean.text(idea_txt)

## Join texts in a vector for each company

tata = paste(tata_clean, collapse=" ")
idea = paste(idea_clean, collapse=" ")

# put everything in a single vector
final= c(tata,idea)
final

## remove stop-words (English stop words plus a few custom words)

final = removeWords(final, c(stopwords("english"), "amazon", "flipkart"))

# create corpus
corpus = Corpus(VectorSource(final))

# create term-document matrix

tdm = TermDocumentMatrix(corpus)

# convert as matrix
tdm = as.matrix(tdm)

# add column names
colnames(tdm) = c("tata", "idea")

# plot comparison cloud

comparison.cloud(tdm, random.order=FALSE, colors = c("#00B2FF", "red"), title.size=1.5, max.words=300)

# plot commonality cloud

commonality.cloud(tdm, random.order=FALSE, colors = brewer.pal(8, "Dark2"))
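To get a feel for what the two cloud functions receive, here is a rough base R sketch that builds a small term-by-document count matrix by hand. The two input strings and the variable names are invented for illustration; in the code above the tm package does this work.

```r
# Hypothetical cleaned, joined text for two accounts
docs <- c(tata = "network offer recharge offer",
          idea = "network plan recharge plan plan")

# Split each document into words (names of docs are kept)
words <- strsplit(docs, " ", fixed = TRUE)

# All distinct terms, sorted alphabetically
terms <- sort(unique(unlist(words)))

# Count how often each term occurs in each document
counts <- vapply(words,
                 function(w) as.integer(table(factor(w, levels = terms))),
                 integer(length(terms)))
rownames(counts) <- terms

print(counts)
```

Roughly speaking, terms that are frequent in only one column stand out on that account's side of the comparison cloud, while terms present in both columns are the ones the commonality cloud draws.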

Word comparison cloud:

[Image: word comparison cloud]

Commonality cloud:

[Image: commonality cloud]

Hope you enjoyed learning it 🙂
