Sentiment Analysis using NLP libraries.

Social media has opened the floodgates of customer feedback, and businesses now have it to analyze in mammoth proportions. Using machine learning and deep learning techniques, companies can extract these opinions from text or audio-visual content and then analyze the sentiment behind them at scale. Sentiment analysis, opinion mining, call it what you like: if you have a product or service to sell, you need to be on it.

“ When captured electronically, customer sentiment — expressions beyond facts, that convey mood, opinion, and emotion — carries immense business value. We’re talking the voice of the customer, and of the prospect, patient, voter, and opinion leader.” — Seth Grimes

From user reviews to stock prices, sentiment analysis has become a ubiquitous tool in almost every industry. For example, the graph below shows eBay’s stock price movement alongside a sentiment index built from an analysis of tweets that mention eBay.

Sentiment analysis is a type of data mining that measures the inclination of people’s opinions. It uses natural language processing (NLP), computational linguistics and text analysis to extract and analyze subjective information from the Web, mostly from social media and similar sources. The analyzed data quantifies the general public’s sentiments or reactions toward certain products, people or ideas and reveals the contextual polarity of the information. Sentiment analysis is also known as opinion mining.

There are two broad approaches to sentiment analysis.

Pure statistics:

These algorithms treat texts as bags of words (BOW), in which word order, and therefore context, is ignored. The original text is filtered down to only the words thought to carry sentiment. This is the approach I will be attempting in this blog. Such models make no use of any understanding of a particular language; they rely only on statistical measures to classify a text.
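To make the loss of context concrete, here is a minimal bag-of-words sketch in plain Python. It assumes a simple lowercase regex tokenizer chosen for illustration, not necessarily the one used in this post. Because the representation is only word counts, two sentences with opposite meanings but the same words end up identical.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Map a text to word counts, discarding word order entirely."""
    # Lowercase and keep runs of letters/apostrophes (a deliberately simple tokenizer).
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

a = bag_of_words("The plot was good, the acting was not.")
b = bag_of_words("The acting was good, the plot was not.")

print(a)
print(a == b)  # True: different meanings, identical bags of words
```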

A mix of statistics and linguistics:

These algorithms attempt to incorporate grammar principles, various natural language processing techniques and statistics to train the machine to truly ‘understand’ the language.
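One small example of adding linguistic knowledge to a statistical model is negation handling. The sketch below assumes a hand-picked negation list and treats punctuation as the end of a negation’s scope, both simplifications of our own. It prefixes the words that follow a negation with `NOT_`, so a downstream bag-of-words model sees `NOT_good` as a different feature from `good`. The helper name `mark_negation` is hypothetical here.

```python
import re

# Illustrative, not exhaustive; contractions like "didn't" are caught by the suffix check.
NEGATIONS = {"not", "no", "never", "cannot"}
CLAUSE_END = {".", ",", ";", "!", "?", ":"}

def mark_negation(text: str) -> list[str]:
    """Prefix tokens inside a negation scope with 'NOT_' (a simple syntactic heuristic)."""
    # Split into words and punctuation so punctuation can close the negation scope.
    tokens = re.findall(r"[a-z']+|[.,;!?:]", text.lower())
    out, negated = [], False
    for tok in tokens:
        if tok in CLAUSE_END:
            negated = False  # punctuation ends the current negation scope
            out.append(tok)
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True   # words after this one are negated
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out

print(mark_negation("The acting was not good, but the plot was great."))
```

Only `good` is marked; the comma closes the scope, so `great` keeps its positive feature.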

Sentiment analysis can also be broadly categorized into two kinds, based on the type of output the analysis generates.

Categorical/Polarity: Was that bit of text “positive”, “neutral” or “negative”? Here you are trying to label a piece of text as positive, negative or neutral.
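A common way to turn a numeric sentiment score into these categories is thresholding. This minimal sketch assumes the score is already scaled to [-1, 1]. The default cutoff of 0.05 follows the convention the VADER documentation suggests for its compound score. Any other cutoff is an equally valid tuning choice.

```python
def polarity_label(score: float, threshold: float = 0.05) -> str:
    """Map a sentiment score in [-1, 1] to a categorical polarity label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"  # scores close to zero count as neutral

for s in (0.8, 0.01, -0.3):
    print(s, polarity_label(s))
```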

In our approach to sentiment analysis, we tried both supervised and unsupervised learning so that we could compare the two.

We first applied text normalization to our dataset, a corpus of 50,000 movie reviews, each labeled ‘positive’ or ‘negative’.
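The post does not list its exact normalization steps, so the following is only a plausible sketch of common ones. It strips HTML tags (such as the `<br />` tags often found in scraped reviews), removes accents, lowercases, drops non-letter characters and optionally removes stopwords. The tiny stopword list is hypothetical. It deliberately keeps negations like “not”, because they change sentiment.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stopword list; negations such as "not" are kept.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "is", "was", "it", "this"}

def normalize(text: str, remove_stopwords: bool = True) -> str:
    """Apply a few common normalization steps to one review."""
    text = re.sub(r"<[^>]+>", " ", text)                  # strip HTML tags like <br />
    text = unicodedata.normalize("NFKD", text)             # split accented characters...
    text = text.encode("ascii", "ignore").decode("ascii")  # ...and drop the accent marks
    text = text.lower()
    text = re.sub(r"[^a-z'\s]", " ", text)                 # keep letters, apostrophes, whitespace
    tokens = text.split()                                  # split() also collapses extra spaces
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(tokens)

print(normalize("This movie was <br /><br />NOT bad at all... Café scenes: 10/10!"))
```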

The findings of our supervised model were:

Accuracy: 0.90547
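The post does not say which supervised model produced this figure. As an assumption for illustration, the sketch below implements multinomial Naive Bayes in plain Python and runs it on a made-up four-review toy dataset. It shows the train, predict and accuracy loop. In practice you would use a library implementation (for example scikit-learn) on the full labeled corpus. This sketch does not reproduce the 0.90547 result.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> Counter of word frequencies
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    log_priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihoods = {}
    for c, counts in word_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        log_likelihoods[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return log_priors, log_likelihoods

def predict_nb(model, doc):
    """Return the label with the highest log-posterior; unseen words are skipped."""
    log_priors, log_likelihoods = model
    scores = {
        c: log_priors[c] + sum(log_likelihoods[c][w] for w in doc.split() if w in log_likelihoods[c])
        for c in log_priors
    }
    return max(scores, key=scores.get)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up toy data, already normalized (lowercase, no punctuation).
train_docs = [
    "great acting and a moving story",
    "loved it wonderful film",
    "boring plot and terrible acting",
    "awful film hated it",
]
train_labels = ["positive", "positive", "negative", "negative"]
test_docs = ["wonderful story", "terrible boring film"]
test_labels = ["positive", "negative"]

model = train_nb(train_docs, train_labels)
predictions = [predict_nb(model, d) for d in test_docs]
print(predictions, accuracy(test_labels, predictions))
```

On a real corpus you would hold out a test split and compute accuracy the same way: correct predictions divided by total reviews.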

For our unsupervised models, we tried three lexicon-based models:

AFINN Lexicon

SentiWordNet Lexicon

VADER Lexicon

The findings were:

AFINN accuracy: 0.71180

SentiWordNet accuracy: 0.68200

VADER accuracy: 0.71093
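All three lexicons share the same core idea: look up each word’s prior polarity in a dictionary and aggregate the results. The sketch below assumes a tiny hand-made lexicon. Its values are illustrative scores on AFINN’s -5 to +5 integer scale, not the real AFINN, SentiWordNet or VADER entries. It sums them naively and then squashes the sum into (-1, 1) with x / sqrt(x² + 15), the normalization VADER applies for its compound score. The real VADER also applies rules for negation, intensifiers, capitalization and punctuation, which are omitted here. The binary decision rule (a sum of zero or more counts as positive) is our own assumption for a positive/negative corpus.

```python
import math
import re

# Toy lexicon for illustration only; these are NOT the published AFINN scores.
TOY_LEXICON = {
    "good": 3, "great": 3, "wonderful": 4, "love": 3,
    "bad": -3, "boring": -3, "awful": -3, "hate": -3,
}

def lexicon_score(text: str, lexicon=TOY_LEXICON) -> int:
    """Sum the prior polarity of every word found in the lexicon."""
    return sum(lexicon.get(tok, 0) for tok in re.findall(r"[a-z']+", text.lower()))

def vader_style_normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded sum into (-1, 1), as VADER does for its compound score."""
    return score / math.sqrt(score * score + alpha)

def predict(text: str) -> str:
    # Binary rule for a positive/negative corpus; a zero sum defaults to positive.
    return "positive" if lexicon_score(text) >= 0 else "negative"

reviews = ["A great film with wonderful acting", "Boring and awful", "Not bad at all"]
for r in reviews:
    s = lexicon_score(r)
    print(f"{r!r}: sum={s}, normalized={vader_style_normalize(s):.3f}, label={predict(r)}")
```

Note that “Not bad at all” comes out negative, because this naive sum ignores the negation. This is a common weakness of simple lexicon scoring.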

These results make it clear that our supervised model performs far better than our unsupervised lexicon-based models: 0.905 accuracy versus roughly 0.68 to 0.71.