Practical Implementation: Building a Simple Sentiment Classifier
Sentiment Analysis with Scikit-learn
Let's implement a basic sentiment analysis model using Python and scikit-learn.
The goal: classify movie reviews as positive or negative.
Sample Code: Movie Review Classifier
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
# Load dataset: load_files expects one subfolder per class label inside 'reviews'
reviews = load_files('reviews')
X = reviews.data
y = reviews.target
# Split the raw text first so the test reviews stay unseen during vectorization
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Preprocess text: fit TF-IDF on the training reviews only, then reuse it on the test reviews
vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)
# Train classifier
clf = MultinomialNB()
clf.fit(X_train_vec, y_train)
# Evaluate on the held-out reviews
predictions = clf.predict(X_test_vec)
print(f'Accuracy: {accuracy_score(y_test, predictions):.2f}')
Key Components
- Dataset Loading: Loads labeled movie reviews from the reviews folder
- TF-IDF Vectorization: Converts raw text to numerical features
- Train/Test Split: Holds out 20% of the reviews to estimate generalization performance
- Multinomial Naive Bayes: A simple, effective text classifier
- Accuracy Score: Reports the fraction of held-out reviews classified correctly
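To make the TF-IDF step less of a black box, here is a minimal sketch in plain Python of the weighting that scikit-learn's TfidfVectorizer applies with its default settings: raw term counts, a smoothed inverse document frequency of ln((1 + n) / (1 + df)) + 1, and L2 normalization of each row. The `tfidf` helper and the three short documents are made up for illustration, and the sketch splits on whitespace and skips stop-word removal and `max_df` filtering, so it will not match the vectorizer's output token for token.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    # Simplified tokenization: lowercase and split on whitespace
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: how many documents contain each term
    df = Counter(tok for toks in tokenized for tok in set(toks))

    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        # L2-normalize so every document vector has unit length
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        rows.append([w / norm for w in raw])
    return vocab, rows


# Hypothetical mini-corpus, not taken from the reviews dataset
docs = ["great film great cast", "boring film", "great fun"]
vocab, rows = tfidf(docs)

for doc, row in zip(docs, rows):
    weights = {t: round(w, 2) for t, w in zip(vocab, row) if w > 0}
    print(doc, "->", weights)
```

In the second document, "boring" ends up with a larger weight than "film" because it appears in fewer documents, which is the behavior that lets rarer, more distinctive words dominate the features.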
Why It Matters
This pipeline demonstrates how to quickly develop and evaluate an NLP classifier, a foundational workflow for many sentiment analysis tasks.
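Since evaluation is half of that workflow, it can help to see what the accuracy number actually counts. The sketch below is plain Python with made-up labels (1 for positive, 0 for negative); the `accuracy` and `confusion_counts` helpers are hypothetical names written for this example, not part of scikit-learn.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))


# Made-up labels for ten reviews: 3 positive, 7 negative
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(f"Accuracy: {accuracy(y_true, y_pred):.2f}")
for (true_label, predicted), n in sorted(confusion_counts(y_true, y_pred).items()):
    print(f"true={true_label} predicted={predicted}: {n}")
```

With these labels the accuracy looks healthy even though only one of the three positive reviews is found, so on an imbalanced dataset it is worth checking the per-class counts alongside the single score.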
Diagram: Sentiment Analysis Pipeline
Movie Reviews
|
v
[TF-IDF Vectorizer]
|
v
[Naive Bayes Classifier]
|
v
Sentiment Prediction
(Positive / Negative)
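The diagram maps fairly directly onto a scikit-learn Pipeline, which chains the vectorizer and the classifier into one object. The following is a small sketch under stated assumptions: the six training sentences and the two new reviews are invented for illustration (the real dataset would come from the reviews folder), and labels use 1 for positive and 0 for negative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the movie review dataset
train_texts = [
    "a wonderful, moving film with great acting",
    "great story and wonderful cast",
    "I loved this brilliant movie",
    "a boring, terrible film with awful acting",
    "terrible plot and boring pacing",
    "I hated this awful movie",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

# Movie Reviews -> [TF-IDF Vectorizer] -> [Naive Bayes Classifier]
model = make_pipeline(
    TfidfVectorizer(stop_words='english', max_df=0.7),
    MultinomialNB(),
)
model.fit(train_texts, train_labels)

# Sentiment Prediction for unseen text; the pipeline vectorizes it automatically
new_reviews = ["wonderful and brilliant", "boring and awful"]
predicted = model.predict(new_reviews)

for text, label in zip(new_reviews, predicted):
    print(f"{text!r} -> {'Positive' if label == 1 else 'Negative'}")
```

Bundling both steps this way means the vectorizer is only ever fitted on the text passed to `fit`, and raw strings can be handed straight to `predict`.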