Vineel Rayapati
Data Scientist
Media Framing Analysis with Naïve Bayes
Overview
Based on Bayes' Theorem, Naïve Bayes is a quick and effective probabilistic classification algorithm. Although its core assumption, that each feature is conditionally independent of the others given the class, is a simplification, it works well in many real-world situations, particularly text classification.
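Concretely, under this independence assumption the classifier scores a document d with words w_1, ..., w_n for each class c and picks the highest-scoring class; a standard way to write this is:

```latex
P(c \mid d) \;\propto\; P(c)\,\prod_{i=1}^{n} P(w_i \mid c)
```

Here P(c) is the class prior estimated from the training labels and P(w_i | c) is the per-class word likelihood estimated from word counts.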
In this project, news headlines and article descriptions about removing smartphone chargers are classified using Naïve Bayes into three sentiment categories: pro, neutral, and against. These categories show whether the article presents the policy objectively, criticizes it as a cost-cutting measure, or supports it as a sustainable step.
We can determine which sentiment dominates coverage across sources and how media narratives are framed by using Naïve Bayes for this task. This advances the project's overarching objective of comprehending corporate communication patterns and public opinion regarding environmentally framed policies in the tech sector.
Data Preparation
The first step in successfully using Naïve Bayes was to format our dataset for supervised learning. Two essential elements of this procedure were labeled data and a distinct train-test split.
Our dataset consists of online news article headlines and descriptions that were manually classified into one of three sentiment categories (Pro, Neutral, or Against). These labels reflect the article's position on policies pertaining to the removal of smartphone chargers. A small sample of the labeled data is shown below:
Lemmatized_Title                        Label
apple remove charger iPhone launch      Against
eco friendly step fewer accessories     Pro
users split reaction charger removal    Neutral
To train and evaluate the model, we split the labeled dataset into two disjoint subsets:
- Training Set (80%) – Used to build and fit the model
- Testing Set (20%) – Used to evaluate the model’s generalization on unseen data
We used stratified sampling to ensure that all three classes were proportionally represented in both sets. This is important for fair model evaluation, especially since Neutral articles were more frequent in the dataset. Here's a visual snapshot of the class distribution before and after splitting:




In supervised learning, disjoint sets are crucial. The model might just memorize the answers if the same data points show up in both the training and testing sets. This would result in inflated accuracy and subpar performance in the real world. We make sure that the model is actually learning the underlying patterns rather than merely memorizing examples by maintaining complete separation between the sets.
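A stratified 80/20 split of this kind can be sketched with scikit-learn's train_test_split. The column names below follow the sample table above; the toy dataset itself is an illustrative stand-in, not the project's actual data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the labeled dataset: 5 examples per class.
df = pd.DataFrame({
    "Lemmatized_Title": [f"headline {i}" for i in range(15)],
    "Label": ["Pro"] * 5 + ["Neutral"] * 5 + ["Against"] * 5,
})

# stratify=df["Label"] keeps the Pro/Neutral/Against proportions the
# same in both subsets, so neither set is dominated by one class.
X_train, X_test, y_train, y_test = train_test_split(
    df["Lemmatized_Title"], df["Label"],
    test_size=0.20, stratify=df["Label"], random_state=42,
)

print(len(X_train), len(X_test))  # 12 training rows, 3 testing rows
```

Fixing random_state makes the split reproducible, which matters when comparing later models against this baseline.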
We can fairly test Naïve Bayes' ability to classify real-world, previously unseen articles thanks to this meticulous preparation.
Code
To turn the raw text into numerical features appropriate for supervised learning, the text data was first transformed using CountVectorizer and then subjected to TF-IDF transformation.
The MultinomialNB classifier, which works well for text classification tasks involving word count-based features, was used. The pipeline guarantees that the model training and preprocessing procedures are carried out consecutively.
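The pipeline described above can be sketched as follows; the three training examples are the illustrative rows from the sample table, not the full dataset.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# CountVectorizer -> TfidfTransformer -> MultinomialNB, applied in sequence
# so preprocessing and model fitting always run in the same order.
model = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])

# Tiny illustrative fit; the real training used the labeled headlines.
texts = ["apple remove charger iphone launch",
         "eco friendly step fewer accessories",
         "users split reaction charger removal"]
labels = ["Against", "Pro", "Neutral"]
model.fit(texts, labels)

prediction = model.predict(["charger removal eco friendly"])
```

Wrapping the steps in a Pipeline also prevents leakage: the vectorizer's vocabulary is learned from the training fold only, then reused unchanged on test data.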
Results & Interpretation
Standard performance metrics, such as accuracy, precision, recall, and F1-score, were used to assess the Naïve Bayes model on the testing set. With an overall accuracy of 47%, the classifier serves as a simple baseline for this project's sentiment classification rather than a production-ready model.
With a recall of 74%, the model did best on the Neutral class, correctly identifying the majority of Neutral articles. However, because of the substantial lexical overlap between critical and supportive language in the dataset, it had trouble accurately classifying articles that were Against and those that were Pro. Simpler models that assume feature independence, such as Naïve Bayes, are likely to have this limitation.
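The evaluation described here can be reproduced with scikit-learn's metrics utilities. The label vectors below are illustrative placeholders, not the project's actual predictions.

```python
from sklearn.metrics import confusion_matrix, classification_report

# Illustrative true/predicted labels (stand-ins for the real test output).
y_true = ["Pro", "Against", "Neutral", "Neutral", "Pro", "Against"]
y_pred = ["Against", "Pro", "Neutral", "Neutral", "Neutral", "Against"]

labels = ["Pro", "Neutral", "Against"]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Per-class precision, recall, and F1; zero_division=0 avoids warnings
# when a class receives no predictions.
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```

Passing an explicit labels list fixes the row/column order of the matrix, which makes the Pro-versus-Against confusions easy to read off.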
Below are the two key visualizations generated from the evaluation:
Confusion Matrix :
This matrix displays the number of correct and incorrect predictions for each of the three classes. Most misclassifications fall between Pro and Against, indicating that the model struggles to differentiate strongly opinionated tones.

Classification Report :
The precision, recall, and F1-score for every class are compiled in this report. The Neutral class performed the best, as predicted, and the Against class had the lowest recall (just 14%).

These findings imply that Naïve Bayes may oversimplify when identifying more complex positive or negative tones, even though it performs well when handling neutral or balanced sentiment. Nevertheless, it provides a quick and understandable way to start text mining sentiment classification.
Conclusion
For sentiment analysis of news articles about removing smartphone chargers, the Naïve Bayes classifier offered a robust and comprehensible starting point. It was most successful in identifying neutral articles, indicating that a large number of the dataset's headlines and descriptions are written in an impartial or non-opinionated manner.
Due to the vocabulary overlap between supportive and critical narratives, the model struggled to distinguish Pro from Against sentiment. This highlights one of Naïve Bayes' main limitations: its assumption of feature independence can cause it to miss contextual nuances that are crucial for identifying opinion in text.
Nevertheless, Naïve Bayes contributed to the development of fundamental understanding of the subject's media discourse. It showed that neutral coverage predominates and that basic models can be used to automate sentiment classification with a respectable level of accuracy. These results are consistent with our overarching research objective of comprehending media tone and public framing of sustainability policies in the tech sector.
More sophisticated classifiers, such as SVMs or Decision Trees, may be used in future modeling to enhance prediction for minority classes like Pro and Against and better capture nuanced sentiment differences.