
Splitting Opinions: A Decision Tree Analysis of Media Sentiment

Overview

A decision tree is a supervised learning algorithm that models decisions as a sequence of hierarchical binary splits. It works by identifying the most informative features in the data and constructing interpretable paths from the root to each prediction. This makes decision trees particularly useful when interpretability and transparency are top concerns.

In this project, we used a Decision Tree classifier to predict the sentiment of news articles about the removal of smartphone chargers. Each article was labeled with one of three sentiment classes (Pro, Neutral, or Against). Beyond making predictions, this approach lets us visually trace the words and phrases that drive each classification decision, offering insight into the structure of media tone.

Data Preparation

For the Decision Tree classifier, we used the same labeled dataset of news headlines and descriptions about smartphone charger removal. Each article was annotated as Pro, Neutral, or Against. Before training, we applied a TF-IDF transformation to convert the preprocessed text into numerical feature vectors that emphasize the most distinctive terms in each article.

After that, the data was divided into two disjoint sets:

  • 80% Training Set: Used to grow and optimize the tree structure.

  • 20% Testing Set: Used to assess how well the model generalizes to new data.


Stratified sampling was used to ensure that every sentiment category was fairly represented in both sets; this is essential to avoid training a biased model that favors the dominant class. A sample of the transformed dataset and a visual comparison of the class distributions are shown below:

[Figures: sample of the TF-IDF-transformed dataset and class distribution comparison across the training and testing sets]

This preparation ensured the Decision Tree was trained on clean, structured data and could identify the most significant splits based on strong sentiment indicators. We also restricted the tree depth to prevent overfitting and keep the model interpretable.
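As a rough sketch of this preparation step (assuming the labeled articles sit in a pandas DataFrame with hypothetical column names "text" and "label", and a hypothetical file name), the TF-IDF transformation and stratified split could look like this:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical file and column names: "text" holds the preprocessed
# headline + description, "label" is Pro / Neutral / Against
df = pd.read_csv("charger_articles_labeled.csv")

# Count vectorization plus TF-IDF weighting in a single step
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df["text"])
y = df["label"]

# 80/20 split; stratify=y keeps every sentiment class fairly represented in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
```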

Code

The Decision Tree classifier was implemented in Python with the Scikit-learn library. The pipeline involved the following key steps:

  • The count-vectorized features are transformed with TF-IDF, and a DecisionTreeClassifier is trained with a controlled depth to keep the model interpretable.

  • The model was fitted to the training set and predictions were generated for the testing set to assess performance. We set the maximum depth to 5 to avoid overfitting while still capturing enough complexity to split the data in a useful way (see the sketch after this list).

  • Finally, we visualized the top three levels of the tree, showing how specific keywords such as "removing," "screen," "iphone," and "authority" direct the model toward a sentiment classification. This graphic makes our findings more explainable and clarifies the reasoning behind each prediction.
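A minimal sketch of the training step, reusing the stratified TF-IDF split from the previous section (variable names are illustrative, not the exact code used):

```python
from sklearn.tree import DecisionTreeClassifier

# Shallow tree: max_depth=5 limits overfitting and keeps the splits readable
clf = DecisionTreeClassifier(max_depth=5, random_state=42)
clf.fit(X_train, y_train)

# Predictions on the held-out 20% testing set
y_pred = clf.predict(X_test)
```

Keeping the depth at 5 trades a little accuracy for a tree that can still be read end to end.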

Results & Interpretation

The Decision Tree classifier was the best-performing model in this project so far, with an overall accuracy of 51%. It was especially effective at identifying neutral sentiment, correctly classifying almost all of the neutral articles in the test set with a recall of 96%.

Like the Naïve Bayes model, however, it struggled to distinguish Pro from Against articles, especially the latter, which had a recall of only 7%. This suggests that while the model is good at detecting balanced coverage, it has difficulty with more opinionated language, which may call for more sophisticated model structures.


Confusion Matrix: 
The predicted classes and actual labels are compared in the matrix below. Even when the actual label was Pro or Against, the model most frequently predicted Neutral.

[Figure: confusion matrix comparing predicted and actual sentiment labels]

Classification Report:

The precision, recall, and F1-score for every class are compiled in this report.

[Figure: classification report with per-class precision, recall, and F1-score]
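Both of these outputs come from Scikit-learn's standard metrics; a brief sketch, reusing the fitted classifier and test split from the sketches above (class label strings are assumed to match the dataset annotations):

```python
from sklearn.metrics import classification_report, confusion_matrix

# Rows are actual labels, columns are predicted labels;
# label order assumed to be Pro / Neutral / Against
print(confusion_matrix(y_test, y_pred, labels=["Pro", "Neutral", "Against"]))

# Per-class precision, recall, and F1-score, plus overall accuracy
print(classification_report(y_test, y_pred))
```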

Tree Visualization:


The tree structure shows that certain TF-IDF-weighted terms, such as "removing," "iphone," and "authority," had a significant impact on the decision splits. Beyond offering transparency, this helps us understand the main forces shaping how the media presents the issue of charger removal.

[Figures: Decision Tree visualization showing the top levels of the tree and the key splitting terms]
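The tree figures above can be reproduced with Scikit-learn's plot_tree; a short sketch, limited to the top three levels and reusing the fitted vectorizer and classifier from the earlier sketches:

```python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(20, 10))
# max_depth=3 draws only the top three levels; feature names come from the
# TF-IDF vocabulary, class names from the fitted classifier
plot_tree(
    clf,
    max_depth=3,
    feature_names=vectorizer.get_feature_names_out(),
    class_names=clf.classes_,
    filled=True,
    fontsize=8,
)
plt.show()
```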

Conclusion

The Decision Tree model was a useful tool for assessing sentiment in media coverage of smartphone charger removal because it offered both interpretability and predictive power. With a near-perfect recall, it was especially good at detecting neutral content and had the highest accuracy of any model tested to date (51%).

Transparency is one of Decision Trees' greatest advantages. We were able to track how specific keywords affected the sentiment classification by visualizing the model. This interpretability gives our analysis more legitimacy and sheds light on how the media is framing the subject.


Nevertheless, the model's ability to identify strongly opinionated content was limited. Both Pro and Against sentiments were commonly misclassified as Neutral, suggesting that shallow trees struggle to capture more complex language patterns. Deepening the tree could improve this, but it could also lead to overfitting.

All things considered, the Decision Tree provided us with a good balance between interpretability and performance. It illustrated the potential of employing explainable AI models for text sentiment classification in socially relevant topics and assisted in identifying important terms that contribute to media bias.

