News Sentiment Analysis


Understand News Emotions in Real Time

This project turns raw headlines into insights about how positively or negatively the media portrays current events, giving a quick read on the overall tone of coverage across major outlets.


Collecting and Processing Headlines

News article titles are automatically collected from the public XML sitemaps of major outlets. The titles are then cleaned and tokenized with NLTK to prepare them for sentiment classification; NumPy is used later to aggregate the resulting scores.
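
The cleaning step is not spelled out here. A minimal sketch, assuming that cleaning means decoding HTML entities, collapsing whitespace, and dropping empty or duplicate titles; `clean_titles` is a hypothetical helper name, not the project's code.

```python
import html
import re
from typing import Iterable, List


def clean_titles(raw_titles: Iterable[str]) -> List[str]:
    """Normalize scraped headlines and drop empty or duplicate ones (hypothetical helper)."""
    seen = set()
    cleaned: List[str] = []
    for title in raw_titles:
        # Decode entities such as &amp; that can survive in sitemap text.
        text = html.unescape(title)
        # Collapse runs of whitespace (including newlines and tabs) into single spaces.
        text = re.sub(r"\s+", " ", text).strip()
        # Case-insensitive de-duplication keeps the first spelling seen.
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned
```

For example, `clean_titles(["  Markets rally &amp; rebound ", "Markets rally & rebound", ""])` returns `["Markets rally & rebound"]`, because both non-empty inputs normalize to the same headline.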

Analyzing Sentiment with DistilBERT

Each title is passed through a pre-trained DistilBERT model fine-tuned on SST-2, classifying the sentiment as positive or negative. Confidence scores are aggregated to measure overall tone per source.
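
How the confidence scores are combined is not specified. One plausible scheme, assumed here, signs each score by its label (+score for POSITIVE, -score for NEGATIVE) and averages per outlet, so a value near +1 means consistently confident positive headlines and a value near -1 the opposite. The input mirrors what a Transformers sentiment pipeline returns (a dict with "label" and "score"); `signed_score` and `tone_by_source` are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

import numpy as np


def signed_score(prediction: Dict[str, Any]) -> float:
    """Map one prediction to [-1, 1]: +score if POSITIVE, -score if NEGATIVE (assumed convention)."""
    sign = 1.0 if prediction["label"] == "POSITIVE" else -1.0
    return sign * float(prediction["score"])


def tone_by_source(rows: Iterable[Tuple[str, Dict[str, Any]]]) -> Dict[str, float]:
    """Average signed confidence per news source; rows are (source, prediction) pairs."""
    per_source: Dict[str, List[float]] = defaultdict(list)
    for source, prediction in rows:
        per_source[source].append(signed_score(prediction))
    # The mean of each outlet's signed scores summarizes its overall tone.
    return {source: float(np.mean(scores)) for source, scores in per_source.items()}
```

A signed mean lets a single number describe each outlet while still reflecting how confident the model was, which a plain label count would ignore.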

Visualizing Media Mood

Aggregated sentiment data can be visualized to reveal trends in media positivity or negativity over time, helping identify shifts in tone surrounding major events.
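
A sketch of the aggregation that could feed such a chart, assuming each headline has a publication date (news sitemaps often carry one in a `<news:publication_date>` element) and a signed score in [-1, 1], positive for POSITIVE labels and negative for NEGATIVE ones. `daily_tone` and `rolling_mean` are hypothetical helpers; the resulting series could then be plotted with any charting library, which is not among the listed dependencies.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Sequence, Tuple

import numpy as np


def daily_tone(scored: Iterable[Tuple[date, float]]) -> List[Tuple[date, float]]:
    """Average signed sentiment per calendar day, returned in date order."""
    buckets: Dict[date, List[float]] = defaultdict(list)
    for day, score in scored:
        buckets[day].append(score)
    return [(day, float(np.mean(buckets[day]))) for day in sorted(buckets)]


def rolling_mean(values: Sequence[float], window: int = 3) -> np.ndarray:
    """Moving average over consecutive windows to smooth day-to-day noise.

    mode="valid" keeps only full windows, so the output has len(values) - window + 1 points.
    """
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")
```

Smoothing is optional; a shift in tone around a major event tends to stand out more clearly once single noisy days are averaged with their neighbors.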


Python Libraries

These are the core dependencies powering the news sentiment analyzer: requests for fetching sitemaps, BeautifulSoup with lxml for XML parsing, Hugging Face transformers for DistilBERT sentiment classification, torch for CUDA-accelerated batch inference, nltk for title tokenization, numpy for score aggregation, and datasets for efficient in-memory data handling during processing.

Sentiment Classification Model

Leverages Hugging Face’s distilbert-base-uncased-finetuned-sst-2-english, a lightweight DistilBERT model fine-tuned on the Stanford Sentiment Treebank (SST-2) for binary sentiment prediction. It outputs a “POSITIVE” or “NEGATIVE” label with a confidence score for each input. Simple integration through the Transformers pipeline API enables fast headline tone analysis without custom training.
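
A minimal sketch of that pipeline integration, assuming the standard `transformers.pipeline("sentiment-analysis", ...)` entry point; `load_classifier` and `label_titles` are hypothetical names. The classifier can be passed in, which avoids reloading the model on every call and lets the helper run against a stub.

```python
from typing import Any, Callable, Dict, List, Optional

MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"


def load_classifier() -> Callable[..., List[Dict[str, Any]]]:
    """Build the sentiment pipeline (downloads the model on first use)."""
    from transformers import pipeline  # deferred: heavy import, only needed here

    return pipeline("sentiment-analysis", model=MODEL_ID)


def label_titles(
    titles: List[str],
    classifier: Optional[Callable[..., List[Dict[str, Any]]]] = None,
) -> List[Dict[str, Any]]:
    """Attach the model's label ("POSITIVE"/"NEGATIVE") and confidence to each title."""
    if classifier is None:
        classifier = load_classifier()
    predictions = classifier(titles)  # one {"label", "score"} dict per input string
    return [
        {"title": title, "label": pred["label"], "score": pred["score"]}
        for title, pred in zip(titles, predictions)
    ]
```

Calling `label_titles(["Stocks rally as inflation cools"])` loads the model and returns one dict per headline containing its title, label, and score.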

Fine-tuning hyper-parameters reported for the pre-trained model (no further training is done in this project): learning_rate = 1e-5, batch_size = 32, warmup = 600, max_seq_length = 128, num_train_epochs = 3.0.

Requests

Used for HTTP requests; this is how the titles of news articles are acquired. The URLs, mostly XML sitemaps, are fetched with this library and their contents are parsed with BeautifulSoup.
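
A minimal fetch helper, assuming a plain GET per sitemap URL; the timeout and User-Agent string are illustrative choices, not values from the project. It returns the raw bytes so that BeautifulSoup can honor the encoding declared in the XML itself.

```python
import requests

# Hypothetical identifier; some sites block the default python-requests User-Agent.
HEADERS = {"User-Agent": "news-sentiment-demo/0.1"}


def fetch_sitemap(url: str, timeout: float = 10.0) -> bytes:
    """Download one sitemap and return its body, raising requests.HTTPError on 4xx/5xx."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.content
```

Setting a timeout keeps one unresponsive outlet from stalling the whole collection run.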

BeautifulSoup

Parses XML sitemaps from news sites. Extracts article titles by navigating the XML structure with the lxml parser for efficient scraping.
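
A sketch of title extraction, assuming the sitemaps follow the Google News sitemap format, where each headline sits in a `<news:title>` element in the `http://www.google.com/schemas/sitemap-news/0.9` namespace. The `"xml"` feature tells BeautifulSoup to use lxml's XML parser; `extract_titles` is a hypothetical name.

```python
from typing import List, Union

from bs4 import BeautifulSoup

NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def extract_titles(sitemap_xml: Union[bytes, str]) -> List[str]:
    """Return the text of every news:title element in a news sitemap."""
    soup = BeautifulSoup(sitemap_xml, "xml")  # "xml" selects the lxml XML parser
    return [
        tag.get_text(strip=True)
        for tag in soup.find_all("title")
        # Keep only titles in the news namespace (skips e.g. image:title).
        if tag.namespace == NEWS_NS
    ]
```

Filtering on the namespace URI rather than the `news:` prefix keeps the match working even if a publisher binds that namespace to a different prefix.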

Transformers

Runs sentiment analysis on titles. Uses Hugging Face’s pre-trained distilbert-base-uncased-finetuned-sst-2-english model to classify headlines as POSITIVE or NEGATIVE.

Torch

Enables GPU acceleration. Detects CUDA availability and processes sentiment batches on the GPU for faster inference. The analysis was run on a PNY GeForce RTX 5080 OC.
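
A sketch of the CUDA check, assuming the pipeline's integer `device` convention (0 for the first GPU, -1 for CPU) and its `batch_size` argument for batched inference; the default of 32 is an illustrative value, not a setting taken from the project. `pick_device` and `build_batched_classifier` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"


def pick_device() -> int:
    """Return 0 (first CUDA GPU) when available, else -1 (CPU)."""
    import torch  # deferred so importing this module stays cheap

    return 0 if torch.cuda.is_available() else -1


def build_batched_classifier(batch_size: int = 32) -> Callable[..., List[Dict[str, Any]]]:
    """Sentiment pipeline that groups inputs into batches on the selected device."""
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model=MODEL_ID,
        device=pick_device(),
        batch_size=batch_size,
    )
```

Passing the whole list of titles in one call, e.g. `build_batched_classifier()(titles, truncation=True)`, lets the pipeline form GPU batches; `truncation=True` clips any unusually long title to the model's maximum input length.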

NLTK

Handles text tokenization. Preprocesses titles by splitting them into tokens before they are fed into the DistilBERT model.
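
The NLTK tokenizer used is not named. This sketch assumes `wordpunct_tokenize`, which splits text into runs of word characters and punctuation and needs no downloaded data (unlike `word_tokenize`, which relies on the Punkt models). Because the Transformers pipeline takes strings and applies DistilBERT's own WordPiece tokenizer, the tokens are re-joined before classification; both helper names are hypothetical.

```python
from typing import List

from nltk.tokenize import wordpunct_tokenize


def tokenize_title(title: str) -> List[str]:
    """Split a headline into word and punctuation tokens (no NLTK data download needed)."""
    return wordpunct_tokenize(title)


def normalize_for_model(title: str) -> str:
    """Re-join tokens with single spaces; the pipeline still expects a plain string."""
    return " ".join(tokenize_title(title))
```

Separating punctuation this way also normalizes headlines that were scraped with missing spaces, such as "rally,again".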

NumPy

Performs numerical operations. Aggregates sentiment scores and counts positive/negative classifications per news source.
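
A sketch of the per-source counting with vectorized NumPy masks, assuming parallel lists of source names and predicted labels; `label_counts` is a hypothetical name. Since the SST-2 model emits only two labels, anything that is not POSITIVE is counted as NEGATIVE.

```python
from typing import Dict, Sequence, Tuple

import numpy as np


def label_counts(sources: Sequence[str], labels: Sequence[str]) -> Dict[str, Tuple[int, int]]:
    """Return {source: (positive_count, negative_count)}."""
    src = np.asarray(sources, dtype=str)
    is_pos = np.asarray(labels, dtype=str) == "POSITIVE"
    counts: Dict[str, Tuple[int, int]] = {}
    for name in np.unique(src):
        in_src = src == name
        # Boolean masks sum to counts; ~is_pos covers NEGATIVE (binary labels).
        counts[str(name)] = (int(np.sum(in_src & is_pos)), int(np.sum(in_src & ~is_pos)))
    return counts
```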
