Understand News Emotions in Real Time
This project transforms raw headlines into insights into how positively or negatively the media portrays current events, giving a quick read on the overall tone across major outlets.

Collecting and Processing Headlines
News article titles are automatically collected from public XML sitemaps of major outlets. The data is cleaned, tokenized, and prepared for sentiment classification using NLTK and NumPy.
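A minimal sketch of this collection step, assuming a Google News-style sitemap; the URL and the news:title tag are illustrative assumptions, not the project's actual sources.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical sitemap URL; each outlet publishes its own news sitemap.
SITEMAP_URL = "https://example-news-site.com/news-sitemap.xml"

def fetch_titles(sitemap_url: str) -> list[str]:
    """Download an XML news sitemap and return the article titles it lists."""
    response = requests.get(sitemap_url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "xml")  # uses lxml's XML parser
    # Google News sitemaps carry headlines in <news:title> elements.
    return [tag.get_text(strip=True) for tag in soup.find_all("news:title")]

titles = fetch_titles(SITEMAP_URL)
```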
Analyzing Sentiment with DistilBERT
Each title is passed through a pre-trained DistilBERT model fine-tuned on SST-2, which classifies it as positive or negative. Confidence scores are then aggregated to measure the overall tone of each source.
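A minimal sketch of the classification step using the Transformers pipeline API (the example headlines below are placeholders):

```python
from transformers import pipeline

# SST-2 fine-tuned DistilBERT checkpoint from the Hugging Face Hub.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

titles = [
    "Economy rebounds faster than expected",
    "Storm damage leaves thousands without power",
]

for title, result in zip(titles, classifier(titles)):
    # Each result looks like {'label': 'POSITIVE', 'score': 0.99...}
    print(f"{result['label']:8s} {result['score']:.3f}  {title}")
```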
Visualizing Media Mood
Aggregated sentiment data can be visualized to reveal trends in media positivity or negativity over time, helping identify shifts in tone surrounding major events.
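The project's dependency list doesn't include a plotting library, but as one possible sketch of this step (using matplotlib, with made-up illustrative numbers):

```python
import matplotlib.pyplot as plt

# Illustrative daily mean sentiment per outlet (-1 = fully negative, +1 = fully positive).
daily_tone = {
    "Outlet A": [0.12, -0.05, 0.30, 0.18],
    "Outlet B": [-0.20, -0.35, -0.10, 0.05],
}
days = ["Mon", "Tue", "Wed", "Thu"]

for outlet, scores in daily_tone.items():
    plt.plot(days, scores, marker="o", label=outlet)

plt.axhline(0, color="gray", linewidth=0.8)  # neutral baseline
plt.ylabel("Mean headline sentiment")
plt.title("Media tone over time")
plt.legend()
plt.show()
```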
Python Libraries
Core dependencies powering the news sentiment analyzer. Includes requests for fetching sitemaps, BeautifulSoup with lxml for XML parsing, Hugging Face transformers for DistilBERT sentiment classification, torch for CUDA-accelerated batch inference, nltk for title tokenization, numpy for score aggregation, and datasets for efficient in-memory data handling during processing.
Sentiment Classification Model
Leverages Hugging Face’s distilbert-base-uncased-finetuned-sst-2-english, a lightweight DistilBERT model fine-tuned on the Stanford Sentiment Treebank (SST-2) for binary sentiment prediction. Outputs “POSITIVE” or “NEGATIVE” labels with confidence scores. Simple pipeline integration via the Transformers library enables fast, accurate headline tone analysis without custom training.
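One convenient convention (assumed here, not defined by the model itself) is to fold the label and confidence into a single signed value before aggregation:

```python
def signed_score(result: dict) -> float:
    """Map a pipeline result to [-1, 1]: +score for POSITIVE, -score for NEGATIVE."""
    return result["score"] if result["label"] == "POSITIVE" else -result["score"]

# e.g. {'label': 'NEGATIVE', 'score': 0.97} -> -0.97
```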
Requests
Used for HTTP requests; this is how news article titles are acquired. The URLs, mostly XML sitemaps, are fetched with this library and then parsed with BeautifulSoup.
BeautifulSoup
Parses XML sitemaps from news sites, extracting article titles by navigating the XML structure with the lxml parser for efficient scraping.
Transformers
Runs sentiment analysis on titles. Uses Hugging Face’s pre-trained distilbert-base-uncased-finetuned-sst-2-english model to classify headlines as POSITIVE or NEGATIVE.
Torch
Enables GPU acceleration. Detects CUDA availability and processes sentiment batches on the GPU for faster inference. Inference was run on a PNY 5080 OC.
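Roughly how device selection and batching might look with the pipeline API; the batch size here is an assumption to be tuned to the GPU's memory.

```python
import torch
from transformers import pipeline

# Use the first CUDA GPU if one is available, otherwise fall back to CPU.
device = 0 if torch.cuda.is_available() else -1

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=device,
)

titles = ["Example headline one", "Example headline two"]  # placeholder input
results = classifier(titles, batch_size=64)  # batch size chosen to fit GPU memory
```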
NLTK
Handles text tokenization. Preprocesses titles by splitting them into tokens before they are passed to the DistilBERT model.
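A small sketch of the NLTK tokenization step; the tokenizer resources must be downloaded once.

```python
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)  # tokenizer models (newer NLTK releases may need "punkt_tab")

title = "Markets rally as inflation cools"
tokens = word_tokenize(title)
# ['Markets', 'rally', 'as', 'inflation', 'cools']
```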
NumPy
Performs numerical operations. Aggregates sentiment scores and counts positive/negative classifications per news source.
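A rough sketch of the per-source aggregation, with made-up scores for illustration:

```python
import numpy as np

# Hypothetical classification results for one news source.
labels = np.array(["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"])
scores = np.array([0.98, 0.91, 0.88, 0.76, 0.95])

positive = labels == "POSITIVE"
signed = np.where(positive, scores, -scores)  # +score for POSITIVE, -score for NEGATIVE

print("positive titles:", int(positive.sum()))
print("negative titles:", int((~positive).sum()))
print("mean tone:", round(float(signed.mean()), 3))
```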
