We will design and build a real-time sentiment analysis and hate detection system using Apache Kafka, Elasticsearch and Docker.
I built this project for the “Turn Language into Action, Natural Language Hackathon by Expert.ai”.
I have always been interested in real-time systems and wondered how they work under the hood.
So, I found this hackathon to be a perfect opportunity to learn and build something new.
Well then, let’s ROLL!!!
This is what the complete pipeline looks like. Don’t worry, I will cover everything in detail.
But before we move on with the tools and architecture, let me talk about our data sources.
I used Docker to run all the necessary tools as containers for this project.
Now let’s talk about each component.
For ingesting the real-time data, I have used Apache Kafka.
Now, what is Apache Kafka? Well…
Apache Kafka (Kafka) is an open source, distributed streaming platform that enables (among other things) the development of real-time, event-driven applications. — IBM
Since I used Python, I chose kafka-python, a Python client that makes working with Kafka relatively easy.
Using a KafkaProducer, I sent the messages (data from Twitter and NewsAPI) to a KafkaConsumer through two Kafka topics: one for the tweets and one for the news articles.
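Here is a minimal sketch of the producer side, assuming kafka-python and JSON-encoded messages. The topic names `tweets` and `news`, the source labels `twitter` and `newsapi`, and all helper functions are hypothetical; the project’s actual names aren’t given here. The Kafka import is deferred into `build_producer` so the routing and serialization helpers can be tried without a running broker.

```python
import json

# Hypothetical mapping from data source to Kafka topic.
TOPICS = {"twitter": "tweets", "newsapi": "news"}


def serialize(value: dict) -> bytes:
    """Encode a record as UTF-8 JSON (usable as KafkaProducer's value_serializer)."""
    return json.dumps(value).encode("utf-8")


def topic_for(source: str) -> str:
    """Pick the Kafka topic for a record based on where it came from."""
    try:
        return TOPICS[source]
    except KeyError:
        raise ValueError(f"unknown source: {source!r}") from None


def publish(producer, source: str, record: dict):
    """Send one record to the topic matching its source.

    `producer` is expected to be a kafka-python KafkaProducer created with
    value_serializer=serialize; any object with a compatible send() works.
    """
    return producer.send(topic_for(source), value=record)


def build_producer(bootstrap_servers: str = "localhost:9092"):
    """Create a real producer (requires kafka-python and a running broker)."""
    from kafka import KafkaProducer  # imported lazily so the helpers work without Kafka

    return KafkaProducer(bootstrap_servers=bootstrap_servers, value_serializer=serialize)
```

In use, you would call `producer = build_producer()`, then `publish(producer, "twitter", {"text": tweet_text})` for each incoming item, and `producer.flush()` before shutting down. kafka-python’s `send()` is asynchronous, which is why the final flush matters.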
The KafkaConsumer then calls the machine learning service to classify the sentiment of the news articles and to detect hate in the tweets.
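The consumer side can be sketched in a similar way, again assuming kafka-python. `detect_hate` and `classify_sentiment` are placeholders for whatever calls the machine learning service (the Expert.ai API in this project); their names, the topic names, and the `text`/`content` fields are assumptions for illustration.

```python
import json
from typing import Callable, Dict, Iterator


def deserialize(raw: bytes) -> dict:
    """Decode a UTF-8 JSON message (usable as KafkaConsumer's value_deserializer)."""
    return json.loads(raw.decode("utf-8"))


def make_dispatcher(
    detect_hate: Callable[[str], dict],
    classify_sentiment: Callable[[str], dict],
) -> Callable[[str, dict], dict]:
    """Return a function that enriches a record according to the topic it came from."""
    handlers: Dict[str, Callable[[dict], dict]] = {
        # Tweets go to hate detection.
        "tweets": lambda rec: {**rec, "hate": detect_hate(rec["text"])},
        # News articles go to sentiment classification.
        "news": lambda rec: {**rec, "sentiment": classify_sentiment(rec["content"])},
    }

    def dispatch(topic: str, record: dict) -> dict:
        return handlers[topic](record)

    return dispatch


def consume(dispatch: Callable[[str, dict], dict],
            bootstrap_servers: str = "localhost:9092") -> Iterator[dict]:
    """Yield enriched records forever (requires kafka-python and a running broker)."""
    from kafka import KafkaConsumer  # imported lazily so the helpers work without Kafka

    consumer = KafkaConsumer(
        "tweets",
        "news",
        bootstrap_servers=bootstrap_servers,
        value_deserializer=deserialize,
    )
    for message in consumer:  # each message is a ConsumerRecord with .topic and .value
        yield dispatch(message.topic, message.value)
```

Keeping the classifiers as plain callables means the Kafka loop doesn’t care whether they wrap a hosted API or a local model.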
Machine Learning service
Since I built this project for the Expert.ai hackathon, I used their API for sentiment classification and hate detection.
However, you can always use your own TensorFlow or PyTorch model. Hugging Face also has some very relevant models for sentiment classification, and they are straightforward to set up. You should check them out!
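If you would rather swap in a Hugging Face model, a sketch using the `transformers` `pipeline("sentiment-analysis")` helper might look like this. Note that this is not what the project itself used. The default model for that task returns `POSITIVE`/`NEGATIVE` labels with a score; other models use different label names, so `normalize` and its output fields (`label`, `confidence`, `polarity`) are assumptions you would adapt.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def _load_pipeline():
    """Load the default Hugging Face sentiment pipeline once and reuse it."""
    from transformers import pipeline  # requires the transformers package plus a backend

    return pipeline("sentiment-analysis")


def normalize(prediction: dict) -> dict:
    """Map one prediction such as {'label': 'POSITIVE', 'score': 0.98} to flat fields."""
    label = prediction["label"].lower()
    confidence = float(prediction["score"])
    # Signed polarity: +confidence for positive, -confidence for negative, 0 otherwise.
    if label == "positive":
        polarity = confidence
    elif label == "negative":
        polarity = -confidence
    else:
        polarity = 0.0
    return {"label": label, "confidence": confidence, "polarity": polarity}


def classify_sentiment(text: str) -> dict:
    """Classify one text; the pipeline returns a list with one dict per input."""
    return normalize(_load_pipeline()(text)[0])
```

The model is loaded lazily and cached, so the first call pays the download and load cost and later calls reuse it.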
Okay, we have the classified data. Now what?
We have to store that data somewhere so we can use it for further analytics. I used Elasticsearch to store it and Kibana to visualize it.
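A minimal sketch of the storage step, assuming the official `elasticsearch` Python client and its `helpers.bulk` function. The `@timestamp` field and the helper names are illustrative choices, not taken from the project; adding a timestamp simply gives Kibana a time field to plot against.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


def to_bulk_actions(index: str, docs: Iterable[dict],
                    now: Optional[datetime] = None) -> Iterator[dict]:
    """Wrap documents as actions for elasticsearch.helpers.bulk.

    Adds an ISO-8601 "@timestamp" (one value per batch) to any document
    that lacks one.
    """
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    for doc in docs:
        source = {"@timestamp": stamp, **doc}  # an existing @timestamp wins
        yield {"_index": index, "_source": source}


def store(index: str, docs: Iterable[dict], url: str = "http://localhost:9200"):
    """Bulk-index documents (requires the elasticsearch package and a running cluster)."""
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(url)
    return helpers.bulk(es, to_bulk_actions(index, docs))  # -> (success_count, errors)
```

Bulk indexing sends many documents per request, which suits a stream of classified messages better than one request per document.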
You might ask, why Kibana?
Let me introduce you to the ELK stack.
“ELK” is the acronym for three open source projects: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine. Logstash is a server‑side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a “stash” like Elasticsearch. Kibana lets users visualize data with charts and graphs in Elasticsearch. — Elastic.co
Elasticsearch, Logstash and Kibana often go hand in hand in data engineering and data ingestion use cases. However, I omitted Logstash to keep the pipeline simple and focused on its goal.
You can always add Logstash later to extend the pipeline as needed.
That is enough about the ELK stack. Let’s jump into the Elasticsearch design.
Much like a database, Elasticsearch organizes data into indexes, which hold JSON documents. Each index has a mapping, which plays much the same role as a schema in other databases.
The mapping describes the fields in the JSON documents, their data types, and how each field should be indexed.
The above image gives a better idea of how Elasticsearch indexes compare to MySQL or PostgreSQL.
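For example, the two indexes in this pipeline could be created with explicit mappings like the ones below. The index names and fields are hypothetical. `text` fields are analyzed for full-text search, while `keyword` fields are stored as exact values, which is what terms-based Kibana charts aggregate on. `create_indices` expects an `elasticsearch.Elasticsearch` client using the 8.x-style `mappings=` keyword.

```python
# Hypothetical mappings for the two indexes; adjust field names to your documents.
TWEETS_MAPPING = {
    "properties": {
        "@timestamp": {"type": "date"},
        "user": {"type": "keyword"},
        "text": {"type": "text"},
        "is_hateful": {"type": "boolean"},
        "hate_categories": {"type": "keyword"},  # a keyword field can hold a list of values
    }
}

NEWS_MAPPING = {
    "properties": {
        "@timestamp": {"type": "date"},
        "source": {"type": "keyword"},
        "title": {"type": "text"},
        "content": {"type": "text"},
        "sentiment_label": {"type": "keyword"},
        "sentiment_score": {"type": "float"},
    }
}

INDICES = {"tweets": TWEETS_MAPPING, "news": NEWS_MAPPING}


def create_indices(es) -> list:
    """Create each index with its mapping if it does not exist yet; return the names created."""
    created = []
    for name, mapping in INDICES.items():
        if not es.indices.exists(index=name):
            es.indices.create(index=name, mappings=mapping)
            created.append(name)
    return created
```

Defining the mapping up front avoids relying on dynamic mapping, which would otherwise guess field types from the first document it sees.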
Done storing the data in the Elasticsearch indexes? Great! Now we can finally visualize it and get more insights.
We use Kibana for that.
Kibana is a free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack.
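Under the hood, Kibana visualizations are built on Elasticsearch aggregations. As a rough illustration (not a Kibana export), a “hateful tweets over time” bar chart amounts to a `date_histogram` aggregation like the one below. The `is_hateful` and `@timestamp` field names are assumptions. With the 8.x Python client, the body could be run as `es.search(index="tweets", **hateful_per_interval_query())`.

```python
def hateful_per_interval_query(interval: str = "1h") -> dict:
    """Count hateful tweets per time bucket (roughly what a Kibana bar chart requests)."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "query": {"term": {"is_hateful": True}},
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": "@timestamp", "calendar_interval": interval}
            }
        },
    }


def buckets_to_series(response: dict) -> list:
    """Extract (bucket_start, count) pairs from a search response."""
    buckets = response["aggregations"]["per_interval"]["buckets"]
    return [(b["key_as_string"], b["doc_count"]) for b in buckets]
```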
This is what my final Kibana dashboard looks like. You can check out the code at my GitHub repo.
⭐ Feel free to leave a star if you like the project.
This part covers only the overview of the project and its architecture. I’ll cover the code in a separate part soon, so stay tuned for that.
That’s all folks. See you soon 👋