Hi guys, this is my first post, and any constructive feedback is appreciated!
In this post, I am going to discuss using Elasticsearch with Python (using the Flask framework). This post will primarily focus on creating an Elasticsearch cluster and using Python Flask APIs to view, load, and retrieve data from the cluster. Before we begin, I would like to give some basic intuition on the ELK (Elasticsearch + Logstash + Kibana) stack and the Flask framework.
This article is divided into two parts: first we will create an ES cluster, and then a lightweight Flask application to query the data in the ES datastore.
The ELK stack:
Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. It is completely open source and developed in Java, making it easy to use across multiple environments and with many tech stacks. Essentially, Elasticsearch is a NoSQL database, so you do not query the data using SQL. Similar to MongoDB, ES provides querying APIs that can be used to retrieve data from the datastore with ease. As the heart of the ELK Stack, it centrally stores your data for lightning-fast search, fine-tuned relevancy, and powerful analytics that scale with ease.
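To give a taste of those querying APIs, here is a quick sketch of what an Elasticsearch query body looks like when built in Python. The `articles` index and `title` field are hypothetical examples of my own, not part of anything we set up later:

```python
import json

# A hypothetical query against an "articles" index: find documents whose
# "title" field matches the words "flask elasticsearch".
match_query = {
    "query": {
        "match": {
            "title": "flask elasticsearch"
        }
    },
    "size": 10,  # return at most 10 hits
}

# The body is sent as JSON, e.g. via POST to /articles/_search
print(json.dumps(match_query, indent=2))
```

This JSON query DSL is what the Flask APIs in part two will be sending to the cluster under the hood.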
Data is often scattered or siloed across many systems in many formats. Logstash supports a variety of inputs that pull in events from a multitude of common sources, all at the same time. It allows you to easily ingest from your logs, metrics, web applications, data stores, and various AWS services, all in continuous, streaming fashion.
Kibana is a free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack. It helps you do anything from tracking query load to understanding the way requests flow through your apps. In this article, I will share how to monitor the data on the Kibana dashboard and how to visualize indices.
So let’s get started…
I have hosted all the code mentioned in the article on GitHub
Let’s get a premise of what we plan to achieve:
- Create a Flask application using a python virtual environment
- Get a working ElasticSearch Cluster
- Generate a Kibana Dashboard to visualize the ES data
- Create Python Flask APIs to perform CRUD operations on the Elasticsearch datastore
- Cover some other objectives, such as extensions for visualizing the data externally and safety measures to protect your data
To create a Python virtual environment, first ensure that your Python version is 3.x. The command to check is:
python --version OR python -V
If the Python version is less than 3, check whether Python 3 is installed. If it is, there is a chance that Python 2.x is marked as the default. Rather than changing the default settings, which can be cumbersome, you can check the Python 3.x distribution on your machine using:
python3 --version OR python3 -V
If this is appropriate and you have virtualenv installed from pip, then you can proceed to create a VirtualEnv using the command:
virtualenv venv # if the default python is 3.x
OR
python3 -m virtualenv venv # this explicitly uses the python 3 distribution to create the virtualenv
Activate the virtualenv using the following command:
source venv/bin/activate # on Windows: venv\Scripts\activate
You can confirm that the virtualenv is activated if its name appears in your terminal prompt before the username.
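To sanity-check the fresh environment, you can install Flask (`pip install flask`) and run a minimal app. This is just a placeholder sketch; the real CRUD endpoints come in part two:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def index():
    # Placeholder endpoint to confirm the app starts and responds.
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    app.run(debug=True)  # serves on http://127.0.0.1:5000 by default
```

Visiting http://127.0.0.1:5000 in a browser should return the small JSON payload.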
Before we dive into setting up the ELK stack, you must install Docker on your machine. Steps for the same can be found at: https://docs.docker.com/engine/install/. Once Docker is installed, navigate to the Settings page and allocate at least 4 GB of memory to the application, keeping in mind the image sizes of Elasticsearch and Kibana.
If this amount of memory is not allocated, the ES nodes may fail to start. If restart is set to always for a container, it will keep crashing and restarting, which can be observed on the Docker Dashboard page.
We will not install Elasticsearch on our machine as a standalone application; instead, we will pull the image from Docker-Elastic. We intend to create an Elasticsearch cluster with three nodes, to ensure scalability and availability. We will define an initial master node, and the other two will act as failover nodes.
The resultant docker-compose file is given below:
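The compose file itself was embedded as a gist, so here is a minimal sketch of what it can look like, reconstructed from the description below (three ES nodes, only es01 exposed to the host, Kibana pointed at the cluster). The image version (7.9.0) and the heap settings are my assumptions, not necessarily the exact file:

```yaml
version: "2.2"
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.9.0
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - 9200:9200   # only es01 is exposed to the host machine
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.9.0
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.9.0
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
  kib01:
    image: docker.elastic.co/kibana/kibana:7.9.0
    ports:
      - 5601:5601
    environment:
      - ELASTICSEARCH_HOSTS=http://es01:9200
```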
Now let us understand pieces of the file in depth
es01, es02, and es03 are the three nodes of the ES cluster. By default, ES starts on port 9200 on 127.0.0.1 (localhost), but this can be changed by editing the host side of the port mapping in the compose file. As you can see, only the es01 node has a port mapping, and it is the initial master node. It discovers the other two nodes through the discovery.seed_hosts variable.
kib01 is the Kibana container, which is hosted on port 5601 on 127.0.0.1 (localhost). Kibana also requires the ES hosts, which are specified in the ELASTICSEARCH_HOSTS environment variable.
To get the containers up and running, follow these steps:
- Ensure that docker-compose is installed (it can be installed with pip in your virtual environment).
- Navigate to the folder where the docker-compose file is located.
- Run docker-compose up -d to start the containers in the background.
- Run docker ps to get the list of running containers.
- You can also confirm that Elasticsearch is up by navigating to localhost:9200, which returns a JSON summary of the node and cluster.
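That banner from localhost:9200 includes fields such as cluster_name, version.number, and Elasticsearch's tagline. As a small sketch, a helper like the one below can check that a parsed response looks like an ES node; the sample payload is illustrative, shaped like the banner but with values assumed by me:

```python
def looks_like_elasticsearch(banner: dict) -> bool:
    """Heuristic check that a parsed JSON banner came from an Elasticsearch node."""
    return (
        "cluster_name" in banner
        and banner.get("tagline") == "You Know, for Search"
    )

# Illustrative payload, shaped like the response from http://localhost:9200
sample = {
    "name": "es01",
    "cluster_name": "es-docker-cluster",
    "version": {"number": "7.9.0"},
    "tagline": "You Know, for Search",
}

print(looks_like_elasticsearch(sample))  # True
```

In practice you would fetch the banner with an HTTP GET (for example via the requests library) and pass the decoded JSON to the helper.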
You can also check that Kibana is up by navigating to localhost:5601, which opens the Kibana home page.
Voila! You have successfully started Elasticsearch and Kibana on your machine. In the next article, we will create a Flask application that performs CRUD operations on the ES datastore, including creating, deleting, and modifying indices.
Any suggestions welcome in the comments!