Know the basics of Elasticsearch

Shagun
3 min read · Aug 11, 2020

The ELK stack refers to three components: Elasticsearch, Logstash, and Kibana. The Elastic Stack is the broader name, and it also includes other components such as Beats and X-Pack.

Elasticsearch

Elasticsearch is the heart of the Elastic Stack. It is a distributed, open-source, RESTful, highly scalable search and analytics engine built on the Apache Lucene library. It works with all types of data, including textual, numerical, geospatial, structured, and unstructured.

Commonly used for:

  • Product searching
  • Autocomplete
  • Log mining (Aggregations)
  • Analytics
  • Investigation
  • Visualization
  • Statistics
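To make two of these use cases concrete, here is a minimal sketch of the JSON request bodies that a product search and a log aggregation might use. The `match` query and the `terms` aggregation are part of the Elasticsearch Query DSL; the field names (`name`, `level`) and the variable names are assumptions made up for this example. Bodies like these would typically be sent to `GET /<index>/_search`.

```python
import json

# Full-text product search: "match" analyzes the query text before matching.
# The field "name" is a made-up example field.
product_search = {"query": {"match": {"name": "running shoes"}}}

# Log mining: count log entries per level without returning individual hits.
# "size": 0 suppresses the hit list; the field "level" is a made-up example.
logs_per_level = {
    "size": 0,
    "aggs": {"per_level": {"terms": {"field": "level"}}},
}

for label, body in [("search", product_search), ("aggregation", logs_per_level)]:
    print(label, json.dumps(body))
```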

Cluster

  • It is a collection of one or more servers.
  • It allows searching and indexing across all nodes in the cluster.
  • Each node is one running Elasticsearch instance. (Each shard a node holds is a Lucene index.)
  • Every cluster is identified by its unique name. (This is important in a multi-cluster setup.)

Node

  • A single server in a cluster is called a node.
  • A node has a unique name in the cluster.

Index

  • An index is a collection of documents.
  • "Index" is both an entity and an operation in Elasticsearch. (Inserting a document into an index is called indexing.)

Document

A document is a single unit of information, expressed as JSON.
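As a rough illustration, a document can be modeled as a plain Python dictionary and serialized to JSON before it is sent to Elasticsearch. The index name `products`, the field names, and the helper `build_index_request` are made up for this sketch. The `PUT /<index>/_doc/<id>` path follows the document API of Elasticsearch 7.x.

```python
import json


def build_index_request(index, doc_id, document):
    """Return the HTTP method, path, and JSON body for indexing one document.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    path = f"/{index}/_doc/{doc_id}"
    body = json.dumps(document, sort_keys=True)
    return "PUT", path, body


# Example document; the fields are invented for illustration.
product = {"name": "Trail running shoes", "price": 89.5, "in_stock": True}

method, path, body = build_index_request("products", 1, product)
print(method, path)
print(body)
```

Sending this request is what "indexing a document" means in practice: the JSON body becomes one document in the `products` index.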

Shards

  • A shard is a partition (piece) of an index.
  • Sharding splits the index horizontally.
  • You define the number of primary shards when the index is created.
  • The shard that handles writes first is called the primary shard.
  • In Elasticsearch, replication is done with replica shards.
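The shard count is set at index creation because Elasticsearch picks the target shard from a hash of the document's routing value (the document ID by default) and the number of primary shards. The sketch below illustrates the idea only: `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in hash, so the shard numbers it produces will not match a real cluster.

```python
import zlib


def pick_shard(routing_value, num_primary_shards):
    """Map a routing value (by default the document ID) to a shard number.

    crc32 is a stand-in hash chosen for this sketch; it is not what
    Elasticsearch uses internally.
    """
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]

with_three = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}
with_four = {doc_id: pick_shard(doc_id, 4) for doc_id in doc_ids}

# Count how many documents would land on a different shard if the shard
# count changed from 3 to 4 after the data had been indexed.
moved = sum(1 for d in doc_ids if with_three[d] != with_four[d])
print(f"{moved} of {len(doc_ids)} documents would change shard")
```

If the shard count could change freely, documents that were already indexed would no longer be found on the shard the formula points to.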

Replica (Replica Shard)

  • A replica contains the same data as its primary shard.
  • A replica is never allocated to the same node as its primary shard.
  • Allows for fault tolerance.
  • Scales search throughput.
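The placement rule can be sketched with a toy allocator. This is a simplified illustration, not how Elasticsearch actually balances shards (the real allocator weighs many more factors); the function `allocate`, the node names, and the `p0`/`r0` labels are assumptions made up for this example.

```python
def allocate(num_shards, num_replicas, nodes):
    """Toy allocator: place each primary, then its replicas on other nodes.

    Returns (assigned, unassigned). assigned maps node -> list of shard
    copies; a copy that cannot be placed without sharing a node with another
    copy of the same shard goes to unassigned.
    """
    assigned = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_shards):
        copies = [f"p{shard}"] + [f"r{shard}"] * num_replicas
        for offset, copy in enumerate(copies):
            if offset < len(nodes):
                # Rotate the starting node per shard so copies spread out.
                node = nodes[(shard + offset) % len(nodes)]
                assigned[node].append(copy)
            else:
                # More copies than nodes: the extra copy has nowhere to go.
                unassigned.append(copy)
    return assigned, unassigned


two_nodes, left_over = allocate(num_shards=2, num_replicas=1, nodes=["node-a", "node-b"])
print(two_nodes, left_over)

one_node, left_over_single = allocate(num_shards=2, num_replicas=1, nodes=["node-a"])
print(one_node, left_over_single)
```

With two nodes, every primary has its replica on the other node, so losing one node loses no data. With a single node, the replicas stay unassigned because the only available node already holds the primaries.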

Cluster Status

A cluster is in one of three states, depending on how its primary and replica shards are allocated.

  • Green: all primary and replica shards are allocated.
  • Yellow: all primary shards are allocated, but one or more replica shards are unallocated.
  • Red: one or more primary shards are unallocated.
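The three rules above can be written as a small function. On a real cluster the status is reported by `GET /_cluster/health`; the sketch below only mirrors the rules, and the function `cluster_status` and the dictionary layout are assumptions invented for this example.

```python
def cluster_status(shards):
    """Derive a status color from a list of shard copies.

    Each entry is a dict with "primary" (bool) and "assigned" (bool).
    The structure is invented for this sketch.
    """
    if any(s["primary"] and not s["assigned"] for s in shards):
        return "red"
    if any(not s["primary"] and not s["assigned"] for s in shards):
        return "yellow"
    return "green"


shards = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": False},  # replica with no node to live on
]
print(cluster_status(shards))
```

The red check comes first because a missing primary means some data is unavailable, which is more serious than a missing replica.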

Node Types

Master Eligible Node (Default: True)

  • A master-eligible node can be elected master. The master handles cluster-wide management, such as creating and deleting indices, tracking the nodes in the cluster, and allocating shards.

Data Node (Default: True)

  • Data nodes hold the shards. Indexing, deleting, searching, and other data operations are performed on data nodes.

Ingest Node (Default: True)

  • Ingest nodes preprocess documents before they are indexed. (This is similar to the processing Logstash can do.)
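As an illustration of what preprocessing means here, an ingest pipeline is a list of processors applied in order to each document before it is indexed. In the sketch below, `lowercase` and `set` are real processor types, and the definition has the shape that would be registered with `PUT /_ingest/pipeline/<name>`. The field names and the `run_locally` function are assumptions: it is a toy interpreter for these two processors only, written to show the effect without a cluster.

```python
import json

# Pipeline definition in the shape the ingest pipeline API accepts.
# "lowercase" and "set" are real processor types; the field names are made up.
pipeline = {
    "description": "Normalize log level and tag the environment",
    "processors": [
        {"lowercase": {"field": "level"}},
        {"set": {"field": "environment", "value": "production"}},
    ],
}


def run_locally(pipeline, document):
    """Toy interpreter for the two processors above, for illustration only."""
    result = dict(document)  # work on a copy, leave the input untouched
    for processor in pipeline["processors"]:
        # Each processor is a single-key dict: {type: options}.
        kind, options = next(iter(processor.items()))
        if kind == "lowercase":
            result[options["field"]] = result[options["field"]].lower()
        elif kind == "set":
            result[options["field"]] = options["value"]
        else:
            raise ValueError(f"unsupported processor: {kind}")
    return result


raw = {"level": "ERROR", "message": "disk almost full"}
print(json.dumps(run_locally(pipeline, raw)))
```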

Coordinating Only Node (Default: false)

  • Coordinating-only nodes act as smart load balancers that route requests to the appropriate nodes.
  • They also handle the reduce phase of a search.
  • They distribute bulk indexing requests.
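The search reduction step can be pictured as a merge: each shard returns its own best hits, and the coordinating node combines them into one global result list. The sketch below is a simplified model of that idea, not Elasticsearch's implementation; `reduce_hits`, the scores, and the document IDs are made up for illustration.

```python
import heapq
from itertools import islice


def reduce_hits(per_shard_hits, size):
    """Merge per-shard result lists into one global top-`size` list.

    Each shard list holds (score, doc_id) tuples already sorted by
    descending score, as a shard would return its local top hits.
    """
    merged = heapq.merge(*per_shard_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, size))


# Local top hits from three shards (invented data).
shard_0 = [(9.1, "doc-7"), (4.2, "doc-3")]
shard_1 = [(8.5, "doc-12"), (7.7, "doc-9")]
shard_2 = [(5.0, "doc-1")]

top_hits = reduce_hits([shard_0, shard_1, shard_2], size=3)
print(top_hits)
```

Because every shard list is already sorted, the merge only needs to compare the current best hit of each shard, which keeps the work on the coordinating node small.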

Machine Learning Node

  • It is a feature of X-Pack, which is not free.
  • On this node, you can run machine learning jobs and handle machine learning API requests.

Shagun

Cloud Engineer | Developer | AWS | Azure | Microservices