Elasticsearch Explained: A Comprehensive Guide for Beginners

Introduction

Imagine you have a massive library with millions of books. Now, picture a team of super-fast librarians who can find any book or piece of information in seconds. That's essentially what Elasticsearch is, but for digital data. Let's dive into how it works!

What is Elasticsearch?

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It's designed to:

  • Store large amounts of data

  • Search through this data incredibly fast

  • Analyze data and provide insights

Think of it as Google, but for all kinds of data, not just web pages.

Cluster Architecture

An Elasticsearch setup is called a cluster. Here's how it works:

  1. Cluster: The entire system, like a big library network

  2. Nodes: Individual Elasticsearch instances in the cluster (usually one per server), like different library branches

  3. Index: A collection of similar documents, like a specific bookshelf

  4. Document: A single unit of information, like a book

Node Roles

In Elasticsearch, nodes can have different roles, much like librarians with different specialties. Here are the main node roles:

  1. Master Node: The coordinator of the cluster. It manages cluster-wide operations like creating or deleting an index, tracking which nodes are part of the cluster, and allocating shards to nodes.

  2. Data Node: Stores and searches data. These nodes hold the shards that contain indexed documents.

  3. Ingest Node: Pre-processes documents before indexing. Think of this as a librarian who prepares books before they're placed on shelves.

  4. Coordinating Node: Receives client requests, forwards them to the nodes that hold the relevant data, and merges the results. It acts as a smart load balancer for searches and aggregations.

  5. Machine Learning Node: Handles machine learning jobs. This is like having a librarian who specializes in pattern recognition and predictions.

A single node can have multiple roles, or you can have dedicated nodes for each role in larger clusters.

How Elasticsearch Organizes Data

1. Indexing

Indexing is how Elasticsearch organizes data. It's like arranging books on a shelf:

  1. Create an index (like choosing a bookshelf)

  2. Add documents to the index (like placing books on the shelf)

  3. Elasticsearch automatically organizes the content for fast searching
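
The "organizing" in step 3 is mostly about building an inverted index: a lookup from each word to the documents that contain it. The sketch below is a toy Python version of that idea, not Elasticsearch's actual implementation; the function name and the sample titles are made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "To Kill a Mockingbird",
    2: "The Mockingbird Next Door",
    3: "Go Set a Watchman",
}

index = build_inverted_index(docs)
print(sorted(index["mockingbird"]))  # [1, 2]
```

Finding every book that mentions "mockingbird" is now a single dictionary lookup instead of a scan over every title.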

2. Shards

Imagine trying to search through a million books on one long shelf. It would take forever! Instead, Elasticsearch uses shards:

  • A shard is a piece of an index

  • Shards spread data across multiple nodes

  • This allows for parallel searching, making it super fast
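
To decide which shard a document lives on, Elasticsearch hashes a routing value (by default the document ID) and takes the result modulo the number of primary shards. Here is a minimal Python sketch of that idea. It assumes `zlib.crc32` as a stand-in hash (Elasticsearch uses a different hash function internally), and `pick_shard` is a hypothetical helper name.

```python
import zlib

def pick_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Toy hash-modulo routing: return the shard number for a document ID."""
    # Assumption: crc32 stands in for the hash Elasticsearch really uses.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards

# Spread 1000 made-up document IDs over 3 primary shards.
shard_counts = [0, 0, 0]
for n in range(1000):
    shard_counts[pick_shard(f"book-{n}", 3)] += 1

print(shard_counts)  # three counts that add up to 1000
```

Because the shard number depends on the shard count, this is one reason the number of primary shards is fixed when an index is created: changing it would send existing documents to different shards.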

3. Replicas

Replicas are copies of primary shards, kept on different nodes from the shards they copy:

  • They provide failover (if the node holding a primary shard fails, a replica is promoted to take over)

  • They allow more searches to happen at the same time
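
The failover behavior can be sketched as a small simulation. This is a simplified model under stated assumptions: the node names and the `promote_after_failure` helper are hypothetical, and "promote the first surviving replica" is a stand-in for the real selection logic inside Elasticsearch.

```python
def promote_after_failure(allocation, failed_node):
    """Return a new allocation after one node is lost.

    allocation maps shard number -> {"primary": node, "replicas": [nodes]}.
    """
    result = {}
    for shard_id, copies in allocation.items():
        # Drop any replica copies that lived on the failed node.
        replicas = [node for node in copies["replicas"] if node != failed_node]
        primary = copies["primary"]
        if primary == failed_node:
            # Simplification: promote the first surviving replica, if there is one.
            primary = replicas.pop(0) if replicas else None
        result[shard_id] = {"primary": primary, "replicas": replicas}
    return result

# Three shards with one replica each, spread over three made-up nodes.
allocation = {
    0: {"primary": "node-a", "replicas": ["node-b"]},
    1: {"primary": "node-b", "replicas": ["node-c"]},
    2: {"primary": "node-c", "replicas": ["node-a"]},
}

after = promote_after_failure(allocation, "node-b")
print(after[1])  # {'primary': 'node-c', 'replicas': []}
```

Every shard still has a primary after the failure, so no data is lost. In a real cluster, the master node would then try to create fresh replicas on the remaining nodes.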

Searching in Elasticsearch

Searching in Elasticsearch is like asking our super-librarian to find information:

  1. You send a search query

  2. Elasticsearch looks through its indexes

  3. It returns the most relevant results, super fast!

You can search for exact matches, partial matches, or even complex combinations of criteria.

Mapping: Elasticsearch's Catalog System

Mapping is like the library's catalog system. It tells Elasticsearch how to interpret and store different types of data. Let's dive deeper into mapping:

Types of Fields

  1. Text Fields: For full-text content like book descriptions or articles.

    • Example: "title": { "type": "text" }
  2. Keyword Fields: For exact matching on short strings like tags or categories.

    • Example: "genre": { "type": "keyword" }
  3. Numeric Fields: For numbers, including integers and floating-point.

    • Example: "price": { "type": "float" }
  4. Date Fields: For timestamps and date values.

    • Example: "publish_date": { "type": "date" }
  5. Boolean Fields: For true/false values.

    • Example: "in_stock": { "type": "boolean" }
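  6. Object Fields: For JSON objects embedded inside a document. (This is different from the separate nested type, which is meant for arrays of objects that need to be queried independently of each other.)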

    • Example: "author": { "type": "object", "properties": { "name": { "type": "text" }, "bio": { "type": "text" } } }

Analyzers

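Analyzers process text fields for full-text search. Depending on the analyzer, the steps can include:

  1. Tokenize text (split into words)

  2. Lowercase tokens

  3. Remove stop words

  4. Apply stemming or lemmatization

You can use built-in analyzers or create custom ones. The default standard analyzer only tokenizes and lowercases; stop-word removal and stemming come from language-specific analyzers (such as english) or custom ones.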
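
The four steps listed above can be sketched as a tiny Python pipeline. This is an illustration only: the stop-word list is deliberately short, and `naive_stem` is a crude suffix stripper invented for this example, not one of the stemmers Elasticsearch ships with.

```python
import re

# Tiny illustrative stop-word list (real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to"}

def naive_stem(token):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    tokens = re.findall(r"[A-Za-z0-9]+", text)           # 1. tokenize
    tokens = [t.lower() for t in tokens]                 # 2. lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3. remove stop words
    return [naive_stem(t) for t in tokens]               # 4. stem

print(analyze("The Librarians are searching the shelves"))
# ['librarian', 'are', 'search', 'shelv']
```

Stems like "shelv" do not need to be real words. What matters is that the same processing is applied to both the stored text and the search query, so "searching" and "search" end up as the same term.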

Example Mapping

Here's an example mapping for a book index:

{
  "mappings": {
    "properties": {
      "title": { 
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "author": { "type": "text" },
      "description": { "type": "text" },
      "publish_date": { "type": "date" },
      "price": { "type": "float" },
      "in_stock": { "type": "boolean" },
      "genres": { "type": "keyword" }
    }
  }
}

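This mapping allows for full-text search on title, author, and description, exact matching on genres, and range queries on publish_date and price. The title.keyword sub-field additionally supports exact matching, sorting, and aggregations on the full, unanalyzed title.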
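
The difference between text and keyword fields comes down to which terms end up in the index. The Python sketch below models that difference in a simplified way: `index_terms` and `term_matches` are hypothetical helpers, and splitting on whitespace stands in for a real analyzer.

```python
def index_terms(value, field_type):
    """Return the terms that would be indexed for a string value (simplified)."""
    if field_type == "keyword":
        return [value]                # the whole value, unchanged, as one term
    if field_type == "text":
        return value.lower().split()  # analyzed: lowercased and split into words
    raise ValueError(f"unsupported field type: {field_type}")

def term_matches(term, value, field_type):
    """An exact term lookup matches only if the term is one of the indexed terms."""
    return term in index_terms(value, field_type)

print(term_matches("fiction", "Science Fiction", "text"))             # True
print(term_matches("fiction", "Science Fiction", "keyword"))          # False
print(term_matches("Science Fiction", "Science Fiction", "keyword"))  # True
```

This is why genres is mapped as keyword (you filter on the exact tag), while title is mapped as text with an extra keyword sub-field for the cases where the exact title is needed.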

The Elasticsearch Process

Here's what happens when you add and search for data:

  1. Indexing:

    • You send data to Elasticsearch

    • Elasticsearch processes and stores the data

    • The data is spread across shards

  2. Searching:

    • You send a search query

    • Elasticsearch searches across all relevant shards

    • Results are combined and returned to you
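
The search half of this process is often described as scatter-gather: each shard finds its own best hits, and the results are merged into one ranked list. Here is a toy Python sketch of that pattern. It makes several assumptions: shards are plain lists of made-up titles, the score is just a word count (Elasticsearch's default relevance scoring is more sophisticated), and the shards are visited one after another rather than in parallel.

```python
import heapq

def search_shard(shard_docs, word):
    """Return (score, title) pairs for one shard; score = occurrences of word."""
    hits = []
    for title in shard_docs:
        score = title.lower().split().count(word)
        if score > 0:
            hits.append((score, title))
    return hits

def search_all_shards(shards, word, size=2):
    """Scatter the query to every shard, then gather and keep the top hits."""
    partial_results = []
    for shard_docs in shards:
        partial_results.extend(search_shard(shard_docs, word))
    return heapq.nlargest(size, partial_results)

# Three shards holding hypothetical book titles.
shards = [
    ["To Kill a Mockingbird", "Go Set a Watchman"],
    ["Mockingbird Mockingbird Songs", "Silent Spring"],
    ["The Mockingbird Next Door"],
]

print(search_all_shards(shards, "mockingbird"))
# [(2, 'Mockingbird Mockingbird Songs'), (1, 'To Kill a Mockingbird')]
```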

[Figure: Elasticsearch indexing and searching process]

Practical Examples

Let's look at some basic Elasticsearch commands:

  1. Creating an index:

     PUT /my_library
     {
       "settings": {
         "number_of_shards": 3,
         "number_of_replicas": 2
       },
       "mappings": {
         "properties": {
           "title": { "type": "text" },
           "author": { "type": "text" },
           "description": { "type": "text" },
           "publish_date": { "type": "date" },
           "price": { "type": "float" },
           "in_stock": { "type": "boolean" },
           "genres": { "type": "keyword" }
         }
       }
     }
    
  2. Adding a document:

     POST /my_library/_doc
     {
       "title": "To Kill a Mockingbird",
       "author": "Harper Lee",
       "description": "A novel about racial injustice in the American South",
       "publish_date": "1960-07-11",
       "price": 9.99,
       "in_stock": true,
       "genres": ["fiction", "classic"]
     }
    
  3. Searching:

     GET /my_library/_search
     {
       "query": {
         "bool": {
           "must": [
             { "match": { "title": "mockingbird" } },
             { "range": { "publish_date": { "lte": "1970-01-01" } } }
           ],
           "filter": [
             { "term": { "genres": "classic" } },
             { "range": { "price": { "lte": 15 } } }
           ]
         }
       }
     }
    

This search query looks for books with "mockingbird" in the title, published on or before January 1, 1970, categorized as "classic", and priced $15 or less.
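
If you send this search from application code, the request body is plain JSON, so it can be assembled as a regular dictionary. The sketch below builds the same body in Python using only the standard library. `build_book_query` is a hypothetical helper, and actually sending the request (with whichever HTTP or Elasticsearch client you prefer) is left out.

```python
import json

def build_book_query(title_words, published_on_or_before, genre, max_price):
    """Build the bool query shown above as a Python dictionary."""
    return {
        "query": {
            "bool": {
                # must clauses have to match and contribute to the relevance score
                "must": [
                    {"match": {"title": title_words}},
                    {"range": {"publish_date": {"lte": published_on_or_before}}},
                ],
                # filter clauses have to match but do not affect the score
                "filter": [
                    {"term": {"genres": genre}},
                    {"range": {"price": {"lte": max_price}}},
                ],
            }
        }
    }

body = build_book_query("mockingbird", "1970-01-01", "classic", 15)
print(json.dumps(body, indent=2))
```

Yes/no conditions such as genre and price usually belong in filter, since they only narrow the result set, while must is for clauses that should influence ranking.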

Conclusion

Elasticsearch makes it possible to store, search, and analyze huge amounts of data quickly. By organizing information into documents, spreading these across multiple shards, and creating smart indexes with detailed mappings, Elasticsearch can find needles in digital haystacks in milliseconds!

Remember, this is just scratching the surface. Elasticsearch has many more powerful features for complex data analysis and visualization. As you get more comfortable with these basics, you can explore its more advanced capabilities like aggregations, geo-spatial searches, and machine learning integrations.
