Elastic Search Tutorial: Deploy, Scailing, and Main Functionality

In this article, we will review the main functionality that can cover at least 95% of the needs of a typical project. If you are new to ElasticSearch, in this post you will find answers to almost all questions, that you should ask before using a new database.

elastic search tutorial logo

Understanding ElasticSearch

Elasticsearch is an open-source and full-text search engine at the core of Elastic Stack operations. It stores JSON documents in a schema-free database based on Apache Lucene software and licensed under Apache 2.0. With Elasticsearch, you can work with multiple data variations, including structured, unstructured, text-based, geospatial, and numerical. Real-time indexing makes Elasticsearch facilitate quick searches for all types of data. The engine integrates well with other tools such as Java API

Deploying Elasticsearch

There are various ways to deploy Elasticsearch and these include:

  • Elasticsearh Service (Available on Alibaba Cloud, Google Cloud Platform(GCP) and Amazon Web Services(AWS))
  • Installation from your hardware after downloading
  • Cloud installation

Pulling Images

To pull an image in the current version, here is an example:

docker pull docker.elastic.co/elasticsearch/elasticsearch:7.12.1

 Running images in Development

For development purposes you can run your image as follows:-

docker run -p 9200:9200 -p 9300:9300 -e “discovery.type=single-node” docker.elastic.co/elasticsearch/elasticsearch:7.12.1

How Elasticsearch Works

Elasticsearch will store your documents in indexes which are tables that contain documents in rows. The software’s schema-free index allows you to put documents containing structural variations as long as you observe key limits (which we will explain further below) when mapping.

A brief overview of how Elasticsearch works:

  • On document insertion, Elasticsearch divides field values into tokens and feeds them into the inverted index. For instance, when you are looking at a sentence, every word can represent a different token.
  • When searching using a phrase, Elasticsearch will divide the phrase into tokens that match the inverted index.

Here you can read more about the inverted index for in-depth understanding or have a look at the software’s official guide

Scaling Approaches

Elasticsearch works with clusters, where each computer connected to a larger network can host single or multiple shards. Elasticsearch ensures all operations fall under the correct shard. Scaling using Elasticsearch is achieved through partitioning and replication.

Partitions

Elasticsearch index is partitioned into single or multiple shards. Data indexed by users is stored on one of the cluster’s shards.

Replicas

Elastic shard comprises a primary shard and one or multiple replica shards. All data input in the index is copied to both the primary and replica shards. A replica shard is identical to the primary shard. When the node that bears the primary shard is out, the replica serves as a replacement.

elastic search index

Indexing

Elasticsearch has a flexible index, and you do not have to use a specific schema. Yet, you can create a precise schema for excellent search analytics.

Creating an index:

curl -X PUT http://localhost:9200/person

With this command, the user creates an index that is schema-free and has the name “person.” If the index is already in use, an error displays to notify the user.

A Look at the Index Information

This is the response you will get when you try to review the information contained in the index. The settings, names, and mapping are visible. If the index is nonexistent, you will see an error message.

curl -X GET http://localhost:9200/person

Mapping

Mapping or schema refers to the nature of storage for documents and their fields in the index. When mapping, here are a few actions you specify:

  • Document structure-determining your fields and the type of data the fields will contain
  • Value transformation-before indexing
  • Fields to use in full-text searches

Custom Mapping

When you need to create a custom mapping for your index, you can follow the steps below:

curl -X PUT http://localhost:9200/person \

-H ‘Content-Type: application/json’ \

-d ‘{

  “mappings”: {

       “dynamic”: “strict”,

       “properties”: {

         “name”: {“type”: “text”},

         “email”: {“type”: “text”},

         “location”: {“type”: “geo_shape”},

         “extra_data”: {“type”: “object”, “dynamic”: true}

       }

  }

}’

As evident above, the user creates document mapping through a static root structure. One field features a dynamic object comprising any number of fields. Index setting will limit the number of available keys.

Viewing the Mapping

curl -X GET http://localhost:9200/person

The API returns the mapping from the previous illustration.

Types of Data Supported

Elasticsearch has excellent versatility when it comes to data variations. Here are the most popular data types for best search results.

  • Specific data like IPs and geo shapes.
  • Complex data, for example nested (linked list(array)) and object (hashmap(dictionary))
  • Core data: strings (keywords and text), binary, boolean, numbers (float, integer) etcetera

Every type of data supported in the platform has a specific destination and settings. You can get all the information you need for every kind of data here.

String data keywords and text serve different purposes. “Text” is applicable for full-text searches, while keywords come in handy when sorting, making aggregations, and direct matching.

Feeding Data into an Index

Data insertion is possible in several ways:

Single Documents

Here is how to insert data in a single document:

curl -X POST http://localhost:9200/person/_doc \

-H ‘Content-Type: application/json’ \

-d ‘{“name”: “John”, “age”: 30}’

If your insertion is successful, this request will return the ID that was generated. You can insert a specific id.

curl -X POST http://localhost:9200/person/_doc/id-1 \

-H ‘Content-Type: application/json’ \

-d ‘{“name”: “Katrin”, “age”: 25}’

Bulk Insertion into a Single Index

curl -X POST http://localhost:9200/person/_doc/_bulk \

-H ‘Content-Type: application/json’ \

-d ‘{ “index”:{} }

{ “name”:”Alex”,”age”:25 }

{ “index”:{} }

{ “key1”:”Amely”,”age”:27 }

Remember to end any bulk addition with a new line.

Bulk Insertion into Multiple Indexes

curl -X POST http://localhost:9200/_bulk \

-H ‘Content-Type: application/json’ \

-d ‘{ “index”:{“_index”: “person”} }

{ “name”:”Jack”,”age”: 34 }

{ “index”:{“_index”: “person”} }

{ “name”:”Oscar”,”age”:22 }

{ “index”:{“_index”: “person”} }

{ “name”:”John”,”age”:27 }

Identifying the Document

To know the type of document, take a look at the insertion URI.  The “/ _doc/” part refers to the type of document.

Updating Document Fields

To update a field in a document, here is an example of the process:

curl -X POST http://localhost:9200/person/_update/id-1 \

-H ‘Content-Type: application/json’ \

-d ‘{“age”: 24}’

You can carry out more complex searches on Elasticsearch and here is all the information you will need when making updates.

elastic search search queries

Popular Search Queries

To get the most out of Elasticsearch, you will need to have a clear understanding of all supported queries. Standard projects usually employ the searches highlighted below:

Match All Fields to Text

 curl -X GET http://localhost:9200/person/_search?q=john

This is the query to use when you are looking for tokens in any field.

Match All

curl -X GET http://localhost:9200/person/_search \

-H ‘Content-Type: application/json’ \

-d ‘{

   “query”: {“match_all”: {}}

}’

This will return all documents included in the ID.

Multi Match

curl -X GET http://localhost:9200/person/_search \

-H ‘Content-Type: application/json’ \

-d ‘{

   “query”: {

     “multi_match”: {

       “query”: “John”,

       “fields”: [“name”, “age”],

       “fuzzines”: 3,

   }

}

}’

Multi-match allows you to search for one token in specific fields. The “fuzziness” parameter is included to enable results with typos to appear.

Match Phrase

curl -X GET http://localhost:9200/person/_search \

-H ‘Content-Type: application/json’ \

-d ‘{

 “query”: {

   “match_phrase”: {

      “name”: “John Snow”

   }

 }

}’

This query works when you need results for phrases that are complete. In the above token, John and Snow will appear together as a phrase. If they do not appear as indicated, there will be no results.

Match One

curl -X GET http://localhost:9200/person/_search \

-H ‘Content-Type: application/json’ \

-d ‘{

 “query”: {

   “match”: {                 

     “name”: “John Snow”

   }

  }

}’

“Match” works for specific tokens in a specified field. In this example, we have two tokens in the name field. The search will reveal Documents whose field names contain “John” or “Snow” as individual tokens or both.

Fuzzy

curl -X GET http://localhost:9200/person/_search \

-H ‘Content-Type: application/json’ \

-d ‘{

“query”: {

   “fuzzy”: {

     “name”: {

       “value”: “Jahn”

  }

   }

}

}’

Documents with a similar search value will appear. This includes typos, as illustrated in the example.

Term

curl -X GET http://localhost:9200/person/_search \

-H ‘Content-Type: application/json’ \

-d ‘{

“query”: {

   “fuzzy”: {

     “name”: {

       “value”: “Jahn”

  }

   }

}

}’

The term query yields all documents with an exact value in a particular field. In this case, the “name” field should be precise 'John'.

Viewing the Response from a Search Query

Sometimes you may not wish to copy-paste your queries. In this case, you can expect a standard response to appear as illustrated.

elastic search typical response

[Typical Elasticsearch response body]

Your response yields an object from the field “hits” comprising all matching data in the “hits” inner key. This key also returns “max_score” and “total” that contain information showing all matching records and the max score in the results. Every document indicates the “source,” highlighting the data fields such as the scoring, ID, type, and index.

Results Ranking

Result Scores

Elasticsearch has a scoring system that qualifies field matches based on the specified query and other additional configurations applied during the search. You can create specific ranking functions with Elasticsearch. To find out how Elasticsearch scoring works, you can visit this page.

Boosting Scores

While doing a search, you might value results from some fields more than others. In this case, boosting the score for these fields will give you a higher total score. You can calculate your total score by multiplying the boost factor in the field score.

You can input your boost in the query directly. For instance, in a “multi_match” search, you can include a x2 and a x5 boost factor to two keys as illustrated in the format below:

“fields”: [“name^2”, “age^5”]

You can include a key boost in the search query’s object for other queries as demonstrated:

“fuzzy”: {“name”: {“value”: “John”, “boost”: 2}} 

Sorting Results

You cannot sort results by scores but can group the results by any field. Here is an example to guide you on result sorting.

curl -X GET http://localhost:9200/person/_search \

-H ‘Content-Type: application/json’ \

-d ‘{

 “sort”: [

   {“age”: {“order”: “asc”}},

   {“name”: {“order”: “desc”}},

 ],

 “query”: {“match_all”: {}}

}’

Boolean Queries

Also known as “AND-OR-NOT,” Boolean queries make it possible to receive all your relevant results. These queries facilitate the aggregation of different terms under a single query. Also, you can exclude some results or conjoin others to your query.

Must

In a query, “MUST” serves the same function as AND. This implies that all queries must appear in the resulting document.

Must_not

“Must_not” performs similarly to “NOT.” This means that any documents specified under this condition must not appear in any of the results.

Filter

“Filter” functions like” MUST/AND” but has no effect on the score results.

Should

Should queries work like OR. For example, where there are two set terms under “should,” expect all documents that fall under either of the terms to appear. This is a relevant consideration in scores. A document that meets only one term will have lower scores than those that meet all specified conditions.

Setting Search Limits

When making a query, you will need to set some limits or offsets to receive refined results from the database. For instance, you might want to skip some documents and opt to see others in a particular range. If you want to skip 5 documents and show the consecutive 10 documents, you can make this query:

curl -X GET http://localhost:9200/person/_search \

-H ‘Content-Type: application/json’ \

-d ‘{

  “from” : 5,

  “size” : 10,

  “query”: {

“match_all”: {}

  }

}’

Analysis

Elasticsearch always performs analysis anytime a user wants to input any document into an index. Through this analysis, the text is converted into tokens that are incorporated into the inverted index for searches. Elasticsearch analyzes data through a custom or in-built analyzer. Each index is analyzed by default in Elasticsearch. Analyzers perform various functions, including removing stopwords, removing special chars, and replacing emojis with text.

Data Aggregations

Data search and aggregation is a demanding and laborious process. Elasticsearch provides an excellent aggregation platform when you need to aggregate data from search queries. Aggregation makes it possible to analyze an index and produce unique results such as checking minimum, average and maximum values in a data range. Here is all the information you need on aggregation with Elasticsearch. You can have multiple aggregations in a single query. Here is a simple illustration of query aggregation:

curl -X GET http://localhost:9200/person/_search \

-H ‘Content-Type: application/json’ \

-d ‘{

 “aggs” : {

   “avg_age_key” : { “avg” : { “field” : “age” } },

   “max_age_key” : { “max” : { “field” : “age” } },

   “min_age_key” : { “min” : { “field” : “age” } },

   “uniq_age_key” : { “terms” : { “field” : “age” } }

 }

}’

As evident above, the single query consists of four types of commands to aggregate the results. Aggregation results are visible under the “aggregations” key at the bottom.

Map Reducing

With aggregation, you can create helpful analytics for indexing purposes. Data analytics involving massive data volumes can be complex without a powerful engine like Elasticsearch. You can MapReduce jobs and get more valuable analytics with Elasticsearch.

Transactions

The Elasticsearch framework does not support any type of financial transaction.

Elasticsearch Applications

Elasticsearch benefits and features make the engine a helpful tool for the functions summarized below:

Elasticsearch’s power and ability to process real-time results are ideal for critical business and operational needs. Complex searches are much simpler, as the app supports a wide selection of data, including unstructured data and numerical. Here are a few ways companies use Elasticsearch in their operations: -

E-Commerce Search

Modern business operates in a highly competitive environment. Potential customers are on the lookout for specific products and have little patience. With Elasticsearch, firms can generate detailed product lists to appear on client search results instantly. Fast results can lead to better conversions and faster growth.

Perfecting User Engagement

Many internet users do their searches on the go and prefer sites that address their needs at a glance. One of the most popular social sites, Facebook, uses Elasticsearch. With millions of daily searches, there is more than enough data on user preference to provide content that meets user needs.

Instant Query Replies

One of the most challenging aspects of running any venture is the need to respond to client queries within the shortest time frame. Elasticsearch is a reliable tool to run live chat programs as it is fast and capable of displaying the number of messages you would prefer from your chat history with clients.

Business Analysis

Most businesses have an online presence, making it necessary to have relevant information to help propel the company towards success. Businesses that use Elasticsearch can have a clearer picture of purchasing patterns and customize their store to client preferences. Satisfied clients are highly likely to come for repeat purchases, boosting revenue.

Enhancing Security

Accessing data on a real-time basis is essential when detecting fraud. Companies can stop fraudulent activity as early on as possible. Banks use Elasticsearch to detect fraud and send SMS alerts of suspicious transactions to their customers.

Mining Public Data

The amount of data open to the public is massive. With the right tool, you can obtain valuable insights from public sources such as social media pages. Elasticsearch can help firms get specific goals for their marketing strategies through refined social approaches.

 
Ihor Kopaniev

CTO at The APP Solutions, with niche expertise in AI and ML. https://www.linkedin.com/in/kopaniev/