Elasticsearch

Elasticsearch is a search engine based on Lucene with a distributed full-text search engine with an HTTP web interface storing JSON documents.

Links

Getting a test installation via Docker

https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html

docker pull docker.elastic.co/elasticsearch/elasticsearch:5.6.3
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:5.6.3

Elastic query

Introducing the query language

Finding Excat values

Your data in Elastic is stored in an so called "index". Which is not the same thing as an index in the database world but more like a table.

In this example the index is called "MY_INDEX"

With this you can search for all documents in the index

GET /MY_INDEX/_search

It starts with "GET" because you only give the URL and no extra payload

In the results you will find an ID field. You can use this ID field to directly access a document like this

GET /MY_INDEX/table/42

This also works for DELETE - get rid of document and PUT to over- write a document. For PUT you need to provide a body with {} where the new content is

PUT /MY_INDEX/table/42
{
 "name": "john"
}

If we want to have more complex filter criteria we need to use POST instead of GET and also provide a body. Normally elastic only return the most relevent documents to you and therefore calculates a score for each document to sort them. If you do not need this you should use constant_score so every document has the same score.

POST /MY_INDEX/_search
{
  "from": 0,
  "size":3,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "firstName": "John"
              }
            }
          ]
        }
      }
    }
  }
}

So this is the equivalent of

SELECT * from MY_INDEX where firstName="John" and ROWNUM <= 3;

With more than one criteria that needs to be fulfilled (in this example we test with exists if a specific field exists and is non null)

POST /MY_INDEX/_search
{
  "from": 0,
  "size":3,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "firstName": "John"
              }
            },
            {
              "term": {
                "secondName": "Doe"
              }
            },
            {
              "exists": { "field" : "userId" }
            }
          ]
        }
      }
    }
  }
}

You can also have

  • conditions that are not allowed to be there
  • check if a timestamp is a certain range
  • test if an array contains a value
POST /MY_INDEX/_search
{
  "from": 0,
  "size":3,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "firstName": "John"
              }
            },
            {
               "script": {
                "script": "doc['myArray'].values.contains('myValue')"
               }
            },
            {
              "range": {
                "mytimestamp": {
                  "gte": "2017-10-22T00:00:00Z",
                  "lte": "2017-10-23T00:00:00Z"
                }
              }
            }
          ],
          "must_not": [
            {
              "term": {
                "secondName": "Doe"
              }
            }
          ]
        }
      }
    }
  }
}

The script keyword opens you a way to use the result programs in different programing languages as a value for your query. But with great power ...

GET /foo/_search
{
  "from": 0,
  "size": 1,
  "query": {
   ...
  },
  "aggs": {
    "foobar": {
      "terms": {
        "script": "def res=[]; def m=params['_source'].productDna; if(m!=null){for(x in m.keySet()){def f=m.get(x); if(f>0){res.add(x);}}} return res",
        "size": 10
      }
    }
  }
}

Or regular expressions (don't work with must but only with should)

POST /MY_INDEX/_search
{
    "from": 0,
    "size": 0,
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "should": [
                        {
                            "regexp": {
                                "firstName": "s22.*"
                            }
                        }
                    ]
                }
            }
        }
    }
}

You can also aggregate results like with GROUP BY in the SQL world.

This gives you 10 values that the field "secondName" has within the search results. None of the actual search results is shown to you (size 0 / size 10)

Aggreations are limited to 10 by default!

POST /MY_INDEX/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "firstName": "John"
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "allSecondNames": {
      "terms": {
        "field": "secondName",
        "size": 10
      }
    }
  }
}

This get the average value of the field timeSpent and also does not show you any document

POST /MY_INDEX/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "firstName": "John"
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "myResult": {
      "avg": {
        "field": "timeSpent"
      }
    }
  }
}

Instead of just getting the average you can also bucket all values of a field, in this example the field timeSpent and bucket size is 500

POST /MY_INDEX/_search
{
  "size" : 0,
  "query" : {
    "bool" : {
      "must" : [
        {
          "term" : {
            "firstName": "John"
          }
        },
        {
          "term" : {
            "secondName": "Doe"
          }
        }
      ]
    }
  },
  "aggs": {
    "myResult": {
      "histogram": {
        "field": "timeSpent",
        "interval" : 500
      }
    }
  }
}

You can also talk directly to the REST endpoint of your Elastic instance via curl, very useful to post process results

 
curl -X POST 'http://127.0.0.1:9200/YOUR_INDEX/_search' -d '{
  "from": 0,
  "size":9999,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "regexp": {
          "type": "from.*"
        }
      }
    }
  }
}'
| grep _id | sed 's/_id/\n_id/g' | grep ^_id | sed s/\",\".*// | sed s/^.*\"//

Show Delete Entry

You need the type and the id of an entry to show or delete it (which you can get from a query result) So assume this is the result from a query

 
{
  ...,
  "_type": "YOUR_TYPE",
  "_id": "YOUR_ID",
  ...
}

Try this (in GET mode)

 
/YOUR_INDEX/YOUR_TYPE/YOUR_ID

Or via curl

 
curl -X GET 'http://127.0.0.1:9200/YOUR_INDEX/YOUR_TYPE/YOUR_ID'

For deletion just replace by delete

Maintanance

Get status

GET /_cat/nodes

GET _cluster/health?pretty

GET _cluster/allocation/explain

GET _cluster/settings

GET /_cluster/settings?pretty&include_defaults

Exclude node from cluster

Exclude one node and allow all nodes again

 
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.20.30.40"
  }
}

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": ""
  }
}

Templates

Elastic uses templates when it creates an index. This lists all templates and prints one

GET /_cat/templates

GET /_template/my_template_name