Re-index your Elasticsearch data with zero downtime

As I promised in my previous blog post, today I will explain how to re-index your Elasticsearch index data with zero downtime.

As I elaborated in the previous blog post, using Elasticsearch as your primary data store is probably not the best solution, so you should keep your data in a database system better suited to persistence, like MySQL or PostgreSQL. For more info about this, please check my previous blog post.

The solution I presented works well when you need to re-index your data from a particular Elasticsearch index (or multiple indices) on your local or testing environment, but it leaves the data inconsistent until the re-indexing has finished. The problem is that we need to delete the articles index (the index used as an example in my previous post), create it again, and re-index the content document by document. While this process is in progress, the data is only partially indexed, and if someone requests a document that has not been indexed yet, they will not get it.

This is a common issue in production, and Elasticsearch has a very elegant solution for it: index aliases. Instead of using the index name in your code when indexing or searching for documents, you use an alias. A single alias can point to multiple indices and, more importantly for us, the code that queries and aggregates data no longer depends on the name of the concrete index or indices behind the alias.
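To make the idea concrete, here is a minimal sketch of a search that only ever names the alias. The ARTICLES_ALIAS constant and the buildArticleSearchParams helper are hypothetical names used for illustration; the title field comes from the mappings used later in this post.

```php
<?php

// Hypothetical constant: the alias the application always talks to.
const ARTICLES_ALIAS = 'articles';

/**
 * Build search parameters that target the alias, never a concrete index.
 * (Illustrative helper, not part of any library.)
 */
function buildArticleSearchParams(string $text): array
{
    return [
        'index' => ARTICLES_ALIAS,
        'body' => [
            'query' => [
                'match' => ['title' => $text],
            ],
        ],
    ];
}

$params = buildArticleSearchParams('zero downtime');

// With the Elasticsearch facade used in this post, the array would be passed on as:
// $results = Elasticsearch::search($params);
```

Because the application only knows the alias, the index behind it can be replaced without touching this code.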

As I already started this explanation in my previous article, I will use Laravel Commands to run the re-indexing process from the command line, but the concept is basically the same for any programming language or technology.

First, let’s recall what a Laravel command looks like:

namespace App\Console\Commands;
 
use Illuminate\Console\Command;
 
class ReindexAllArticles extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'elasticsearch:reindex-all-articles';
 
    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Re-index all articles in elasticsearch';
 
    /**
     * Create a new command instance.
     *
     * @return void
     */
    public function __construct()
    {
        parent::__construct();
    }
 
    /**
     * Execute the console command.
     *
     * @return mixed
     */
    public function handle()
    {
        // Create new articles index

        // Re-index articles data in the new index

        // Remove the alias from the previous indices and
        // set the alias to the newly created index
    }
}

As you can see, the comments in the handle method describe the actions we need to perform, in order.

Create new index

First of all, we need to create the new index. As you might guess, we now need a way to give each index a unique name: all of these indices will hold articles, but the keyword “articles” must be reserved for the index alias. We can produce a unique index name by appending the current date and time, down to the second, so the name will not be duplicated unless two runs start within the same second. The index creation itself can look like this:

namespace App\Services;

use Elasticsearch;

class ManageIndex
{
    /**
     * Creates Elasticsearch index
     *
     * @param string $index_name Name of index
     * @param string $index_type_name Name of index type
     * @param array $index_mappings Index mappings
     * @param array $settings Mapping settings
     * @return void
     */
    public static function create(string $index_name, string $index_type_name, array $index_mappings, array $settings): void
    {
        $body = [];
        $body['mappings'] = [$index_type_name => $index_mappings];
        $body['settings'] = $settings;

        Elasticsearch::indices()->create([
            'index' => $index_name,
            'body' => $body,
        ]);
    }
}

It’s always a good idea to abstract code so that it is easier to maintain and reuse, so I have wrapped the index creation in a separate class. You can adjust it to your needs.
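The timestamped naming scheme described above can be sketched as a small helper. The function name buildIndexName and its optional $now parameter are assumptions added here so the result is easy to test; they are not part of the ManageIndex class.

```php
<?php

/**
 * Build a unique index name from the alias and a timestamp.
 * The $now parameter is an illustrative addition that makes the
 * function deterministic in tests; by default the current time is used.
 */
function buildIndexName(string $alias, ?DateTimeInterface $now = null): string
{
    $now = $now ?? new DateTimeImmutable();

    return $alias . '-' . $now->format('Y-m-d-H-i-s');
}

echo buildIndexName('articles', new DateTimeImmutable('2020-01-31 13:45:10')), PHP_EOL;
// prints "articles-2020-01-31-13-45-10"
```

The name consists only of lowercase letters, digits and hyphens, which keeps it within what Elasticsearch accepts for index names.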

Re-index the articles data in the new index

Second, we need to re-index the articles data in the new index. This is explained in detail in my previous blog post.
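As a reminder of the general shape only, here is a rough sketch that prepares a batch of articles for the bulk API, writing to the concrete new index rather than to the alias. This is not necessarily how the previous post did it: the buildBulkBody helper, the shape of the article rows and the optional type name are all assumptions made for illustration.

```php
<?php

/**
 * Build the body for one bulk request (illustrative helper).
 *
 * @param string      $indexName Concrete index to write to (not the alias)
 * @param array       $articles  Rows shaped like ['id' => 1, 'title' => '...', 'content' => '...']
 * @param string|null $typeName  Mapping type, needed on 6.x; leave null on 7.x
 */
function buildBulkBody(string $indexName, array $articles, ?string $typeName = null): array
{
    $body = [];

    foreach ($articles as $article) {
        // Action line: tells Elasticsearch where the next line should be indexed.
        $action = ['_index' => $indexName, '_id' => $article['id']];
        if ($typeName !== null) {
            $action['_type'] = $typeName;
        }
        $body[] = ['index' => $action];

        // Source line: the document itself.
        $body[] = ['title' => $article['title'], 'content' => $article['content']];
    }

    return $body;
}

$rows = [
    ['id' => 1, 'title' => 'First', 'content' => 'Hello'],
    ['id' => 2, 'title' => 'Second', 'content' => 'World'],
];
$body = buildBulkBody('articles-2020-01-31-13-45-10', $rows, 'article');

// With the Elasticsearch facade used in this post, the batch would be sent as:
// Elasticsearch::bulk(['body' => $body]);
```

Searches keep hitting the old index through the alias while these writes fill the new one.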

Set the alias to the newly created index

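Next, we remove the alias from the previous indices and point it at the newly created index. Both actions are sent in a single request, which Elasticsearch applies atomically, so there is no moment at which the alias points to nothing. We can abstract this by adding a separate method to the previously created ManageIndex class.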

/**
 * Set index alias
 *
 * @param string $index
 *
 * @param string $alias
 */
public static function setIndexAlias(string $index, string $alias): void
{
    $params = [];
    $params['body'] = [
        "actions" => [
            [
                "remove" => [
                    "index" => "*",
                    "alias" => $alias
                ]
            ],
            [
                "add" => [
                    "index" => $index,
                    "alias" => $alias
                ]
            ]
        ]
    ];

    Elasticsearch::indices()->updateAliases($params);
}
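If the "*" wildcard in the remove action feels too broad, one possible variation is to limit it to indices that follow the naming scheme from this post. The sketch below only builds the request body; buildAliasSwapActions is a hypothetical helper, and the "alias-*" pattern assumes every article index is named as the alias followed by a hyphen and a timestamp.

```php
<?php

/**
 * Build the body for an alias swap (illustrative helper).
 * The remove action is limited to indices named "<alias>-*",
 * which assumes the timestamped naming scheme used in this post.
 */
function buildAliasSwapActions(string $newIndex, string $alias): array
{
    return [
        'actions' => [
            // Detach the alias from every index that matches the pattern.
            ['remove' => ['index' => $alias . '-*', 'alias' => $alias]],
            // Attach it to the freshly populated index.
            ['add' => ['index' => $newIndex, 'alias' => $alias]],
        ],
    ];
}

$body = buildAliasSwapActions('articles-2020-01-31-13-45-10', 'articles');

// With the Elasticsearch facade used in this post, the body would be sent as:
// Elasticsearch::indices()->updateAliases(['body' => $body]);
```

Keeping both actions in one body is what makes the switch a single step from the point of view of anyone searching through the alias.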

Delete old article indices

The last thing to do is to delete the old article indices that you no longer use. You can do that manually through Kibana, from the command line, or with an additional Laravel command. Keep in mind that doing it manually in Kibana may be the safest approach, because you can see exactly what you are deleting, which matters especially when you are managing data in production. Kibana is a powerful product from Elastic for visualizing your Elasticsearch data.
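If you do go for an additional Laravel command, the decision about which indices count as old could be sketched like this. The selectOldIndices helper is hypothetical, and it assumes you already have a list of index names and that article indices follow the "alias-timestamp" naming scheme from this post.

```php
<?php

/**
 * Pick the indices that belong to the alias family but are not the current one
 * (illustrative helper, not part of any library).
 */
function selectOldIndices(array $indexNames, string $alias, string $currentIndex): array
{
    $prefix = $alias . '-';

    $old = array_filter($indexNames, function (string $name) use ($prefix, $currentIndex): bool {
        // Keep only names with the alias prefix, excluding the index now in use.
        return strpos($name, $prefix) === 0 && $name !== $currentIndex;
    });

    return array_values($old);
}

$all = [
    'articles-2020-01-30-10-00-00',
    'articles-2020-01-31-13-45-10',
    'users-2020-01-01-00-00-00',
];
$old = selectOldIndices($all, 'articles', 'articles-2020-01-31-13-45-10');
// $old is ['articles-2020-01-30-10-00-00']

// Each old index could then be removed with the facade used in this post:
// foreach ($old as $name) {
//     Elasticsearch::indices()->delete(['index' => $name]);
// }
```

Printing the selected names and asking for confirmation before deleting would be a sensible safeguard in production.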

Now we can use all the methods we’ve written above in our Laravel command like this:

namespace App\Console\Commands;
 
use Illuminate\Console\Command;
use App\Services\ManageIndex;
 
class ReindexAllArticles extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'elasticsearch:reindex-all-articles';
 
    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Re-index all articles in elasticsearch';
 
    /**
     * Create a new command instance.
     *
     * @return void
     */
    public function __construct()
    {
        parent::__construct();
    }
 
    /**
     * Execute the console command.
     *
     * @return mixed
     */
    public function handle()
    {
        $indexName = 'articles-' . date('Y-m-d-H-i-s');
        $indexMappings = [
            "properties" => [
                "title" => [
                    "type" => "text"
                ],
                "content" => [
                    "type" => "text"
                ]
            ]
        ];
        $indexSettings = [
            "number_of_shards" => 3,
            "number_of_replicas" => 1
        ];

        // Create new articles index
        ManageIndex::create($indexName, 'article', $indexMappings, $indexSettings);

        // Re-index articles data in the new index
        // This part is explained in detail in my previous blog post

        // Remove the alias from the previous indices and
        // set the alias to the newly created index
        ManageIndex::setIndexAlias($indexName, 'articles');
    }
}

Important notes

A few notes at the end. Please be aware that indices created in Elasticsearch 7.0.0 or later no longer accept a _default_ mapping. Indices created in 6.x will continue to function as before in Elasticsearch 6.x. Types are deprecated in APIs in 7.0, with breaking changes to the index creation, put mapping, get mapping, put template, get template and get field mappings APIs. Note that the ManageIndex::create method in this post nests the mappings under a type name, which is the 6.x request format, so it needs adjusting if you run 7.0 or later.
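One way to cover both request formats is to make the type name optional when building the index creation body. This is only a sketch under that assumption; buildCreateIndexBody is a hypothetical helper and not part of the ManageIndex class shown earlier.

```php
<?php

/**
 * Build the body for index creation (illustrative helper).
 * With a type name the mappings are nested under it (6.x style);
 * without one they are placed directly under "mappings" (7.x style).
 */
function buildCreateIndexBody(array $mappings, array $settings, ?string $typeName = null): array
{
    return [
        'mappings' => $typeName === null ? $mappings : [$typeName => $mappings],
        'settings' => $settings,
    ];
}

$mappings = ['properties' => ['title' => ['type' => 'text']]];
$settings = ['number_of_shards' => 3, 'number_of_replicas' => 1];

$typeless = buildCreateIndexBody($mappings, $settings);
$typed = buildCreateIndexBody($mappings, $settings, 'article');
```

Here $typeless has "properties" directly under "mappings", while $typed keeps the "article" level in between.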

If you would like to know more about index mappings, you can check the official Elasticsearch documentation on index mappings. You can also find more about index settings in the official documentation.