How to re-index your Elasticsearch data with a Laravel Command

Laravel is a very powerful PHP framework that is widely used for web applications of any size. I’ve been using it for years on many of the projects I’ve worked on, and I must say it has served me well.

I first got to know the Elastic stack, and Elasticsearch in particular, in more depth when one of the projects I was working on (a huge e-commerce platform) needed a very custom search engine. Besides full-text search capabilities, my team and I had to build some heavy aggregation and filtering functionality.

In this article I won’t talk any further about the power of Laravel and Elasticsearch; that’s not the intent of this post. What I would like to show you is how to re-index the data in an Elasticsearch index using a Laravel Command.

While working with Elasticsearch and doing your research, you will realize that Elasticsearch is not the best choice as a primary persistent data store. If the data stored in Elasticsearch is really important for your system to work, it is advisable to use something else as the persistent data store; in my case that was a MySQL database.

So, this means that you will need to store all the data in MySQL and store the same data again in Elasticsearch, which will power the speed and performance of your search features. Once you do this, you will soon realize how important it is to have a command-line feature for re-indexing all of the data, or some part of it, in a particular index. Such a feature is very useful in your local development environment, and especially in production when you make schema changes to a specific index and need to re-index your data in the new schema format.

Generating a Laravel Command is a very easy task using the Artisan command:

php artisan make:command ReindexAllArticles

Executing this command will generate a new command class named ReindexAllArticles. All you need to do is change the boilerplate class so that it does the re-indexing properly. It should look something like this:

namespace App\Console\Commands;

use Illuminate\Console\Command;

class ReindexAllArticles extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'elasticsearch:reindex-all-articles';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Re-index all articles in elasticsearch';

    /**
     * Create a new command instance.
     *
     * @return void
     */
    public function __construct()
    {
        parent::__construct();
    }

    /**
     * Execute the console command.
     *
     * @return mixed
     */
    public function handle()
    {
        // Your re-indexing handler functionality
    }
}

Now we need to edit this boilerplate command class so that it re-indexes the data. It is always good practice to abstract your code in order to follow the SOLID principles and get more maintainable and testable code, so we are going to create a service class that does the indexing. We will also need another class that is in charge of deleting the articles index before the re-indexing process even starts. Indexing a document whose ID already exists in the index only overwrites that one document, so articles that were deleted or unpublished in MySQL would stay in the index, and a changed mapping would not be applied to the existing index. That is not what we want in this case. But first of all, let’s create the IndexArticle class.

namespace App\Services;

use Elasticsearch;

class IndexArticle
{
    public function store($article)
    {
        Elasticsearch::index(
            [
                'index' => 'articles',
                /**
                 * Mapping types are deprecated as of Elasticsearch 7.0 and removed in 8.0.
                 * Indices created in 6.x keep working as before on 6.x, so keep the
                 * 'type' key there; on 7.x and later, omit it.
                 */
                'type' => 'article',
                'body' => $article,
                'id' => $article['id']
            ]
        );
    }
}

As you may notice, for the indexing functionality I am using the Elasticsearch facade from the cviebrock/laravel-elasticsearch package, which is a wrapper around the official Elastic PHP client. You can use the Elastic PHP client directly, or you can even write your own wrapper, but for the sake of simplicity we are going to use this one.
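The store method above sends one request per article. For larger data sets it might be worth batching documents into a bulk request instead. The following is only a minimal sketch, assuming the 7.x-style bulk body format of the official PHP client (no mapping types); buildBulkBody is a hypothetical helper that is not part of the service above, and the sample articles are made up.

```php
<?php

/**
 * Hypothetical helper: builds the 'body' array for a bulk request
 * from a batch of article arrays.
 */
function buildBulkBody(array $articles, string $index = 'articles'): array
{
    $body = [];

    foreach ($articles as $article) {
        // Action line: tells Elasticsearch which index and ID the next line belongs to
        $body[] = ['index' => ['_index' => $index, '_id' => $article['id']]];
        // Source line: the document itself
        $body[] = $article;
    }

    return $body;
}

// Made-up sample data
$batch = [
    ['id' => 1, 'title' => 'First article'],
    ['id' => 2, 'title' => 'Second article'],
];

$body = buildBulkBody($batch);

// Two entries per article: one action line and one source line
echo count($body), PHP_EOL;

// Assumption: with the facade used in this article, the batch could then
// be sent with something like Elasticsearch::bulk(['body' => $body]);
```

Whether this pays off depends on the size of your data; for a few thousand articles the one-by-one approach is usually fine.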

The next step is to create a separate class that deletes the articles index before the re-indexing process starts, so that Elasticsearch can create the articles index again when we give it the first document to index.

namespace App\Services;

use Elasticsearch;

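class DeleteArticlesIndex
{
    public static function delete()
    {
        // Deleting an index that does not exist fails with a 404 error, so check first
        if (! Elasticsearch::indices()->exists(['index' => 'articles'])) {
            return null;
        }

        return Elasticsearch::indices()->delete(['index' => 'articles']);
    }
}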

We can make the delete method static, so that the re-indexing command can call it without creating an instance of the class. Please be careful when you use the delete index feature, since it permanently deletes the index that you specify as a parameter, together with all of its documents.

Finally, we can edit the command’s handle method so it can do its job. Remember to import your Article model and the two service classes at the top of the command class.

/**
 * Execute the console command.
 *
 * @return mixed
 */
public function handle()
{
    $confirmed = $this->confirm(
        "Are you sure that you want to re-index all of your articles?"
    );

    // Stop here if the user did not confirm
    if (! $confirmed) {
        return;
    }

    $indexArticleService = app(IndexArticle::class);

    // Delete the articles index so it can be rebuilt from scratch
    DeleteArticlesIndex::delete();

    $this->info("\n<fg=yellow>Indexing all articles. This might take a while...</>\n");

    // Build the query for all published articles
    $query = Article::where('status', 'published');

    // Count in the database, so the cursor is not consumed just for counting
    $bar = $this->output->createProgressBar($query->count());

    // Iterate with a cursor to keep memory usage low
    foreach ($query->cursor() as $article) {
        $indexArticleService->store($article->toArray());
        $bar->advance();
    }

    $bar->finish();
    $this->info("\n <fg=yellow>All articles were indexed!</>");
}

As you may notice in the handle method above, I am using the cursor method to fetch all published articles. It is very important to consider the amount of data we might be operating on, and using a cursor is a good way to conserve memory when retrieving large result sets.
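To give a feeling for why a cursor saves memory: as far as I know, Laravel builds it on PHP generators, which hand over one item at a time instead of building the whole result array first. The snippet below is only a toy illustration in plain PHP, not Laravel’s actual implementation; fakeArticleCursor and its rows are made up for the example.

```php
<?php

/**
 * Stand-in for a database cursor: yields one row at a time instead of
 * building the full result array up front. The rows are made-up sample data.
 */
function fakeArticleCursor(int $total): Generator
{
    for ($id = 1; $id <= $total; $id++) {
        yield ['id' => $id, 'title' => "Article {$id}", 'status' => 'published'];
    }
}

$indexed = 0;

foreach (fakeArticleCursor(1000) as $article) {
    // In the real command, this is where the article would be sent to Elasticsearch
    $indexed++;
}

echo $indexed, PHP_EOL; // prints 1000
```

Only the current row exists in memory during each loop iteration, which is the same idea the cursor method applies to Eloquent models.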

Please be aware that if you use this command in production, it will delete your index, and indexing all the data again will take some time, depending on how much data needs to be indexed. While that is running, your search results will be missing or incomplete, so this may not be the best solution for an application in production. I’ll try to show a more production-oriented solution in a future post.