Hi there! Today we will look at one of the most interesting and useful aspects of the Apache Solr engine. In most real-world applications, data availability is a key concern that organisations around the globe must address to keep up with customer demands. Solr is particularly strong when it comes to data management and data availability, and it gives you fine-grained control over when data becomes available for search after it is indexed.
To support this, Solr has a concept called “Near Real Time Search”, more commonly referred to as NRT. Essentially, NRT determines how soon after indexing a document becomes available for search in an application. The ‘Near’ in NRT is configurable. NRT is also one of the main features of SolrCloud and is rarely attempted in master-slave configurations.
Now, let’s look at the related concept of “commits”. Document durability and searchability are controlled by commits. Commits are either “hard” or “soft” and can be issued by a client, via a REST call, or configured to occur automatically in solrconfig.xml. Typically in NRT applications, hard commits are configured with openSearcher=false, and soft commits are configured to make documents visible for search. When a commit occurs, various background tasks are initiated; however, these background tasks neither block additional updates to the index nor delay the availability of the documents for search.
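To make the “issued via a REST call” option concrete, here is a minimal sketch that only builds the request URLs for an explicit soft commit and for a hard commit that does not open a new searcher. The host, port and the helper name commit_url are assumptions made for illustration; “filmsdata” is the core used later in this article. As noted below, configuring commits in solrconfig.xml is usually preferable to sending them from a client.

```python
from urllib.parse import urlencode


def commit_url(base_url, collection, soft=False, open_searcher=True):
    """Build an update URL that asks Solr to commit (hypothetical helper).

    base_url and collection are placeholders; adjust them to your setup.
    """
    if soft:
        # Soft commit: make recent changes visible to search.
        params = {"softCommit": "true"}
    else:
        # Hard commit: flush to disk, optionally without opening a searcher.
        params = {"commit": "true", "openSearcher": str(open_searcher).lower()}
    return f"{base_url}/{collection}/update?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # assumed local Solr address
    print(commit_url(base, "filmsdata", soft=True))
    print(commit_url(base, "filmsdata", open_searcher=False))
```

The sketch deliberately stops at building the URL, so it can be tried without a running Solr instance; sending the request is left to whichever HTTP client you already use.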
Where to Use Near Real Time (NRT) Search?
Near Real Time (NRT) Search is useful in business applications of any size, small or large, where newly indexed data must become searchable quickly. For example:
- E-commerce applications
- Large-scale BFSI (banking, financial services and insurance) applications
- Applications with structured/unstructured data back-end
- Applications handling legal services data
- Applications with insurance data
- Applications pertaining to citizen data, etc.
Commits and Searching
Documents are committed to the Solr index in two ways: hard commit and soft commit. A hard commit, as the name suggests, flushes all changes made since the last commit to the index on disk. A soft commit, on the other hand, is faster because it does not write the changes to disk; it only makes them visible to search.
Both hard and soft commits have two primary configuration parameters: maxDocs and maxTime.
Parameter | Description |
---|---|
maxDocs | Integer. Defines the number of documents to queue before pushing them to the index. It works in conjunction with the maxTime parameter (update_handler_autosoftcommit_max_time) in that if either limit is reached, the documents will be pushed to the index. |
maxTime | Integer. The number of milliseconds to wait before pushing documents to the index. It works in conjunction with the maxDocs parameter (update_handler_autosoftcommit_max_docs) in that if either limit is reached, the documents will be pushed to the index. |
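The “whichever limit is reached first” rule from the table can be sketched as a small toy model. This is not Solr’s implementation; the class name AutoCommitPolicy, its methods, and the convention that a limit of -1 disables a trigger in this model are assumptions made for illustration.

```python
class AutoCommitPolicy:
    """Toy model of the 'either limit triggers a commit' rule (not Solr code)."""

    def __init__(self, max_docs=-1, max_time_ms=-1):
        self.max_docs = max_docs        # -1 disables the document trigger here
        self.max_time_ms = max_time_ms  # -1 disables the time trigger here
        self.pending = 0                # documents queued since the last commit
        self.oldest_ms = None           # arrival time of the oldest queued document

    def add(self, now_ms):
        """Queue one document arriving at now_ms; return True if a commit fires."""
        if self.pending == 0:
            self.oldest_ms = now_ms
        self.pending += 1
        return self._maybe_commit(now_ms)

    def tick(self, now_ms):
        """Check the limits at now_ms without adding a document."""
        return self._maybe_commit(now_ms)

    def _maybe_commit(self, now_ms):
        if self.pending == 0:
            return False
        docs_hit = self.max_docs > 0 and self.pending >= self.max_docs
        time_hit = (self.max_time_ms > 0
                    and now_ms - self.oldest_ms >= self.max_time_ms)
        if docs_hit or time_hit:
            # Either limit is enough: push the queued documents and reset.
            self.pending = 0
            self.oldest_ms = None
            return True
        return False


if __name__ == "__main__":
    policy = AutoCommitPolicy(max_docs=3, max_time_ms=1000)
    print([policy.add(t) for t in (0, 10, 20)])  # third add reaches max_docs
    policy.add(100)
    print(policy.tick(1100))                     # 1000 ms after the oldest document
```

With maxDocs set to 3 and maxTime set to 1000 in this model, a burst of three documents triggers a commit immediately, while a single document waits until one second has passed.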
Another important aspect in this discussion is the transaction log (tlog). In essence, if enabled, a new tlog is started after every hard commit and records the updates received since the last hard commit. Tlogs are important because they prevent data loss. To achieve this, every indexing request is written to the tlog before the client is acknowledged; hence, if Solr crashes, these updates are replayed on restart, preventing any data loss.
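The log-then-replay idea described above can be illustrated with a toy model. It is only a sketch of the concept, not how Solr stores or replays its tlogs; the class ToyIndex and all of its fields and methods are hypothetical.

```python
class ToyIndex:
    """Toy model of a transaction log (concept only, not Solr internals)."""

    def __init__(self):
        self.durable = []  # documents flushed to "disk" by a hard commit
        self.tlog = []     # updates logged since the last hard commit
        self.memory = []   # uncommitted in-memory updates, lost on a crash

    def add(self, doc):
        self.tlog.append(doc)    # 1. record the update in the log first
        self.memory.append(doc)  # 2. apply it in memory
        return "ok"              # 3. only then acknowledge the client

    def hard_commit(self):
        self.durable.extend(self.memory)
        self.memory.clear()
        self.tlog.clear()        # in this toy, the log starts fresh after a hard commit

    def crash_and_restart(self):
        self.memory.clear()            # in-memory state does not survive
        self.memory.extend(self.tlog)  # replay every logged update

    def all_docs(self):
        return self.durable + self.memory


if __name__ == "__main__":
    index = ToyIndex()
    index.add("doc-1")
    index.hard_commit()
    index.add("doc-2")         # acknowledged but not yet hard committed
    index.crash_and_restart()
    print(index.all_docs())    # doc-2 is recovered from the log
```

Because the update is logged before the client receives its acknowledgement, anything the client believes was accepted can be recovered after a restart.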
Configuring Commits
It is usually preferable to configure commits (both hard and soft) in solrconfig.xml and avoid sending commits from an external source. Use the steps below to configure hard/soft commit values as needed.
Step 1: Fire up a Solr instance
From the “bin” folder, run the command ./solr start. The following window appears.
Step 2: Select “Files” in the “filmsdata” core
Step 3: Modify the values for soft/hard commit
The time chosen for autoSoftCommit determines the maximum time between a document being sent to Solr and it becoming searchable; it does not affect the transaction log. Choose as long an interval as your application's requirements can tolerate.
Pro tip: For heavy bulk indexing, especially for the initial load when no searching takes place, consider turning off autoSoftCommit by specifying a value of -1 for the maxTime parameter.
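As a rough illustration of what the edited section of solrconfig.xml might look like, the sketch below generates an updateHandler fragment with a hard commit that uses openSearcher=false and a soft commit interval. The helper name update_handler_fragment and the 15000 ms value are assumptions chosen for illustration; the -1 follows the pro tip above. Compare the output with the updateHandler section of your own solrconfig.xml before changing anything.

```python
import xml.etree.ElementTree as ET


def update_handler_fragment(hard_max_time_ms, soft_max_time_ms):
    """Return an <updateHandler> fragment as a string (hypothetical helper)."""
    handler = ET.Element("updateHandler", {"class": "solr.DirectUpdateHandler2"})

    # Hard commit: flush to disk on a timer, without opening a new searcher.
    auto_commit = ET.SubElement(handler, "autoCommit")
    ET.SubElement(auto_commit, "maxTime").text = str(hard_max_time_ms)
    ET.SubElement(auto_commit, "openSearcher").text = "false"

    # Soft commit: controls how soon documents become searchable.
    auto_soft = ET.SubElement(handler, "autoSoftCommit")
    ET.SubElement(auto_soft, "maxTime").text = str(soft_max_time_ms)

    return ET.tostring(handler, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values: hard commit every 15 s, soft commits turned off
    # (-1), as suggested for an initial bulk load with no searching.
    print(update_handler_fragment(15000, -1))
```

Once the bulk load is finished, the same helper can be called with a positive soft commit interval, for example update_handler_fragment(15000, 1000), to see what an NRT-oriented setting would look like.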
Conclusion
NRT is an important and widely used feature of Solr applications, and its commit settings also matter when handling bulk indexing operations. A correctly configured Solr instance can significantly improve data availability and searchability, and thus have a substantial impact on the business.
Thanks for taking the time to read this article! You can reach out to us on Facebook, Twitter and LinkedIn if you have any concerns or queries.