Word suggester

5/8/2023

Effective search is not just about returning relevant results when a user types in a search phrase; it's also about helping your user to choose the best search phrases. Elasticsearch already has did-you-mean functionality which can correct the user's spelling after they have searched. Now, we are adding the completion suggester, which can make suggestions while you type. Giving the user the right search phrase before they have issued their first search makes for happier users and reduced load on your servers.

NOTE: Consider this feature experimental at the moment! Things might change/break in future releases.

It was already possible to make suggestions using existing functionality in Elasticsearch, like prefix queries and ngrams, so why have we added a dedicated completion suggester? There are a few reasons.

Remember, we're making suggestions while the user types, so results need to be shown to the user within a few milliseconds, even after taking network latency into account! A full-blown search has to examine too many terms (and their frequencies) to be fast enough for this purpose. Instead, we use an in-memory data structure called an FST (finite state transducer), which contains valid suggestions and is optimised for fast retrieval and memory usage.

For instance, take an FST containing the words hotel, marriot, mercure, munchen and munich. All we do is start on the left and follow the paths to the right. If the user types an h, then we can see that there is only one possible completion, hotel, so we can immediately complete that word. If the user types an m, then we can provide a list of all the "m" words. As soon as they type ma, we can autocomplete the word "marriot". Following this in-memory graph is blazingly fast, as you will see from the benchmarks later in this blog post.

Suggesters in Lucene are built in-memory by loading the completion values from the index, then building the FST. This can be a slow, resource-intensive process. And, as soon as the index changes, the FST needs to be rebuilt. "Real time search" is a mantra of Elasticsearch. It is not acceptable to return out-of-date suggestions, nor to require a full rebuild whenever the index changes.

Instead of building the FST at search time, we now build an FST per segment at index time. Essentially, whenever a new segment is written to disk, we also write the FST to a file in a format which is fast to load into memory when required. Instead of consulting a single FST for the whole index, we query each per-segment FST to produce a unified list of results. Having an FST per segment also means that the completion suggester scales horizontally, in exactly the same way as full text search: the more nodes you add, the more you can scale.

A user looking for the "Courtyard by Marriot, Munich City" may type in any number of search phrases. However, we want our suggestions to be explicit, leaving the user in no doubt about the page they will see if they click on our suggestion. All of these inputs should return a single, nicely formatted suggestion.

We want suggestions to be presented in a particular order, but this order is not the same as the ordering we need for full text search. Full text search uses TF/IDF (term frequency/inverse document frequency) to find the documents that are most relevant for the given search phrase.
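To make the prefix lookup and the per-segment querying described above a little more concrete, here is a minimal Python sketch. It is not how Lucene or Elasticsearch implement this: it uses a plain prefix trie as a simplified stand-in for an FST (a real FST also shares suffixes and is far more compact), and the names `PrefixTrie` and `suggest`, the weights, and the split of words across two segments are all made up for illustration. Keeping the highest weight when a word appears in more than one segment is likewise just an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # char -> Node
    weight: Optional[int] = None  # set only on nodes that end a stored word


class PrefixTrie:
    """Simplified stand-in for an FST: shares prefixes only, not suffixes."""

    def __init__(self, weighted_words):
        self.root = Node()
        for word, weight in weighted_words.items():
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, Node())
            node.weight = weight

    def complete(self, prefix):
        """Return (word, weight) pairs for every stored word starting with prefix."""
        node = self.root
        for ch in prefix:  # "start on the left and follow the paths to the right"
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.weight is not None:
                results.append((text, current.weight))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return results


def suggest(segments, prefix, size=5):
    """Query every per-segment trie and merge the hits into one ranked list."""
    best = {}
    for trie in segments:
        for word, weight in trie.complete(prefix):
            # Assumption of this sketch: keep the highest weight seen for a word.
            best[word] = max(weight, best.get(word, weight))
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:size]]


# Hypothetical weights and segment layout, chosen only for this example.
segments = [
    PrefixTrie({"hotel": 3, "marriot": 5, "mercure": 2}),
    PrefixTrie({"munchen": 1, "munich": 4}),
]

print(suggest(segments, "h"))   # ['hotel']
print(suggest(segments, "m"))   # ['marriot', 'munich', 'mercure', 'munchen']
print(suggest(segments, "ma"))  # ['marriot']
```

Ordering here depends only on the stored weight, not on TF/IDF, which mirrors the point that suggestion order is separate from full text relevance.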
