Enable global meta data for batch commit identification

Open tkurz opened this issue 7 years ago • 1 comments

Current State

Vind's SearchServer (https://javadoc.io/page/com.rbmhtechnology.vind/vind/latest/com/rbmhtechnology/vind/api/SearchServer.html) provides several methods to index documents:

  • void index(Document... doc)
  • void index(List<Document> doc)
  • void indexBean(List<Object> t)
  • void indexBean(Object... t)

Internally, all of these methods trigger an indexing process but not a commit. This is intended behavior, as the server itself can handle commits internally much more efficiently. Note that there are also explicit commit methods, which guarantee that all indexing processes are committed (with all the negative consequences for performance).

Problem

In applications that must support Read-Your-Writes, this behaviour can be a problem: the application has to guarantee that the index is always up to date and is therefore forced to issue many hard commits.

Idea

Vind could support version numbering for indexing processes, so that an application could check which version was indexed most recently and, via an additional method, verify whether the index operations it depends on have already been processed. The version could be an internal counter or a counter maintained by the application, which could lead to the following API:

  • long index(List<Document> doc)
  • void index(List<Document> doc, long version)

Note that the other methods would work analogously. To query the latest index version, there could be methods like:

  • long getLatestVersion()
  • boolean isVersionIndexed(long version)

In addition, each Document could have an additional field version.
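
A minimal sketch of how the internal-counter variant of this idea might behave, written as a self-contained in-memory class. Everything here is hypothetical: VersionedIndexer and markVisible are invented for illustration, documents are reduced to plain strings, and nothing in the sketch is part of Vind's current API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; none of these names exist in Vind today.
final class VersionedIndexer {
    // Version handed out to the most recent index request.
    private final AtomicLong lastAssigned = new AtomicLong(0);
    // Highest version whose documents are known to be searchable.
    private final AtomicLong latestIndexed = new AtomicLong(0);

    /** Proposed shape: long index(List<Document> doc). Documents are plain strings here. */
    long index(List<String> docs) {
        // A real implementation would hand the batch to the search backend here.
        return lastAssigned.incrementAndGet();
    }

    /** Assumed callback: the backend reports that everything up to `version` is visible. */
    void markVisible(long version) {
        // Never move the watermark backwards.
        latestIndexed.accumulateAndGet(version, Math::max);
    }

    long getLatestVersion() {
        return latestIndexed.get();
    }

    boolean isVersionIndexed(long version) {
        return version <= latestIndexed.get();
    }
}
```

With such a counter, a client that needs Read-Your-Writes could keep the value returned by index(...) and poll isVersionIndexed(...) before querying, instead of forcing a hard commit after every write.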

tkurz avatar Nov 06 '17 15:11 tkurz

A solution could be to make use of Solr's document versioning:

  • Adding the parameter versions=true to the index request makes the Solr response include the new version of each updated document. The API could then change to Map<String,Long> index(List<Document>), and clients could manage versioning in their own application.
  • In the Solr schema, a new multi-valued field holding the version history would be added to each document, so that a later update does not make the expected version impossible to find when we check a document's version.
  • As mentioned earlier in the issue, a new method would check whether a document version has already been indexed: Boolean isVersionIndexed(String docId, Long version)
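
The three points above could fit together roughly as follows. This is only an in-memory stand-in under stated assumptions: VersionHistoryIndex is a hypothetical class that talks to neither Solr nor Vind, a local counter replaces the versions Solr would assign, documents are reduced to their ids, and commit() simulates the moment the updates become searchable.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in; it does not talk to Solr or Vind.
final class VersionHistoryIndex {
    // Stand-in for the version Solr would assign to each update.
    private long nextVersion = 1;
    // Versions already returned to clients but not yet searchable.
    private final Map<String, Set<Long>> pending = new HashMap<>();
    // Per-document version history, standing in for the multi-valued field.
    private final Map<String, Set<Long>> visible = new HashMap<>();

    /** Proposed shape: Map<String,Long> index(List<Document>). */
    synchronized Map<String, Long> index(List<String> docIds) {
        Map<String, Long> assigned = new LinkedHashMap<>();
        for (String id : docIds) {
            long version = nextVersion++;
            pending.computeIfAbsent(id, k -> new TreeSet<>()).add(version);
            assigned.put(id, version);
        }
        return assigned;
    }

    /** Simulated commit: every pending version joins its document's history. */
    synchronized void commit() {
        pending.forEach((id, versions) ->
            visible.computeIfAbsent(id, k -> new TreeSet<>()).addAll(versions));
        pending.clear();
    }

    /** True once the given version of the document has become searchable. */
    synchronized boolean isVersionIndexed(String docId, long version) {
        return visible.getOrDefault(docId, Set.of()).contains(version);
    }
}
```

Because the history keeps every version of a document, a check for an older version still succeeds after a later update to the same document, which is the purpose of the multi-valued field in the second point.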

alfonso-noriega avatar Nov 15 '17 17:11 alfonso-noriega