Flowpack.ElasticSearch.ContentRepositoryAdaptor BUGFIX: use correct context for document-identifier

Problem

The current implementation falls back to the workspace name of the node (i.e. the value of the underlying NodeData), which in most cases is "live", generating the same document-identifier when indexing separate workspaces, thus always updating the same elasticsearch document with a new workspace. Because of that only the last indexed workspace would have nodes in the index and querying "live" would most likely not find results.

Implementation

The NodePathBasedDocumentIdentifierGenerator uses the context-path of the node and might replace the workspace with the given $targetWorkspaceName. In general that would leave the user-workspace as is: the context-path in the node is built from the workspace-name from the node context rather than the workspace name of the NodeData object.

The NodeIdentifierBasedDocumentIdentifierGenerator now also uses the node context values.

As the "original" solution would also include the dimensions, I added them as well: I utilized the NodePaths utility to format workspace-name and dimensions. I see how it might be debatable to misuse the utility and I guess it's not really necessary to match the format, so that could be changed, if preferred.

Relates to #394

May 15 '23 11:05 PRGfx

(@daniellienert maybe interesting for you)

I stumbled across this and was a little confused at first. But after a bit of trial and error and research, I would like to leave a few thoughts here:

Your change makes "some" sense in my opinion. If you use the “normal” ./flow nodeindex:build command, all nodes in all workspaces are indexed, even if there are no changes in them. With a simple $node->getWorkspace()->getName() at this point, however, this only happens for nodes with changes in the workspace. In this point, it makes absolute sense to use ->getContext() here.

When using Flowpack.ElasticSearch.ContentRepositoryQueueIndexer, however, it makes no difference according to my tests, as this approaches the initial index structure differently. (Here there are only entries in ES for live and in other worksapces only for nodes with changes)

In my opinion, adding the dimension-values to the hash is not necessary, as a separate indices are created and used for each dimension-combination. So IMHO it's not not needed to make a difference between the dimensions within one index. On the other hand, it shouldn't do any harm either, as far as I can see. (Maybe @daniellienert knows more insights here?)

@daniellienert Could it be that you developed your PR #394 with the queue and therefore didn't notice it? In our tests it causes problems without QueueIndexer, but not with.

Aug 20 '24 17:08 Nikdro

Hey. @PRGfx, thanks for your PR. After some discussions with @Nikdro - we go with the simplier approach, merged with #412

Oct 21 '24 19:10 daniellienert

Flowpack.ElasticSearch.ContentRepositoryAdaptor Flowpack.ElasticSearch.ContentRepositoryAdaptor copied to clipboard

BUGFIX: use correct context for document-identifier

Problem

Implementation

Flowpack.ElasticSearch.ContentRepositoryAdaptor
Flowpack.ElasticSearch.ContentRepositoryAdaptor copied to clipboard