Support for indexing serialized arrays

Open mweimerskirch opened this issue 7 years ago • 4 comments

Many plugins store data (e.g. repeatable text boxes) as serialized arrays in the postmeta table. Example:

a:4:{i:0;a:1:{s:11:"description";s:298:"Lorem ipsum dolor sit amet,...

Indexing those fields as-is can lead to unwanted search results, because the serialization markers (type tags, lengths) and array keys are indexed along with the actual content. One way to solve this is to use the "solr_build_document" filter to replace those fields in the Solr document before it is sent to the server.

The solution I propose would make this easier by automatically "flattening" serialized arrays. The related pull request can be found under #346.
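To make "flattening" concrete, here is a minimal, language-neutral sketch in Python. It is not the code from the pull request: the plugin itself is PHP, and the function names (`_parse`, `flatten`, `flatten_serialized`) are hypothetical. It assumes that only string and numeric leaf values should be kept, that array keys are dropped, and that anything that does not parse as a serialized array is left untouched.

```python
def _parse(data: bytes, pos: int = 0):
    """Parse one PHP-serialized value at byte offset pos; return (value, next_pos)."""
    tag = data[pos:pos + 1]
    if tag == b"N":                      # null: N;
        return None, pos + 2
    if tag in (b"i", b"d", b"b"):        # int, float, bool: i:5;  d:1.5;  b:1;
        end = data.index(b";", pos)
        raw = data[pos + 2:end].decode()
        if tag == b"i":
            return int(raw), end + 1
        if tag == b"d":
            return float(raw), end + 1
        return raw == "1", end + 1
    if tag == b"s":                      # string: s:<byte length>:"...";
        colon = data.index(b":", pos + 2)
        length = int(data[pos + 2:colon])
        start = colon + 2                # skip the colon and opening quote
        value = data[start:start + length].decode("utf-8")
        return value, start + length + 2  # skip the closing quote and semicolon
    if tag == b"a":                      # array: a:<count>:{key value ...}
        colon = data.index(b":", pos + 2)
        count = int(data[pos + 2:colon])
        pos = colon + 2                  # skip the colon and opening brace
        result = {}
        for _ in range(count):
            key, pos = _parse(data, pos)
            val, pos = _parse(data, pos)
            result[key] = val
        return result, pos + 1           # skip the closing brace
    raise ValueError(f"unsupported or malformed value at offset {pos}")


def flatten(value) -> list:
    """Collect leaf values depth-first, dropping array keys (assumed behaviour)."""
    if isinstance(value, dict):
        parts = []
        for item in value.values():
            parts.extend(flatten(item))
        return parts
    if value is None or isinstance(value, bool):
        return []                        # assumption: null/bool carry no searchable text
    return [str(value)]


def flatten_serialized(raw: str) -> str:
    """Return space-joined leaf values, or raw unchanged if it is not a serialized array."""
    data = raw.encode("utf-8")           # PHP string lengths are byte counts
    try:
        parsed, end = _parse(data)
    except ValueError:
        return raw
    if end != len(data) or not isinstance(parsed, dict):
        return raw
    return " ".join(flatten(parsed))


example = ('a:2:{i:0;a:1:{s:11:"description";s:11:"Lorem ipsum";}'
           'i:1;a:1:{s:11:"description";s:9:"dolor sit";}}')
print(flatten_serialized(example))  # Lorem ipsum dolor sit
```

With this approach the key "description" and the markers such as `a:2:` and `s:11:` never reach the index; only the text a user would actually search for does. Plain (non-serialized) meta values and truncated input pass through unchanged.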

mweimerskirch avatar Nov 29 '17 15:11 mweimerskirch

> The solution I propose would make this easier by automatically "flattening" serialized arrays.

What unexpected, undesirable edge cases could we encounter with this approach?

Also, it'd be worthwhile to see how the other search plugins handle this (ElasticPress and others).

danielbachhuber avatar Nov 29 '17 15:11 danielbachhuber

> What unexpected, undesirable edge cases could we encounter with this approach?

The original content is no longer indexed in the "*_s" field, so anyone who uses the content from the Solr results for display purposes and relies on the exact content of that field would see a change in behaviour. The original content is, however, still stored in the "*_str" field, so that use case remains possible by reading from "*_str" instead.
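As a rough illustration of that edge case, the sketch below (Python, for illustration only; the plugin is PHP) shows what the two fields would hold for one meta key after the change. The helper `build_meta_fields` and the key name `my_repeater` are hypothetical; only the "_s" and "_str" suffixes come from the description above.

```python
def build_meta_fields(meta_key: str, raw_value: str, flattened_value: str) -> dict:
    """Return the two Solr fields for one meta key (hypothetical helper)."""
    return {
        f"{meta_key}_str": raw_value,       # original serialized value, kept as-is
        f"{meta_key}_s": flattened_value,   # flattened text, used for searching
    }


raw = 'a:1:{i:0;a:1:{s:11:"description";s:11:"Lorem ipsum";}}'
fields = build_meta_fields("my_repeater", raw, "Lorem ipsum")
print(fields["my_repeater_s"])    # Lorem ipsum
print(fields["my_repeater_str"] == raw)  # True
```

Code that previously unserialized the "_s" value from a search result would need to switch to the "_str" field.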

mweimerskirch avatar Nov 29 '17 15:11 mweimerskirch

Ok, I'm amenable to this change.

danielbachhuber avatar Dec 06 '17 13:12 danielbachhuber

@ataylorme I'm going to move this out of 2.0.0 because I think it's worth spending more than 30 minutes on to get right, per conversation in https://github.com/pantheon-systems/solr-power/pull/346#issuecomment-350814150

danielbachhuber avatar Sep 07 '18 11:09 danielbachhuber