lucene
lucene copied to clipboard
Prevent PointValues from returning null for ghost fields
getPointValues may currently return null for unknown fields or fields that don't index points. It can happen that a field no longer has points for any document in a segment after delete+merge, which causes field info to think that the field is there and has points, yet when calling getPointValues null is returned.
With this change, we prevent getPointValues from returning null for ghost fields, it will instead return an empty instance of PointValues.
Disclaimer: this is an attempt around addressing potential issues with ghost fields when retrieving point values. It has to be said that most places that call getPointValues today have null checks, and don't consult field info upfront, hence they would not be exposed to the issue that FieldExistsQuery triggers, as field info says there are points and then the retrieved point values is null. One quick-fix would be to add a null check to FieldExistsQuery, though it feels like we should try to prevent this kind of situation from happening again in the future? getPointValues can still return null, but it shouldn't for ghost fields?
Relates to #11393
I pushed an update. I have removed null checks from consumers that were already checking field info. In most cases we already check field info for compatibility, hence we can expect point values to never be null. Where we instead we only check if field info is null, we also need to add a check that point dimension count is > 0 to replace the point values null check.
I merged your other PR that adds a null check in FieldExistsQuery, we should now be able to remove this null check with this change?
Test failures suggest CheckIndex needs to have its expectations adjusted.
@jpountz I addressed the check index issue, and pushed an update around the packed values that are null in the empty point values instance. That scenario is a bit tricky and ghost fields are not extensively tested, I wonder if more testing is a thing we'd like to do. I am not entirely sure if things may go wrong when the packed values are retrieved with PointTree
. That should be the intersect
path and I could not yet determine if there may be NPEs there too with ghost fields. I am though under impression that we assumed before that PointValues
is not null, hence in the worst case we would still have an NPE but in a different place.
What do you think?
@jpountz would you have time to take another look at this please?
Apologies @javanna, but after looking more at your changes, I'm getting worried that this change is harder than I had anticipated. I was optimistically hoping that never returning null PointValues would automatically help save some NullPointerExceptions but looking at your change it looks like there are many places where we just need to replace these null checks with other checks if we want to keep the logic correct, so things wouldn't magically work on segments that have ghost fields unless we explicitly test ghost fields. So I would suggest closing this PR, apologies for the time you spent on it.
Agreed @jpountz I think it was a good experiment to spend some time on, and I have also been thinking along the same lines, that the changes I ended up making were not solving the problem like we initially thought. No problem at all. I will open a follow-up PR to clarify the current behaviour and expectations.