lucene icon indicating copy to clipboard operation
lucene copied to clipboard

Added interface to relate a LatLonShape with another shape represented as Component2D

Open navneet1v opened this issue 2 years ago • 11 comments

Description

Added interface to relate a LatLonShape with another shape represented as Component2D.

Issues

#11752

navneet1v avatar Sep 04 '22 19:09 navneet1v

It looks like this is already publicly exposed on LatLonShapeDocValues, which is public itself so I'm not clear why we need this new method? (Also it's awkward to be adding a search-time API (relate) that takes an index-time abstraction (LatLongShapeDocValuesField).)

@jpountz The function is publically exposed but the class LatLonShape cannot be created publically. Hence we need a way to create it.

Please let me know what could be right way to do it. Should I go ahead and create another interface?

navneet1v avatar Sep 08 '22 17:09 navneet1v

I had missed that the constructor was not publicly available. Am I guessing correctly that your goal is to create custom collectors that work with shapes? Maybe we should make the ctor that takes a BytesRef public for such use-cases? @nknize I wonder if you have better ideas as to how to unloch such use-cases.

jpountz avatar Sep 09 '22 08:09 jpountz

Maybe we should make the ctor that takes a BytesRef public for such use-cases?

We can either make the LatLonShapeDocValues and XYShapeDocValues ctor public, or add new factory methods to LatLonShape and XYShape for creating the DocValues instances like we do the fields? I like the latter for consistency.

I also think it might be a good idea to add a new public void resetBinaryValue(BytesRef binaryValue) method to ShapeDocValues? This would enable us to reuse the same ShapeDocValue instance inside of a query by just resetting the backing data from the iterator values.

I'm happy to do this but I think these would be a great contribution to help build your merit.

nknize avatar Sep 09 '22 14:09 nknize

Maybe we should make the ctor that takes a BytesRef public for such use-cases?

We can either make the LatLonShapeDocValues and XYShapeDocValues ctor public, or add new factory methods to LatLonShape and XYShape for creating the DocValues instances like we do the fields? I like the latter for consistency.

I also think it might be a good idea to add a new public void resetBinaryValue(BytesRef binaryValue) method to ShapeDocValues? This would enable us to reuse the same ShapeDocValue instance inside of a query by just resetting the backing data from the iterator values.

I'm happy to do this but I think these would be a great contribution to help build your merit.

I will take the approach of creating a new factory method in LatLonShape and XYShape. On the function: resetBinaryValue what is the idea you are suggesting?

navneet1v avatar Sep 09 '22 19:09 navneet1v

@jpountz @nknize can i get a review on this pr?

navneet1v avatar Oct 05 '22 17:10 navneet1v

@nknize can I get a review on this PR, its been open for a long time.

navneet1v avatar Oct 10 '22 17:10 navneet1v

I was hoping to get @nknize 's feedback on this since he's been closer to this work, but the API looks ok to me this way.

jpountz avatar Oct 11 '22 06:10 jpountz

I would prefer that factory methods that creates LatLonShapeDocValues are placed on that class instead of crowding LatLonShape with all those methods. Same for XY.

iverase avatar Oct 11 '22 08:10 iverase

I would prefer that factory methods that creates LatLonShapeDocValues are placed on that class instead of crowding LatLonShape with all those methods. Same for XY.

I can see functions which creates LatLonShapeDocValues present in the LatLonShape as of now. Plus I believe this is good as it provides 1 single class with different interfaces to work with LatLonShape or LatLonShapeDocValues.

navneet1v avatar Oct 11 '22 18:10 navneet1v

Plus I believe this is good as it provides 1 single class with different interfaces to work with LatLonShape or LatLonShapeDocValues.

I agree. That was the intent of the LatLonShape and XYShape factory classes so developer's don't have to go hunting around for all of the public ctors. It's a clean API.

nknize avatar Oct 11 '22 19:10 nknize

Updated the Code as there were merge conflicts. @jpountz or @iverase can you please review, as I need 1 more reviewer.

navneet1v avatar Oct 11 '22 20:10 navneet1v

16days since my review. If there are no concerns in the next 24 hrs I'll merge.

nknize avatar Oct 27 '22 18:10 nknize

I will say it again, please move the factory methods to the docvalues classes and make the constructor private. It makes no-sense to have them in LatLonShape / XYShape.

iverase avatar Oct 28 '22 08:10 iverase

... and make the constructor private. It makes no-sense to have them in LatLonShape / XYShape.

The ctor is already protected. It does make sense in that LatLonShape and XYShape are already factory classes that provide the factory methods to create the fields, field values, and queries. So the discussion is over preference on where to access the doc value factory method for the public facing API. I like/prefer the consistency in having it with all of the other factory methods because it puts the javadocs and methods in one place. Either way we need the public factory method and where it lives can change so I don't think the bikeshedding should hold up getting the method in and moving it later if there's that strong of opposition.

nknize avatar Oct 28 '22 12:10 nknize

I strongly believe that LatLonShape /XYShape should only contain factory methods that work on top of the BKD tree index. Factory methods that work on doc values should be moved to the DocValue classes so it is clear for a user which data structure are using. That is a stronger argument that having all java docs in one place.

iverase avatar Oct 28 '22 13:10 iverase

I strongly believe that LatLonShape /XYShape should only contain factory methods that work on top of the BKD tree index

I see what you're saying, and I really like that idea of separating DocValues and BKD factory methods to their own factory classes. Except currently LatLonShapeDocValuesQuery and XYShapeDocValuesQuery are not public and so their factory methods already live inside LatLonShape and XYShape factory class, respectively, as does the createDocValuesField factory methods. The javadocs have all of these factory methods listed as their respective createIndexableFIelds vs createDocValuesField.

For a similar reason I had thought about putting the doc value factory methods inside LatLonShapeDocValues and XYShapeDocValues but these classes are now the concrete docvalue implementations of ShapeDocValues so refactoring those DocValues factory methods conflates the purpose of the class. That's why everything was put in LatLonShape XYShape since there is no purpose for those classes other than factory methods (that's why they cannot be instantiated).

I suppose an alternative could be to create LatLonShapeDocValue and XYShapeDocValue (without the 's') factory classes that contain the docvalue factory methods only? I just think having a difference of an 's' between the classes would lead to more confusion of the public API.

Open to other ideas.

nknize avatar Oct 28 '22 14:10 nknize

If we can't think of anything quickly I think a good compromise would be to merge this long running PR so @navneet1v can use the field until we figure out if there even needs to be a separation of the binary doc value / bkd indexable field factory methods? I think the preference you raise of separating them is logical and reasonable, but we already have a single non-instantiable Factory Class and I don't know if there is a need for another one just to separate docvalues and bkd factory methods. We could achieve the same by adding inner static classes LatLonShape.DocValueFactory and XYShape.DocValueFactory, or creating separate factory classes in the same manner. Either way I think it's all just re-arranging deck chairs.

nknize avatar Oct 28 '22 14:10 nknize

I have a similar discussion not long time ago.

At the moment, once you create a ShapeDocValues, you don't know if is geo or cartesian because the specific implementations are hidden from the user. I think this is not the right approach as they are not the same object with two different implementations but two different objects with similar behaviour.

My opinion is that you should expose those concrete classes and the methods #createDocValuesField should return concrete implementations, that will allow you later on:

  1. add methods that only make sense on geo / cartesian, e.g #distanceToEquator() in geo
  2. Create methods that only make sense on geo / cartesian, e.g #computeDistanceToEquator(LatLonShapeDocValue)

I just gave my opinion and things are far away of what I think is best, maybe unnecessary for you. Feel free to push it, I am not blocking this.

iverase avatar Oct 28 '22 14:10 iverase

once you create a ShapeDocValues, you don't know if is geo or cartesian because the specific implementations are hidden from the user

I'm not following what you mean by this. The abstract ShapeDocValues class is package private, the concrete LatLonShapeDocValues and XYShapeDocValues implementations are public. The factory methods are / will be public (this PR) and they return the concrete Cartesian or Geographic instance. I'm not sure how that obfuscates anything from the user? My understanding of the preferred change to this PR was just the location of the factory methods, am I misunderstanding?

nknize avatar Oct 28 '22 14:10 nknize

Then I don't understand why you cannot move those methods to that classes. I understood from your last comment that they were hidden.

iverase avatar Oct 28 '22 14:10 iverase

I lost myself here:

First:

I see what you're saying, and I really like that idea of separating DocValues and BKD factory methods to their own factory classes. Except currently LatLonShapeDocValuesQuery and XYShapeDocValuesQuery are not public and so their factory methods already live inside LatLonShape and XYShape factory class, respectively, as does the createDocValuesField factory methods. The javadocs have all of these factory methods listed as their respective createIndexableFIelds vs createDocValuesField.

I thought you were speaking about fields not queries. That is the kind of mess I am trying to avoid, DocValuesQueries that can only be accessed from LatLonShape but not from the actual field that is using them.

Anyway as I said feel free to push it.

iverase avatar Oct 28 '22 15:10 iverase

@nknize can we backport this to 9.x branch

navneet1v avatar Oct 28 '22 18:10 navneet1v

There's no automation for that. Just check out branch_9x and cherry-pick the commit and open a backport PR.

nknize avatar Oct 28 '22 18:10 nknize

There's no automation for that. Just check out branch_9x and cherry-pick the commit and open a backport PR.

wasn't aware of that.

navneet1v avatar Oct 28 '22 19:10 navneet1v