lucene
lucene copied to clipboard
Added interface to relate a LatLonShape with another shape represented as Component2D
Description
Added interface to relate a LatLonShape with another shape represented as Component2D.
Issues
#11752
It looks like this is already publicly exposed on
LatLonShapeDocValues
, which is public itself so I'm not clear why we need this new method? (Also it's awkward to be adding a search-time API (relate
) that takes an index-time abstraction (LatLongShapeDocValuesField
).)
@jpountz The function is publically exposed but the class LatLonShape cannot be created publically. Hence we need a way to create it.
Please let me know what could be right way to do it. Should I go ahead and create another interface?
I had missed that the constructor was not publicly available. Am I guessing correctly that your goal is to create custom collectors that work with shapes? Maybe we should make the ctor that takes a BytesRef
public for such use-cases? @nknize I wonder if you have better ideas as to how to unloch such use-cases.
Maybe we should make the ctor that takes a
BytesRef
public for such use-cases?
We can either make the LatLonShapeDocValues
and XYShapeDocValues
ctor public, or add new factory methods to LatLonShape
and XYShape
for creating the DocValues instances like we do the fields? I like the latter for consistency.
I also think it might be a good idea to add a new public void resetBinaryValue(BytesRef binaryValue)
method to ShapeDocValues
? This would enable us to reuse the same ShapeDocValue instance inside of a query by just resetting the backing data from the iterator values.
I'm happy to do this but I think these would be a great contribution to help build your merit.
Maybe we should make the ctor that takes a
BytesRef
public for such use-cases?We can either make the
LatLonShapeDocValues
andXYShapeDocValues
ctor public, or add new factory methods toLatLonShape
andXYShape
for creating the DocValues instances like we do the fields? I like the latter for consistency.I also think it might be a good idea to add a new
public void resetBinaryValue(BytesRef binaryValue)
method toShapeDocValues
? This would enable us to reuse the same ShapeDocValue instance inside of a query by just resetting the backing data from the iterator values.I'm happy to do this but I think these would be a great contribution to help build your merit.
I will take the approach of creating a new factory method in LatLonShape and XYShape. On the function: resetBinaryValue what is the idea you are suggesting?
@jpountz @nknize can i get a review on this pr?
@nknize can I get a review on this PR, its been open for a long time.
I was hoping to get @nknize 's feedback on this since he's been closer to this work, but the API looks ok to me this way.
I would prefer that factory methods that creates LatLonShapeDocValues
are placed on that class instead of crowding LatLonShape
with all those methods. Same for XY.
I would prefer that factory methods that creates
LatLonShapeDocValues
are placed on that class instead of crowdingLatLonShape
with all those methods. Same for XY.
I can see functions which creates LatLonShapeDocValues present in the LatLonShape as of now. Plus I believe this is good as it provides 1 single class with different interfaces to work with LatLonShape or LatLonShapeDocValues.
Plus I believe this is good as it provides 1 single class with different interfaces to work with LatLonShape or LatLonShapeDocValues.
I agree. That was the intent of the LatLonShape and XYShape factory classes so developer's don't have to go hunting around for all of the public ctors. It's a clean API.
Updated the Code as there were merge conflicts. @jpountz or @iverase can you please review, as I need 1 more reviewer.
16days since my review. If there are no concerns in the next 24 hrs I'll merge.
I will say it again, please move the factory methods to the docvalues classes and make the constructor private. It makes no-sense to have them in LatLonShape / XYShape.
... and make the constructor private. It makes no-sense to have them in LatLonShape / XYShape.
The ctor is already protected. It does make sense in that LatLonShape and XYShape are already factory classes that provide the factory methods to create the fields, field values, and queries. So the discussion is over preference on where to access the doc value factory method for the public facing API. I like/prefer the consistency in having it with all of the other factory methods because it puts the javadocs and methods in one place. Either way we need the public factory method and where it lives can change so I don't think the bikeshedding should hold up getting the method in and moving it later if there's that strong of opposition.
I strongly believe that LatLonShape /XYShape should only contain factory methods that work on top of the BKD tree index. Factory methods that work on doc values should be moved to the DocValue classes so it is clear for a user which data structure are using. That is a stronger argument that having all java docs in one place.
I strongly believe that LatLonShape /XYShape should only contain factory methods that work on top of the BKD tree index
I see what you're saying, and I really like that idea of separating DocValues and BKD factory methods to their own factory classes. Except currently LatLonShapeDocValuesQuery and XYShapeDocValuesQuery are not public and so their factory methods already live inside LatLonShape and XYShape factory class, respectively, as does the createDocValuesField
factory methods. The javadocs have all of these factory methods listed as their respective createIndexableFIelds
vs createDocValuesField
.
For a similar reason I had thought about putting the doc value factory methods inside LatLonShapeDocValues
and XYShapeDocValues
but these classes are now the concrete docvalue implementations of ShapeDocValues
so refactoring those DocValues factory methods conflates the purpose of the class. That's why everything was put in LatLonShape
XYShape
since there is no purpose for those classes other than factory methods (that's why they cannot be instantiated).
I suppose an alternative could be to create LatLonShapeDocValue
and XYShapeDocValue
(without the 's') factory classes that contain the docvalue factory methods only? I just think having a difference of an 's' between the classes would lead to more confusion of the public API.
Open to other ideas.
If we can't think of anything quickly I think a good compromise would be to merge this long running PR so @navneet1v can use the field until we figure out if there even needs to be a separation of the binary doc value / bkd indexable field factory methods? I think the preference you raise of separating them is logical and reasonable, but we already have a single non-instantiable Factory Class and I don't know if there is a need for another one just to separate docvalues and bkd factory methods. We could achieve the same by adding inner static classes LatLonShape.DocValueFactory
and XYShape.DocValueFactory
, or creating separate factory classes in the same manner. Either way I think it's all just re-arranging deck chairs.
I have a similar discussion not long time ago.
At the moment, once you create a ShapeDocValues, you don't know if is geo or cartesian because the specific implementations are hidden from the user. I think this is not the right approach as they are not the same object with two different implementations but two different objects with similar behaviour.
My opinion is that you should expose those concrete classes and the methods #createDocValuesField should return concrete implementations, that will allow you later on:
- add methods that only make sense on geo / cartesian, e.g #distanceToEquator() in geo
- Create methods that only make sense on geo / cartesian, e.g #computeDistanceToEquator(LatLonShapeDocValue)
I just gave my opinion and things are far away of what I think is best, maybe unnecessary for you. Feel free to push it, I am not blocking this.
once you create a ShapeDocValues, you don't know if is geo or cartesian because the specific implementations are hidden from the user
I'm not following what you mean by this. The abstract ShapeDocValues
class is package private, the concrete LatLonShapeDocValues and XYShapeDocValues implementations are public. The factory methods are / will be public (this PR) and they return the concrete Cartesian or Geographic instance. I'm not sure how that obfuscates anything from the user? My understanding of the preferred change to this PR was just the location of the factory methods, am I misunderstanding?
Then I don't understand why you cannot move those methods to that classes. I understood from your last comment that they were hidden.
I lost myself here:
First:
I see what you're saying, and I really like that idea of separating DocValues and BKD factory methods to their own factory classes. Except currently LatLonShapeDocValuesQuery and XYShapeDocValuesQuery are not public and so their factory methods already live inside LatLonShape and XYShape factory class, respectively, as does the createDocValuesField factory methods. The javadocs have all of these factory methods listed as their respective createIndexableFIelds vs createDocValuesField.
I thought you were speaking about fields not queries. That is the kind of mess I am trying to avoid, DocValuesQueries that can only be accessed from LatLonShape but not from the actual field that is using them.
Anyway as I said feel free to push it.
@nknize can we backport this to 9.x branch
There's no automation for that. Just check out branch_9x
and cherry-pick the commit and open a backport PR.
There's no automation for that. Just check out
branch_9x
and cherry-pick the commit and open a backport PR.
wasn't aware of that.