hibernate-search
hibernate-search copied to clipboard
HSEARCH-5133 Support basic metrics aggregations
https://hibernate.atlassian.net/browse/HSEARCH-5133
Draft (Preview):
~~1. we need to implement count, count distinct and avg~~ done! ~~2. test filtering on nested~~ done! ~~3. test explicit ValueConvert.YES/NO~~ done! ~~4. support and test aggregations for temporal types~~ done! 5. we need to implement the Lucene backend 6. write good tests covering all the fields types 7. write some doc
Still I prefer to release this draft so that we can get some feedback...
@marko-bekhta I just rebased the PR on top of the current main branch. Now I'm resuming to doing the changes...
I just added two last commits to address the last @yrodiere's suggestions. Once we're fine with them, I'll reorganize the commits.
Thanks @yrodiere I applied all the great suggestions here. My only doubt is a about the last change that I added in a different commit: https://github.com/hibernate/hibernate-search/pull/4144/commits/a807e2c39d8b1bb3dad7e708138f0a37631be94d.
In my opinion returning a double would be better, but I have an idea... we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think. What you think about it @marko-bekhta?
we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think.
IMO we should just force users to pass a type. They ask for a date, we check if that's possible, and if so we return a date. They ask for an integer, same thing. They as for a double, ditto.
we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think.
IMO we should just force users to pass a type. They ask for a date, we check if that's possible, and if so we return a date. They ask for an integer, same thing. They as for a double, ditto.
IMO if you give the user the flexibility of requiring any type, it would be very complex (in term of API) and also error prone. I think that the solution of exposing a double for numeric avg and a field avg for all the types, it would safer and easier to use.
we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think.
IMO we should just force users to pass a type. They ask for a date, we check if that's possible, and if so we return a date. They ask for an integer, same thing. They as for a double, ditto.
IMO if you give the user the flexibility of requiring any type, it would be very complex (in term of API) and also error prone. I think that the solution of exposing a double for numeric avg and a field avg for all the types, it would safer and easier to use.
I don't get this, they already have the flexibility to require any type: https://github.com/hibernate/hibernate-search/pull/4144/files#diff-6b1cb9769f69a0a38bbf7c3d35a53612d9d88e45f8180ba2cbe9eea36c1baea0R27-R41
And in fact there's already no default: if they get an Integer average, it can only be because they asked for it.
Also, please note I'm not saying we should comply with every request. IMO there are only two types that make sense: the field type (ValueConvert.YES) and the underlying, unconverted type (ValueConvert.NO, would be double I guess).
we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think.
IMO we should just force users to pass a type. They ask for a date, we check if that's possible, and if so we return a date. They ask for an integer, same thing. They as for a double, ditto.
IMO if you give the user the flexibility of requiring any type, it would be very complex (in term of API) and also error prone. I think that the solution of exposing a double for numeric avg and a field avg for all the types, it would safer and easier to use.
I don't get this, they already have the flexibility to require any type: https://github.com/hibernate/hibernate-search/pull/4144/files#diff-6b1cb9769f69a0a38bbf7c3d35a53612d9d88e45f8180ba2cbe9eea36c1baea0R27-R41
And in fact there's already no default: if they get an
Integeraverage, it can only be because they asked for it.Also, please note I'm not saying we should comply with every request. IMO there are only two types that make sense: the field type (
ValueConvert.YES) and the underlying, unconverted type (ValueConvert.NO, would bedoubleI guess).
If I try to ask for a double a get this error in my test:
org.hibernate.search.util.common.SearchException: HSEARCH000584: Invalid type for returned values: 'java.lang.Double'. Expected 'java.lang.Integer' or a supertype.
Context: field 'integer'
at org.hibernate.search.engine.backend.types.converter.spi.ProjectionConverter.withConvertedType(ProjectionConverter.java:75)
at org.hibernate.search.backend.elasticsearch.search.aggregation.impl.ElasticsearchMetricFieldAggregation$TypeSelector.type(ElasticsearchMetricFieldAggregation.java:89)
at org.hibernate.search.backend.elasticsearch.search.aggregation.impl.ElasticsearchMetricFieldAggregation$TypeSelector.type(ElasticsearchMetricFieldAggregation.java:71)
at org.hibernate.search.engine.search.aggregation.dsl.impl.AvgAggregationFieldStepImpl.field(AvgAggregationFieldStepImpl.java:29)
at org.hibernate.search.engine.search.aggregation.dsl.AvgAggregationFieldStep.field(AvgAggregationFieldStep.java:28)
at org.hibernate.search.integrationtest.backend.elasticsearch.tmp.ElasticsearchMetricAggregationsIT.lambda$test_filteringResults$11(ElasticsearchMetricAggregationsIT.java:65)
at org.hibernate.search.engine.search.query.dsl.spi.AbstractSearchQueryOptionsStep.aggregation(AbstractSearchQueryOptionsStep.java:179)
at org.hibernate.search.integrationtest.backend.elasticsearch.tmp.ElasticsearchMetricAggregationsIT.test_filteringResults(ElasticsearchMetricAggregationsIT.java:65)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
unconverted type (ValueConvert.NO, would be double I guess).
OK so I can revert the last commit?
If I try to ask for a double a get this error in my test:
With ValueConvert.NO?
unconverted type (ValueConvert.NO, would be double I guess).
OK so I can revert the last commit?
From what I've seen that commit is fine and the DSL changes at least should stay, but I might be mistaken.
Can you point me to the exact failing test?
If I try to ask for a double a get this error in my test:
With
ValueConvert.NO?unconverted type (ValueConvert.NO, would be double I guess).
OK so I can revert the last commit?
From what I've seen that commit is fine and the DSL changes at least should stay, but I might be mistaken.
Can you point me to the exact failing test?
Sure! Here is the commit I which I tried to ask for a double avg => https://github.com/hibernate/hibernate-search/pull/4144/commits/6abad63b20195e1039f6fd0304d07698d2ddf808. Thanks
Hi @marko-bekhta, the commit HSEARCH-5133 Test all types with metric aggregations is in progress and I still need to address the other unresolved conversation. I will continue to work on it when I come back from PTO in two weeks. Thanks for the support
With this last push I think I have only one comment to address, the one about introducing abstract List<? extends CollectorFactory> requiredCollectors()...
With the last push I should have addressed all the comments. If so, is there anything to improve here? Thanks @marko-bekhta and @yrodiere
With last push... again just one comment left!
Hi @marko-bekhta. Comment addressed (I hope :P)
To fully support the RAW model type, I opened HSEARCH-5230 Support RAW model type with metric aggregations. Let me know if we can improve something else. Thanks!
Thanks for your pull request!
This pull request appears to follow the contribution rules.
› This message was automatically generated.
I integrated the corrections for the doc and incubating rebasing the contribution on top of the current main branch. Thanks
Thank you Marko and Yoann
