hibernate-search icon indicating copy to clipboard operation
hibernate-search copied to clipboard

HSEARCH-5133 Support basic metrics aggregations

Open fax4ever opened this issue 1 year ago • 11 comments

https://hibernate.atlassian.net/browse/HSEARCH-5133

Draft (Preview):

~~1. we need to implement count, count distinct and avg~~ done! ~~2. test filtering on nested~~ done! ~~3. test explicit ValueConvert.YES/NO~~ done! ~~4. support and test aggregations for temporal types~~ done! 5. we need to implement the Lucene backend 6. write good tests covering all the fields types 7. write some doc

Still I prefer to release this draft so that we can get some feedback...

fax4ever avatar May 02 '24 15:05 fax4ever

@marko-bekhta I just rebased the PR on top of the current main branch. Now I'm resuming to doing the changes...

fax4ever avatar Jun 18 '24 08:06 fax4ever

I just added two last commits to address the last @yrodiere's suggestions. Once we're fine with them, I'll reorganize the commits.

fax4ever avatar Jun 25 '24 11:06 fax4ever

Thanks @yrodiere I applied all the great suggestions here. My only doubt is a about the last change that I added in a different commit: https://github.com/hibernate/hibernate-search/pull/4144/commits/a807e2c39d8b1bb3dad7e708138f0a37631be94d.

In my opinion returning a double would be better, but I have an idea... we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think. What you think about it @marko-bekhta?

fax4ever avatar Jun 26 '24 12:06 fax4ever

we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think.

IMO we should just force users to pass a type. They ask for a date, we check if that's possible, and if so we return a date. They ask for an integer, same thing. They as for a double, ditto.

yrodiere avatar Jun 26 '24 13:06 yrodiere

we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think.

IMO we should just force users to pass a type. They ask for a date, we check if that's possible, and if so we return a date. They ask for an integer, same thing. They as for a double, ditto.

IMO if you give the user the flexibility of requiring any type, it would be very complex (in term of API) and also error prone. I think that the solution of exposing a double for numeric avg and a field avg for all the types, it would safer and easier to use.

fax4ever avatar Jun 26 '24 13:06 fax4ever

we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think.

IMO we should just force users to pass a type. They ask for a date, we check if that's possible, and if so we return a date. They ask for an integer, same thing. They as for a double, ditto.

IMO if you give the user the flexibility of requiring any type, it would be very complex (in term of API) and also error prone. I think that the solution of exposing a double for numeric avg and a field avg for all the types, it would safer and easier to use.

I don't get this, they already have the flexibility to require any type: https://github.com/hibernate/hibernate-search/pull/4144/files#diff-6b1cb9769f69a0a38bbf7c3d35a53612d9d88e45f8180ba2cbe9eea36c1baea0R27-R41

And in fact there's already no default: if they get an Integer average, it can only be because they asked for it.

Also, please note I'm not saying we should comply with every request. IMO there are only two types that make sense: the field type (ValueConvert.YES) and the underlying, unconverted type (ValueConvert.NO, would be double I guess).

yrodiere avatar Jun 26 '24 13:06 yrodiere

we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think.

IMO we should just force users to pass a type. They ask for a date, we check if that's possible, and if so we return a date. They ask for an integer, same thing. They as for a double, ditto.

IMO if you give the user the flexibility of requiring any type, it would be very complex (in term of API) and also error prone. I think that the solution of exposing a double for numeric avg and a field avg for all the types, it would safer and easier to use.

I don't get this, they already have the flexibility to require any type: https://github.com/hibernate/hibernate-search/pull/4144/files#diff-6b1cb9769f69a0a38bbf7c3d35a53612d9d88e45f8180ba2cbe9eea36c1baea0R27-R41

And in fact there's already no default: if they get an Integer average, it can only be because they asked for it.

Also, please note I'm not saying we should comply with every request. IMO there are only two types that make sense: the field type (ValueConvert.YES) and the underlying, unconverted type (ValueConvert.NO, would be double I guess).

If I try to ask for a double a get this error in my test:

org.hibernate.search.util.common.SearchException: HSEARCH000584: Invalid type for returned values: 'java.lang.Double'. Expected 'java.lang.Integer' or a supertype.
Context: field 'integer'

	at org.hibernate.search.engine.backend.types.converter.spi.ProjectionConverter.withConvertedType(ProjectionConverter.java:75)
	at org.hibernate.search.backend.elasticsearch.search.aggregation.impl.ElasticsearchMetricFieldAggregation$TypeSelector.type(ElasticsearchMetricFieldAggregation.java:89)
	at org.hibernate.search.backend.elasticsearch.search.aggregation.impl.ElasticsearchMetricFieldAggregation$TypeSelector.type(ElasticsearchMetricFieldAggregation.java:71)
	at org.hibernate.search.engine.search.aggregation.dsl.impl.AvgAggregationFieldStepImpl.field(AvgAggregationFieldStepImpl.java:29)
	at org.hibernate.search.engine.search.aggregation.dsl.AvgAggregationFieldStep.field(AvgAggregationFieldStep.java:28)
	at org.hibernate.search.integrationtest.backend.elasticsearch.tmp.ElasticsearchMetricAggregationsIT.lambda$test_filteringResults$11(ElasticsearchMetricAggregationsIT.java:65)
	at org.hibernate.search.engine.search.query.dsl.spi.AbstractSearchQueryOptionsStep.aggregation(AbstractSearchQueryOptionsStep.java:179)
	at org.hibernate.search.integrationtest.backend.elasticsearch.tmp.ElasticsearchMetricAggregationsIT.test_filteringResults(ElasticsearchMetricAggregationsIT.java:65)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

fax4ever avatar Jun 26 '24 13:06 fax4ever

unconverted type (ValueConvert.NO, would be double I guess).

OK so I can revert the last commit?

fax4ever avatar Jun 26 '24 13:06 fax4ever

If I try to ask for a double a get this error in my test:

With ValueConvert.NO?

unconverted type (ValueConvert.NO, would be double I guess).

OK so I can revert the last commit?

From what I've seen that commit is fine and the DSL changes at least should stay, but I might be mistaken.

Can you point me to the exact failing test?

yrodiere avatar Jun 26 '24 13:06 yrodiere

If I try to ask for a double a get this error in my test:

With ValueConvert.NO?

unconverted type (ValueConvert.NO, would be double I guess).

OK so I can revert the last commit?

From what I've seen that commit is fine and the DSL changes at least should stay, but I might be mistaken.

Can you point me to the exact failing test?

Sure! Here is the commit I which I tried to ask for a double avg => https://github.com/hibernate/hibernate-search/pull/4144/commits/6abad63b20195e1039f6fd0304d07698d2ddf808. Thanks

fax4ever avatar Jun 26 '24 13:06 fax4ever

Hi @marko-bekhta, the commit HSEARCH-5133 Test all types with metric aggregations is in progress and I still need to address the other unresolved conversation. I will continue to work on it when I come back from PTO in two weeks. Thanks for the support

fax4ever avatar Aug 01 '24 12:08 fax4ever

With this last push I think I have only one comment to address, the one about introducing abstract List<? extends CollectorFactory> requiredCollectors()...

fax4ever avatar Aug 26 '24 10:08 fax4ever

With the last push I should have addressed all the comments. If so, is there anything to improve here? Thanks @marko-bekhta and @yrodiere

fax4ever avatar Aug 26 '24 15:08 fax4ever

With last push... again just one comment left!

fax4ever avatar Aug 28 '24 08:08 fax4ever

Hi @marko-bekhta. Comment addressed (I hope :P)

To fully support the RAW model type, I opened HSEARCH-5230 Support RAW model type with metric aggregations. Let me know if we can improve something else. Thanks!

fax4ever avatar Aug 28 '24 14:08 fax4ever

Thanks for your pull request!

This pull request appears to follow the contribution rules.

› This message was automatically generated.

I integrated the corrections for the doc and incubating rebasing the contribution on top of the current main branch. Thanks

fax4ever avatar Sep 04 '24 15:09 fax4ever

Quality Gate Failed Quality Gate failed

Failed conditions
10.1% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

sonarqubecloud[bot] avatar Sep 04 '24 17:09 sonarqubecloud[bot]

Thank you Marko and Yoann

fax4ever avatar Sep 05 '24 09:09 fax4ever