Compute a "dynamic" set of attribute histograms for references

Open novoj opened this issue 2 years ago • 0 comments

During a discussion with the Next.JS team (namely Jakub Ruffer), a new idea came up. Currently it is up to them to maintain the list of filterable parameters, cache it, and actively ask for the attributeHistogram computation. This requires quite complex logic in the middleware as well as caching (which brings a lot of additional problems). Jakub came up with the following idea: what if he could ask for all "referenceHistograms" and specify a filterBy constraint determining which references are relevant:

{
  queryProduct(
    filterBy: {
      attributeStatusEquals: "ACTIVE"
    }
  ) {
    extraResults {
      facetSummary {
        parameterValues(
          filterGroupBy: {
            attributeIsVisibleInFilterEquals: true          
          }
          orderGroupBy: {
            attributeOrderNatural: ASC
          }
        ) {
          count
          groupEntity {
            primaryKey
            attributes {
              code
            }
          }
          facetStatistics @include(if: groupHaving(attributeInputWidgetTypeEquals("CHECKBOX"))) {
            requested
            count
            facetEntity {
              primaryKey
              attributes {
                code
              }
            }
          }
          histogramStatistics @include(if: groupHaving(attributeInputWidgetTypeEquals("INTERVAL"))) {
            width {
              min
              max
              overallCount
              buckets(
                requestedCount: 20
                behavior: OPTIMIZED
              ) {
                threshold
                occurrences
                requested
              }
              facetEntity {
                primaryKey
                attributes {
                  code                
                }
              }
            }
          }
        }
      }
    }
  }
}

and evitaDB would compute a "dynamic" set of histograms for the target attributes, based on reference relevancy and grouped by ReferenceContract#group.
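To make the idea of per-group histograms more concrete, here is a minimal TypeScript sketch. It is not evitaDB's implementation and it does not model the OPTIMIZED behavior; it only illustrates plain equal-width bucketing of numeric reference attribute values, grouped by the reference group. The `ReferenceValue` shape and the function names `computeHistogram` and `histogramsByGroup` are hypothetical; only `min`, `max`, `overallCount`, `threshold` and `occurrences` are taken from the query above.

```typescript
// Hypothetical input: one record per reference carrying a numeric attribute value.
interface ReferenceValue {
  groupId: number; // primary key of the reference group (assumed shape)
  value: number;   // numeric attribute value found on the reference
}

interface Bucket {
  threshold: number;   // lower bound of the bucket
  occurrences: number; // number of values that fall into the bucket
}

interface Histogram {
  min: number;
  max: number;
  overallCount: number;
  buckets: Bucket[];
}

// Equal-width bucketing; an assumption made for illustration only.
function computeHistogram(values: number[], requestedCount: number): Histogram {
  if (values.length === 0 || requestedCount < 1) {
    throw new Error("at least one value and one bucket are required");
  }
  const min = Math.min(...values);
  const max = Math.max(...values);
  const step = (max - min) / requestedCount;
  const buckets: Bucket[] = [];
  for (let i = 0; i < requestedCount; i++) {
    buckets.push({ threshold: min + i * step, occurrences: 0 });
  }
  for (const value of values) {
    // The maximum lands in the last bucket; if all values are equal, everything lands in the first one.
    const index = step === 0 ? 0 : Math.min(Math.floor((value - min) / step), requestedCount - 1);
    buckets[index].occurrences += 1;
  }
  return { min, max, overallCount: values.length, buckets };
}

// Groups the values by reference group and computes one histogram per group.
function histogramsByGroup(refs: ReferenceValue[], requestedCount: number): Map<number, Histogram> {
  const grouped = new Map<number, number[]>();
  for (const ref of refs) {
    const list = grouped.get(ref.groupId) ?? [];
    list.push(ref.value);
    grouped.set(ref.groupId, list);
  }
  const result = new Map<number, Histogram>();
  grouped.forEach((values, groupId) => {
    result.set(groupId, computeHistogram(values, requestedCount));
  });
  return result;
}
```

For example, calling `histogramsByGroup` with the values 0, 5 and 10 in group 1 and `requestedCount` of 2 yields thresholds 0 and 5 with 1 and 2 occurrences respectively.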

This approach has multiple benefits:

  1. we can mix checkbox and interval types inside a single "facetSummary"
  2. this allows us to filter and order their groups in a single declaration
  3. we can calculate different output depending on an attribute defined on the group entity or on the facet entity itself (i.e. conditionally)
  4. we could also retrieve different facet entity data along with the returned statistics; in the case of a histogram, only the data of the boundary entities can be provided, because all intermediate thresholds are only virtual
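On the client side, the first and third benefit could look roughly like the following TypeScript sketch. It assumes that a group carries either `facetStatistics` or `histogramStatistics` (whichever the conditional `@include` let through) and that `histogramStatistics` is keyed by attribute name, as `width` is in the query above. All type names, the `FilterWidget` union and the `toFilterWidgets` function are hypothetical, and treating `min` and `max` as numbers is an assumption too.

```typescript
// Hypothetical response types derived from the field names used in the proposed query.
interface EntityRef {
  primaryKey: number;
  attributes: { code: string };
}

interface FacetStatistics {
  requested: boolean;
  count: number;
  facetEntity: EntityRef;
}

interface AttributeHistogram {
  min: number; // assumed to be numeric here
  max: number;
  overallCount: number;
}

interface ParameterGroup {
  count: number;
  groupEntity: EntityRef;
  facetStatistics?: FacetStatistics[];                      // present for CHECKBOX groups
  histogramStatistics?: Record<string, AttributeHistogram>; // present for INTERVAL groups
}

// Hypothetical view model the middleware could hand over to the UI.
type FilterWidget =
  | { kind: "checkbox"; groupCode: string; options: { code: string; count: number; selected: boolean }[] }
  | { kind: "interval"; groupCode: string; attribute: string; min: number; max: number };

function toFilterWidgets(groups: ParameterGroup[]): FilterWidget[] {
  const widgets: FilterWidget[] = [];
  for (const group of groups) {
    const groupCode = group.groupEntity.attributes.code;
    if (group.facetStatistics) {
      // Checkbox group: one option per facet entity.
      widgets.push({
        kind: "checkbox",
        groupCode,
        options: group.facetStatistics.map((facet) => ({
          code: facet.facetEntity.attributes.code,
          count: facet.count,
          selected: facet.requested,
        })),
      });
    } else if (group.histogramStatistics) {
      // Interval group: one slider per attribute histogram.
      for (const attribute of Object.keys(group.histogramStatistics)) {
        const histogram = group.histogramStatistics[attribute];
        widgets.push({ kind: "interval", groupCode, attribute, min: histogram.min, max: histogram.max });
      }
    }
  }
  return widgets;
}
```

The point of the sketch is that the middleware no longer needs to know the list of filterable parameters in advance; it only maps whatever groups come back in their returned order.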

The proposed solution also relates to: https://github.com/FgForrest/evitaDB/issues/474

  • [ ] we need to remove the temporary extension to the GraphQL API histograms that allows retrieving histograms by names passed in a variable argument

novoj, Feb 24 '23 21:02