DataConnectors icon indicating copy to clipboard operation
DataConnectors copied to clipboard

Backward apply filters along lazy evaluated tables to improve performance

Open PatrickHofman opened this issue 7 years ago • 0 comments

We have a number of data sets which are quite large and whose metadata is retrieved from Open sources such as CBS.NL (government data) which is OData. This sample has over 10.000 data sets with more than 1.000 URLs of metadata. (Just like retrieving all tables from a database with 148.000 tables)

The query in Power BI is:

let
    Source = Invantive.CbsNl.Database(),
    ArbeidskostenParticuliereBedrijven_Gediscontinueerd_Stopgezet_cbsnl70100ned_categorygroups = Source{[Catalog="ArbeidskostenParticuliereBedrijven",Schema="Gediscontinueerd_Stopgezet",Name="cbsnl70100ned_categorygroups"]}[Data]
in
    ArbeidskostenParticuliereBedrijven_Gediscontinueerd_Stopgezet_cbsnl70100ned_categorygroups

This query is generated by Power BI from navigation table. The initial step when evaluated retrieve all data sets and the second step filters it down to 1 data set and then retrieves its data.

It would improve performance when somehow the initial request for Invantive.CbsNl.Database() would learn that only data is to be returned where for instance Catalog="ArbeidskostenParticuliereBedrijven". This performance issue does not arise when you code the query by hand as select * from ArbeidskostenParticuliereBedrijven.Gediscontinueerd_Stopgezet.cbsnl70100ned_categorygroups.

Scenario is similar to optimization of a tradition SQL query on unique IDs like:

select *
from t1
join t2
on t1.c = t2.c
where t1.c =1

In an optimized scenario this would be equal to:

select *
from t1
join t2
on t2.c = 1
where t1.c = 1

Three solution directions come into mind:

  1. Maybe an optimization of the generated query is already available, but unknown to me. Then this issue would be a request for documentation improvement.
  2. Backward apply filters based upon transitivity into the steps.
  3. Allow changing the way the query is generated based upon the navigation table.

PatrickHofman avatar Jun 29 '17 13:06 PatrickHofman