DataConnectors icon indicating copy to clipboard operation
DataConnectors copied to clipboard

Update Table.GenerateByPage to handle empty tables being returned

Open bgribaudo opened this issue 2 years ago • 3 comments

Per Table.GenerateByPage's Helper Functions description, this method should call getNextPage until that callback returns null, then will combine all returned pages into a single table.

However, when the existing Table.GenerateByPage returns an empty table, its actual behavior differs from the above description (see below examples). If the callback returns an empty table, Table.GenerateByPage may output 0 rows or an extra, all-null row--neither of which seems desirable or in alignment with the above description.

This PR addresses these behavior anomalies. (Note: It would be good for someone with internal Power Query knowledge to validate this PR to ensure that it is done in a way that aligns with PQ's lazy paradigm and that it does not trigger extra streaming.)

Demo of Existing's Behavior Anomalies

let
    // Existing version of this method
    Table.GenerateByPage = 
        (getNextPage as function) as table =>
            let        
                listOfPages = List.Generate(
                    () => getNextPage(null),            // get the first page of data
                    (lastPage) => lastPage <> null,     // stop when the function returns null
                    (lastPage) => getNextPage(lastPage) // pass the previous page to the next function call
                ),
                // concatenate the pages together
                tableOfPages = Table.FromList(listOfPages, Splitter.SplitByNothing(), {"Column1"}),
                firstRow = tableOfPages{0}?
            in
                // if we didn't get back any pages of data, return an empty table
                // otherwise set the table type based on the columns of the first page
                if (firstRow = null) then
                    Table.FromRows({})
                // check for empty first table
                else if (Table.IsEmpty(firstRow[Column1])) then
                    firstRow[Column1]
                else
                    Value.ReplaceType(
                        Table.ExpandTableColumn(tableOfPages, "Column1", Table.ColumnNames(firstRow[Column1])),
                        Value.Type(firstRow[Column1])
                    )
    ,
    PageFunction = (previous) => 
        let
            /*
            // Actual: In this scenario, 2 rows are output, the 2nd holding a null column value because Response2 contained 0 rows.
            // Expected: Only 1 row (from Response1) should have been returned.
            Response1 = #table({"A"},{{1}}),
            Response2 = #table({"A"},{})
            */
            // Actual: In this scenario, 0 rows are output. Response1 contained no rows so Response2's data is skipped
            // Expected: 1 row returned (row from Response2)
            Response1 = #table({"A"},{}),
            Response2 = #table({"A"},{{1}})
        in
            if previous = null then Response1 
            else if previous = Response1 then Response2
            else null,
    Result = Table.GenerateByPage(PageFunction)
in
    Result

bgribaudo avatar Mar 24 '22 14:03 bgribaudo

We specifically can't use Table.Combine, as that will effectively break the streaming behavior for most cases.

FWIW, we somehow ended up with an older implementation as the "boilerplate" for this function. A slightly newer definition -- albeit one which still doesn't fix the "empty row" problem -- looks like this:

Table.GenerateByPage = (
    getNextPage as function,
    optional tableType as type
) as table =>
    let
        listOfPages = List.Generate(
            () => getNextPage(null),
            (lastPage) => lastPage <> null,
            (lastPage) => getNextPage(lastPage)
        ),
        tableOfPages = Table.FromList(listOfPages, Splitter.SplitByNothing(), {"Column1"}),
        firstRow = tableOfPages{0}?,
        keys = if tableType = null then Table.ColumnNames(firstRow[Column1])
            else Record.FieldNames(Type.RecordFields(Type.TableRow(tableType))),
        appliedType = if tableType = null then Value.Type(firstRow[Column1]) else tableType
    in
        if tableType = null and firstRow = null then
            Table.FromRows({})
        else
            Value.ReplaceType(
                Table.ExpandTableColumn(tableOfPages, "Column1", keys),
                appliedType
            )

What's better about it is that when you know the table type, it avoids the duplicate request for the first page.

CurtHagenlocher avatar Mar 25 '22 13:03 CurtHagenlocher

@CurtHagenlocher,

Thanks for that revised logic! I was planning to look into the effects of the first-row-get-type logic today--but the revision you provided should save me that effort. :-)

Is switching from Table.Combine to filtering listOfPages using Table.IsEmpty any better from the streaming perspective (see PR revision made a few minutes ago)?

bgribaudo avatar Mar 25 '22 15:03 bgribaudo

@CurtHagenlocher, do you like either of the last two commits ("Adjusting error propagation" or "Reverting last change to avoid triggering a row fetch")?

bgribaudo avatar May 02 '22 12:05 bgribaudo