DataConnectors
DataConnectors copied to clipboard
Update Table.GenerateByPage to handle empty tables being returned
Per Table.GenerateByPage
's Helper Functions description, this method should call getNextPage
until that callback returns null, then will combine all returned pages into a single table.
However, when the existing Table.GenerateByPage
returns an empty table, its actual behavior differs from the above description (see below examples). If the callback returns an empty table, Table.GenerateByPage
may output 0 rows or an extra, all-null row--neither of which seems desirable or in alignment with the above description.
This PR addresses these behavior anomalies. (Note: It would be good for someone with internal Power Query knowledge to validate this PR to ensure that it is done in a way that aligns with PQ's lazy paradigm and that it does not trigger extra streaming.)
Demo of Existing's Behavior Anomalies
let
// Existing version of this method
Table.GenerateByPage =
(getNextPage as function) as table =>
let
listOfPages = List.Generate(
() => getNextPage(null), // get the first page of data
(lastPage) => lastPage <> null, // stop when the function returns null
(lastPage) => getNextPage(lastPage) // pass the previous page to the next function call
),
// concatenate the pages together
tableOfPages = Table.FromList(listOfPages, Splitter.SplitByNothing(), {"Column1"}),
firstRow = tableOfPages{0}?
in
// if we didn't get back any pages of data, return an empty table
// otherwise set the table type based on the columns of the first page
if (firstRow = null) then
Table.FromRows({})
// check for empty first table
else if (Table.IsEmpty(firstRow[Column1])) then
firstRow[Column1]
else
Value.ReplaceType(
Table.ExpandTableColumn(tableOfPages, "Column1", Table.ColumnNames(firstRow[Column1])),
Value.Type(firstRow[Column1])
)
,
PageFunction = (previous) =>
let
/*
// Actual: In this scenario, 2 rows are output, the 2nd holding a null column value because Response2 contained 0 rows.
// Expected: Only 1 row (from Response1) should have been returned.
Response1 = #table({"A"},{{1}}),
Response2 = #table({"A"},{})
*/
// Actual: In this scenario, 0 rows are output. Response1 contained no rows so Response2's data is skipped
// Expected: 1 row returned (row from Response2)
Response1 = #table({"A"},{}),
Response2 = #table({"A"},{{1}})
in
if previous = null then Response1
else if previous = Response1 then Response2
else null,
Result = Table.GenerateByPage(PageFunction)
in
Result
We specifically can't use Table.Combine, as that will effectively break the streaming behavior for most cases.
FWIW, we somehow ended up with an older implementation as the "boilerplate" for this function. A slightly newer definition -- albeit one which still doesn't fix the "empty row" problem -- looks like this:
Table.GenerateByPage = (
getNextPage as function,
optional tableType as type
) as table =>
let
listOfPages = List.Generate(
() => getNextPage(null),
(lastPage) => lastPage <> null,
(lastPage) => getNextPage(lastPage)
),
tableOfPages = Table.FromList(listOfPages, Splitter.SplitByNothing(), {"Column1"}),
firstRow = tableOfPages{0}?,
keys = if tableType = null then Table.ColumnNames(firstRow[Column1])
else Record.FieldNames(Type.RecordFields(Type.TableRow(tableType))),
appliedType = if tableType = null then Value.Type(firstRow[Column1]) else tableType
in
if tableType = null and firstRow = null then
Table.FromRows({})
else
Value.ReplaceType(
Table.ExpandTableColumn(tableOfPages, "Column1", keys),
appliedType
)
What's better about it is that when you know the table type, it avoids the duplicate request for the first page.
@CurtHagenlocher,
Thanks for that revised logic! I was planning to look into the effects of the first-row-get-type logic today--but the revision you provided should save me that effort. :-)
Is switching from Table.Combine
to filtering listOfPages
using Table.IsEmpty
any better from the streaming perspective (see PR revision made a few minutes ago)?
@CurtHagenlocher, do you like either of the last two commits ("Adjusting error propagation" or "Reverting last change to avoid triggering a row fetch")?