Improve stats calculation for complex sequential transforms

Open aivuk opened this issue 2 years ago • 2 comments

@roll In the example below, after a third field is added to a resource using steps.field_add, the number of rows goes from 999 to 900.

from frictionless import Resource, steps, Pipeline

# 1000 single-cell lines: the first is read as the header, leaving 999 data rows
r = Resource([["a"] for _ in range(1000)])
t1 = r.transform(Pipeline(steps=[steps.field_add(name="b", value="b")]))
t1.infer(stats=True)
print(t1.stats)
# {'fields': 2, 'rows': 999}
t2 = t1.transform(Pipeline(steps=[steps.field_add(name="c", value="c")]))
t2.infer(stats=True)
print(t2.stats)
# {'fields': 3, 'rows': 900}  <- expected 999

aivuk avatar Feb 09 '23 08:02 aivuk

Thanks @aivuk, let me investigate

roll avatar Feb 09 '23 11:02 roll

It seems to be the stats bug -- the rows themselves are there:

print(len(t2.read_rows()))
# 999
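
Until the stats are fixed, one possible way to catch this kind of mismatch is to count the rows directly and compare the result with the reported figure. The helper below is only a sketch: `compare_row_stats` is a hypothetical name, not part of frictionless, and it works on any iterable of rows. The commented usage line assumes `t2.stats` behaves like the dict printed earlier in this thread.

```python
def compare_row_stats(reported_rows, rows):
    """Compare a reported row count against the rows actually read.

    Hypothetical helper, not part of frictionless.
    """
    # Count by iterating so this works for lists and generators alike
    actual = sum(1 for _ in rows)
    return {
        "reported": reported_rows,
        "actual": actual,
        "match": reported_rows == actual,
    }


# Assumed usage with the resource from this issue (not executed here):
# compare_row_stats(t2.stats["rows"], t2.read_rows())
```

With the numbers from this issue, the helper would report 900 against 999 actual rows and flag the mismatch.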

roll avatar Feb 18 '23 11:02 roll