framework
Improve stats calculation for complex sequential transforms
@roll In the example below, after a third field is added to a resource using steps.field_add, the number of rows reported in the stats drops from 999 to 900.
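from frictionless import Resource, steps, Pipeline

# 1000 input lines; the first is taken as the header, leaving 999 data rows
r = Resource([["a"] for _ in range(1000)])

# First transform: add field "b"
t1 = r.transform(Pipeline(steps=[steps.field_add(name="b", value="b")]))
t1.infer(stats=True)
print(t1.stats)
# {'fields': 2, 'rows': 999}

# Second transform, applied to the result of the first: add field "c"
t2 = t1.transform(Pipeline(steps=[steps.field_add(name="c", value="c")]))
t2.infer(stats=True)
print(t2.stats)
# {'fields': 3, 'rows': 900}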
Thanks @aivuk, let me investigate.
It seems to be a bug in the stats calculation; the rows themselves are all there:
print(len(t2.read_rows()))
# 999
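For anyone who wants to check their own pipelines for the same discrepancy, here is a minimal sketch of a comparison helper. It only relies on the two things used in this thread, `stats["rows"]` and `read_rows()`, and assumes `infer(stats=True)` has already been called. The names `row_count_mismatch` and `StubResource` are made up for illustration and are not part of frictionless; the stub just replays the numbers reported above so the snippet runs without the library installed.

```python
def row_count_mismatch(resource):
    """Compare the row count in resource.stats with the rows actually read.

    Assumes resource.infer(stats=True) was already called, as in the thread.
    Returns None when the counts agree, otherwise (reported, actual).
    """
    reported = resource.stats["rows"]
    actual = len(resource.read_rows())
    return None if reported == actual else (reported, actual)


class StubResource:
    """Hypothetical stand-in that replays the values reported in this issue."""

    def __init__(self, reported_rows, actual_rows):
        self.stats = {"fields": 3, "rows": reported_rows}
        self._rows = [{"a": "a", "b": "b", "c": "c"} for _ in range(actual_rows)]

    def read_rows(self):
        return self._rows


print(row_count_mismatch(StubResource(900, 999)))  # (900, 999)
print(row_count_mismatch(StubResource(999, 999)))  # None
```

Passing `t2` from the example above in place of the stub should, under the same assumptions, flag the mismatch between the stats and the rows that are actually read.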