dlt
improve `progress` in normalize and load steps
Background
Progress reporting in the `normalize` and `load` steps is far from perfect.
- in `normalize` we report progress at the file level, but it is only updated when a worker process finishes
- in `load` the reported metrics do not survive restarts (see #853)
Tasks
Step 1. Fix `normalize`:
- use metrics collected in `extract` (per job and resource) to correctly report processed rows per resource (where we have the total number of records as well)
- right now there's no communication between the worker and the main process, but we need to start reporting metrics back, so we need to update
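
One way the worker-to-main reporting could look is sketched below. This is only an illustration, not dlt's actual API: `RowsProcessed` and `NormalizeProgress` are hypothetical names, and the sketch assumes the per-resource totals from `extract` are available as a plain dict. It uses `queue.Queue` so it runs in a single process; with real worker processes a `multiprocessing.Queue` could presumably take its place, since it offers the same `get_nowait` interface.

```python
import queue
from collections import defaultdict
from typing import Dict, NamedTuple


class RowsProcessed(NamedTuple):
    """Hypothetical message a worker sends after writing a batch of rows."""
    resource: str
    rows: int


class NormalizeProgress:
    """Aggregates worker updates against per-resource totals from extract."""

    def __init__(self, total_rows: Dict[str, int]) -> None:
        # total_rows is assumed to come from the extract metrics
        self.total_rows = dict(total_rows)
        self.processed: Dict[str, int] = defaultdict(int)

    def drain(self, updates: "queue.Queue[RowsProcessed]") -> None:
        """Consume all pending updates without blocking the main loop."""
        while True:
            try:
                msg = updates.get_nowait()
            except queue.Empty:
                return
            self.processed[msg.resource] += msg.rows

    def report(self) -> Dict[str, str]:
        """Return 'processed/total' per resource known from extract."""
        return {
            name: f"{self.processed[name]}/{total}"
            for name, total in self.total_rows.items()
        }
```

The main process would call `drain` periodically while workers are still running, so progress updates no longer wait for a worker to finish a whole file.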
Step 2. Fix `load`:
- see #853: use package state to track the elapsed times (job created, job started, job stopped)
- we are interested in displaying the following metrics: jobs processed, average elapsed time, average lag (from job created to job started)
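
As a rough sketch of how the three metrics could be derived from per-job timestamps: the `JobTimes` record and `load_metrics` function below are hypothetical, and the sketch assumes timestamps are stored as floats (seconds) in the package state so they survive restarts.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class JobTimes:
    """Hypothetical per-job timestamps (seconds) kept in package state."""
    created_at: float
    started_at: Optional[float] = None
    finished_at: Optional[float] = None


def load_metrics(jobs: List[JobTimes]) -> Dict[str, float]:
    """Compute jobs processed, average elapsed time and average lag."""
    started = [j for j in jobs if j.started_at is not None]
    done = [j for j in started if j.finished_at is not None]
    elapsed = [j.finished_at - j.started_at for j in done]
    lag = [j.started_at - j.created_at for j in started]
    return {
        "jobs_processed": len(done),
        # averages default to 0.0 when no job has reached that stage yet
        "avg_elapsed": mean(elapsed) if elapsed else 0.0,
        "avg_lag": mean(lag) if lag else 0.0,
    }
```

Because the metrics are recomputed from stored timestamps rather than kept in memory, a restarted load step would report the same values as before the restart.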
Implementation
- you'll need to use package state to store extract metrics (ExtractInfo) and normalize metrics
- if those elements are not present in the state, you must fall back gracefully, i.e. report only the progress of the files. The job processing must stay plain: if there are files, they will be processed even if the state is not present
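
The graceful fallback could be sketched roughly as follows. The state layout is an assumption: `extract_metrics` is a hypothetical key mapping resource names to total row counts, and `normalize_progress` is an illustrative helper, not an existing dlt function.

```python
from typing import Any, Dict, Optional


def normalize_progress(
    state: Dict[str, Any],
    files_done: int,
    files_total: int,
    rows_done: Optional[Dict[str, int]] = None,
) -> str:
    """Report row-level progress when possible, file-level otherwise."""
    files = f"files {files_done}/{files_total}"
    # "extract_metrics" is a hypothetical key: resource name -> total rows
    extract_rows = state.get("extract_metrics")
    if not extract_rows or not rows_done:
        # state is missing or incomplete: fall back to file progress only
        return files
    rows = ", ".join(
        f"{name} {rows_done.get(name, 0)}/{total}"
        for name, total in sorted(extract_rows.items())
    )
    return f"{files}; rows: {rows}"
```

Note that the helper only affects what is displayed: file processing itself does not depend on the state being present.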
ADDITIONAL THOUGHTS (@IlyaFaer): There are two different cases:
- We extract and then normalize data - in this case we can take the row count from `ExtractInfo`
- We normalize data that was extracted earlier