luigi
luigi copied to clipboard
Time measurement with tqdm not working while yielding dynamic dependencies
This is a rather complex issue, I know, but I think it is closer related to luigi than to tqdm. Minimum example:
import luigi
from time import sleep
from tqdm import tqdm
class Spam(luigi.Task):
def output(self):
return luigi.LocalTarget('spam')
def run(self):
for i in tqdm(range(10)):
yield Eggs(str(i))
with self.output().open('w') as f:
f.write('done')
class Eggs(luigi.Task):
i = luigi.Parameter()
def output(self):
return luigi.LocalTarget(f'egg_{self.i}')
def run(self):
sleep(0.1)
with self.output().open('w') as f:
f.write(self.i)
When running the Spam
task from the command line, the eventual tqdm bar will look like this:
$ time luigi Spam --module=test2 --log-level=ERROR
0%| | 0/10 [00:00<?, ?it/s]
10%|█▋ | 1/10 [00:00<00:00, 486.35it/s]
20%|███▍ | 2/10 [00:00<00:00, 582.87it/s]
30%|████▊ | 3/10 [00:00<00:00, 2524.15it/s]
40%|██████▍ | 4/10 [00:00<00:00, 2278.58it/s]
50%|████████ | 5/10 [00:00<00:00, 2970.47it/s]
60%|█████████▌ | 6/10 [00:00<00:00, 3550.98it/s]
70%|███████████▏ | 7/10 [00:00<00:00, 4551.96it/s]
80%|████████████▊ | 8/10 [00:00<00:00, 3910.32it/s]
90%|██████████████▍ | 9/10 [00:00<00:00, 4985.96it/s]
100%|███████████████| 10/10 [00:00<00:00, 5084.62it/s]
real 0m1.654s
user 0m0.459s
sys 0m0.169s
Note that the total time recorded by tqdm is 00:00
whereas the actual execution takes more than one second.
I have no idea at which end the error is caused but I could not reproduce the issue in tqdm by reporting progress from within a generator function and adding delays from the consumer of the generator, nor by extracting the delays into a background process and pausing the main process. Which kind of magic is luigi here that rule out tqdm's time management? If you have any ideas, it would be great to share them with them tqdm community!
looks like either a bug or misunderstanding in how luigi
works, not tqdm
.
-from time import sleep
+from time import sleep, time
...
- for i in tqdm(range(10)):
- yield Eggs(str(i))
+ with tqdm(range(10)) as bar:
+ start = time()
+ for i in bar:
+ yield Eggs(str(i))
+ print(bar.format_dict['elapsed'], "should be approx", time() - start)
- manually measuring time matches
tqdm
's internal measure -
print()
gets executed 1+2+...+10 = 55 times (!) - I would've expected only 10 executions
Ah, thanks for the observation, @casperdcl! I realize that luigi evaluates the run()
method entirely from the beginning until a not-yet-resolved dependency is met. So it is no wonder that the time measurement does not work because it always represents a short snapshot only ... @luigi, would there be any chance to revise this confusing behavior or would this be out of scope?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If closed, you may revisit when your time allows and reopen! Thank you for your contributions.