luigi icon indicating copy to clipboard operation
luigi copied to clipboard

Time measurement with tqdm not working while yielding dynamic dependencies

Open LinqLover opened this issue 3 years ago • 3 comments

This is a rather complex issue, I know, but I think it is closer related to luigi than to tqdm. Minimum example:

import luigi
from time import sleep
from tqdm import tqdm


class Spam(luigi.Task):

    def output(self):
        return luigi.LocalTarget('spam')

    def run(self):
        for i in tqdm(range(10)):
            yield Eggs(str(i))
        with self.output().open('w') as f:
            f.write('done')


class Eggs(luigi.Task):

    i = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(f'egg_{self.i}')

    def run(self):
        sleep(0.1)
        with self.output().open('w') as f:
            f.write(self.i)

When running the Spam task from the command line, the eventual tqdm bar will look like this:

$ time luigi Spam --module=test2 --log-level=ERROR
  0%|                          | 0/10 [00:00<?, ?it/s]
 10%|█▋               | 1/10 [00:00<00:00, 486.35it/s]
 20%|███▍             | 2/10 [00:00<00:00, 582.87it/s]
 30%|████▊           | 3/10 [00:00<00:00, 2524.15it/s]
 40%|██████▍         | 4/10 [00:00<00:00, 2278.58it/s]
 50%|████████        | 5/10 [00:00<00:00, 2970.47it/s]
 60%|█████████▌      | 6/10 [00:00<00:00, 3550.98it/s]
 70%|███████████▏    | 7/10 [00:00<00:00, 4551.96it/s]
 80%|████████████▊   | 8/10 [00:00<00:00, 3910.32it/s]
 90%|██████████████▍ | 9/10 [00:00<00:00, 4985.96it/s]
100%|███████████████| 10/10 [00:00<00:00, 5084.62it/s]

real    0m1.654s
user    0m0.459s
sys     0m0.169s

Note that the total time recorded by tqdm is 00:00 whereas the actual execution takes more than one second.

I have no idea at which end the error is caused but I could not reproduce the issue in tqdm by reporting progress from within a generator function and adding delays from the consumer of the generator, nor by extracting the delays into a background process and pausing the main process. Which kind of magic is luigi here that rule out tqdm's time management? If you have any ideas, it would be great to share them with them tqdm community!

LinqLover avatar Nov 04 '20 17:11 LinqLover

looks like either a bug or misunderstanding in how luigi works, not tqdm.

-from time import sleep
+from time import sleep, time
...
-        for i in tqdm(range(10)):
-            yield Eggs(str(i))
+        with tqdm(range(10)) as bar:
+            start = time()
+            for i in bar:
+                yield Eggs(str(i))
+                print(bar.format_dict['elapsed'], "should be approx", time() - start)
  1. manually measuring time matches tqdm's internal measure
  2. print() gets executed 1+2+...+10 = 55 times (!) - I would've expected only 10 executions

casperdcl avatar Nov 04 '20 21:11 casperdcl

Ah, thanks for the observation, @casperdcl! I realize that luigi evaluates the run() method entirely from the beginning until a not-yet-resolved dependency is met. So it is no wonder that the time measurement does not work because it always represents a short snapshot only ... @luigi, would there be any chance to revise this confusing behavior or would this be out of scope?

LinqLover avatar Nov 05 '20 10:11 LinqLover

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If closed, you may revisit when your time allows and reopen! Thank you for your contributions.

stale[bot] avatar Jan 09 '22 01:01 stale[bot]