Python 3 compatibility and t-batch caching.

Open jpalowitch opened this issue 4 years ago • 2 comments

The first 3 commits address Python 3 compatibility and remove unnecessary imports.

The final commit is an incomplete t-batching optimization. We don't need to recompute t-batches for every epoch, so it makes sense to do some type of caching. We also don't need to recompute them for every run, assuming we can load the entire dict of t-batches into memory and do random access on each dict (random access is needed to account for user changes to the timespan).

However, the current code isn't set up to incorporate these changes easily, because chunks of t-batches are computed on the fly, interleaved with the corresponding chunk of the epoch. So one would have to compute the "start" and "end" points of each t-batch chunk so that the epoch chunk can access the right t-batches.
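To make the "start" and "end" points concrete, here is a minimal sketch in plain Python. The function name `chunk_bounds` and its arguments are hypothetical and not taken from the repo. It also assumes a chunk closes as soon as the time elapsed since the chunk's first interaction exceeds the timespan, with the triggering interaction opening the next chunk; the actual training loop may place that boundary interaction differently.

```python
def chunk_bounds(timestamps, timespan):
    """Return (start, end) interaction-index pairs, end exclusive, one per chunk.

    Assumption: a new chunk begins at the first interaction whose timestamp
    is more than `timespan` after the first interaction of the current chunk.
    `timestamps` must be sorted in non-decreasing order.
    """
    bounds = []
    start = 0
    for j, t in enumerate(timestamps):
        if t - timestamps[start] > timespan:
            bounds.append((start, j))  # close the current chunk before j
            start = j                  # interaction j opens the next chunk
    if start < len(timestamps):
        bounds.append((start, len(timestamps)))  # trailing, possibly short, chunk
    return bounds


if __name__ == "__main__":
    print(chunk_bounds([0, 1, 2, 5, 6, 11], timespan=3))
```

With bounds like these, an epoch chunk could look up exactly the interactions (and hence the t-batches) it owns, instead of relying on the dicts being rebuilt at each boundary.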

The code in the 4th commit is not just unoptimized, but buggy. In the first epoch, the t-batch dicts keep growing, because args.cache_tbatches=True removes t-batch reinitialization. But the epoch still iterates over the full length of the t-batch dicts. There are two competing ways one could fix this:

  1. Revert to reinitializing the tbatch chunk every time, but save the tbatch chunks to disk.

  2. Tell the epoch where to access the tbatch dict, instead of starting from the beginning.

Option 2 seems easier, but I will leave that to the code maintainers' judgement :)
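A rough sketch of what option 2 could look like, under stated assumptions: the cache is a list with one dict per chunk, so each epoch chunk indexes its own t-batches rather than walking an ever-growing dict from the beginning. All names here (`build_tbatch_cache`, `iterate_epoch`) are hypothetical. The t-batch assignment rule used (one more than the latest t-batch of the interaction's user or item within the chunk) is my reading of the t-batch idea and should be checked against the actual code; the chunk boundary convention is also an assumption.

```python
from collections import defaultdict


def build_tbatch_cache(users, items, timestamps, timespan):
    """Build cache[c] = {tbatch_id: [interaction indices]} for every chunk c.

    Assumptions: a chunk closes once an interaction arrives more than
    `timespan` after the chunk's first interaction, and t-batch ids restart
    from 0 in every chunk.
    """
    cache = []
    current = defaultdict(list)
    last_user, last_item = {}, {}  # latest t-batch id per user / item in this chunk
    chunk_start = None
    for j, (u, i, t) in enumerate(zip(users, items, timestamps)):
        if chunk_start is None:
            chunk_start = t
        elif t - chunk_start > timespan:
            cache.append(dict(current))  # freeze the finished chunk
            current = defaultdict(list)
            last_user.clear()
            last_item.clear()
            chunk_start = t
        tb = max(last_user.get(u, -1), last_item.get(i, -1)) + 1
        last_user[u] = tb
        last_item[i] = tb
        current[tb].append(j)
    if current:
        cache.append(dict(current))
    return cache


def iterate_epoch(cache):
    """Yield (chunk index, t-batch id, interaction indices) in processing order."""
    for c, tbatches in enumerate(cache):
        for tb in sorted(tbatches):
            yield c, tb, tbatches[tb]
```

Because the cache is built once and only read afterwards, every epoch sees the same per-chunk t-batches. The same list could also be serialized to disk (for example with `pickle`) to cover option 1 or to reuse it across runs, as long as the timespan has not changed.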

jpalowitch avatar Mar 31 '20 07:03 jpalowitch

Hi @jpalowitch, I can help finish and test the cached t-batch feature, as discussed with Prof. Kumar last week. Are you still interested in it or in need of it?

pmixer avatar Jul 07 '20 03:07 pmixer

> Hi @jpalowitch, I can help finish and test the cached t-batch feature, as discussed with Prof. Kumar last week. Are you still interested in it or in need of it?

Yes, thanks!

jpalowitch avatar Jul 08 '20 16:07 jpalowitch