ipycache

Semi-automatic caching of outputs by input hash.

Open mforbes opened this issue 10 years ago • 4 comments

I would like to be able to cache the output of several different cells. The problem is that each cache file seems to store exactly one output so that I must use a different file for each cell, which is cumbersome.

Ideally, I would like to be able to first specify a cache file (I am not sure of the optimal syntax) like:

%%cache --set-cache "my_cache.pkl"

Then cache several output cells with something as simple as:

%%cache
!hg sum

Another cell might be

%%cache
from my_module import f
from line_profiler import LineProfiler
profile = LineProfiler()
profile.add_function(f)
profile.run('f()')
profile.print_stats()

Ideally, each output would be stored in a dictionary whose key is a hash of the cell's input so that the cell is executed if needed, but -- more importantly -- the appropriate output is restored if one has several cells.
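
To make the idea concrete, here is a minimal sketch of such a dictionary, assuming the cache is a single pickle file that maps a SHA-256 hash of the cell source to the captured output. The names cell_key, load_cache and run_cached are hypothetical and are not part of ipycache. The sketch only captures Python-level stdout, so it would not capture the output of shell commands such as !hg sum.

```python
import hashlib
import io
import os
import pickle
from contextlib import redirect_stdout


def cell_key(source):
    """Hash the cell's input text; this is the dictionary key."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def load_cache(path):
    """Load the {hash: output} dictionary, or start empty if no file exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {}


def run_cached(source, path, namespace=None):
    """Return the stored output for `source`, executing it only on a cache miss."""
    cache = load_cache(path)
    key = cell_key(source)
    if key not in cache:
        buf = io.StringIO()
        with redirect_stdout(buf):
            exec(source, namespace if namespace is not None else {})
        cache[key] = buf.getvalue()
        with open(path, "wb") as f:
            pickle.dump(cache, f)
    return cache[key]
```

With this layout, several cells can share my_cache.pkl, and each one gets back its own output because the lookup is by input hash rather than by file name.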

This mechanism would be defeated by cells that have identical inputs:

%%cache
!time

but I could live with that. (To get around this, some sort of context of the cell in the original notebook would be needed, but I think this might be a can of worms.)

Backwards Compatibility

I have not delved into the source yet, so I am not sure how easy this would be to implement. As a minimal syntax, I propose that this feature only work if one first specifies a cache file with --set-cache "my_cache.pkl"; otherwise the default behaviour continues. Only if this has been set would a blank %%cache line work as described (otherwise, the usual error would be raised).

Is this feasible, or is there a better way to "freeze" the output of calculations?

Michael.

P.S. My use case is interactive profiling and improving code. I want to freeze the previous profiling outputs so that the notebook becomes a log of the profiling process. I don't yet see much of a need for caching variables, but need to cache the output, so perhaps this extension is not the best fit for my needs, but it almost works.

mforbes avatar Nov 25 '13 00:11 mforbes

Hi Michael,

It looks interesting and doable. I think it would be better to implement that in a new cell magic like %%globalcache or any better name. It would be confusing to have different behaviors with the same magic, depending on the previous occurrence of something like --set-cache my_cache.pkl.

Otherwise, I think this magic could work exactly as you describe:

  • First, need to set the cache path with %%globalcache --set-cache my_cache.pkl
  • Then, the variables + output of any %%globalcache cell are saved in this file, indexed by the hash of the cell input
  • When loading a cached cell, the variables + output are loaded from the file based on the hash.

When multiple cells have exactly the same contents, would it be conceivable for you to put a comment like # First cell or # Second cell to disambiguate between the cells' hashes?
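
As a quick illustration of why a comment is enough, here is a sketch assuming the key is a SHA-256 hash of the raw cell text (the hash function and the cell_key helper are assumptions, not existing ipycache code):

```python
import hashlib


def cell_key(source):
    """Hypothetical key function: hash of the raw cell text, comments included."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


first = "# First cell\n!time"
second = "# Second cell\n!time"

# The executable part is identical, but the comment changes the hashed text,
# so the two cells map to different cache entries.
keys_differ = cell_key(first) != cell_key(second)
```

This only holds if the hash is taken over the unmodified cell source; a scheme that stripped comments before hashing would make the two cells collide again.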

rossant avatar Nov 25 '13 09:11 rossant

I am not sure that using %%cache in this way would be confusing given that the current usage requires the file as an argument. (My suggestion follows the behaviour of %%px where one can configure --targets etc., so this idea of having a default state is not unfamiliar to at least some IPython users). But, I am fine either way.

It is quite a common strategy (at least for me!) to use comments to make cells of identical code look different, so I think that is perfectly feasible.

mforbes avatar Nov 27 '13 00:11 mforbes

It's a bit different from %%px, isn't it? %%px --targets only changes the target for the current command. To change the default behavior, one needs to call %pxconfig. (http://ipython.org/ipython-doc/stable/parallel/magics.html)

I imagine we could have %cacheconfig --set-cache and subsequent %%cache var1 var2 as usual.

rossant avatar Dec 03 '13 11:12 rossant

Of course you are correct. To be completely analogous one should probably also be able to do something like %%cache --cache mycache.py var1 var2, then set the default cache with %cacheconfig --cache mycache.py. Then %cacheconfig could also set other defaults like --cachedir.
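
A rough sketch of how such defaults might be resolved, assuming a small configuration object that %cacheconfig would update and that each %%cache call would consult. The CacheConfig class and its methods are hypothetical, not existing ipycache API; only the option names cache and cachedir come from the discussion above.

```python
class CacheConfig:
    """Hypothetical holder for defaults set by something like %cacheconfig."""

    def __init__(self):
        self.defaults = {}

    def configure(self, **options):
        # Corresponds to: %cacheconfig --cache mycache.py --cachedir some_dir
        self.defaults.update(options)

    def resolve(self, **overrides):
        # Corresponds to a single %%cache call: explicit options win over
        # defaults, and a missing cache file is an error, as it is today.
        merged = dict(self.defaults)
        merged.update({k: v for k, v in overrides.items() if v is not None})
        if "cache" not in merged:
            raise ValueError("no cache file given and no default set")
        return merged
```

A per-call --cache would then affect only that cell, as %%px --targets does, while the configured default stays in place for later cells.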

mforbes avatar Dec 03 '13 19:12 mforbes