
Evaluate and Cache new Code Chunks in Documentation Mode

brandonwillard opened this issue Feb 04 '15 • 13 comments

If I add a new chunk after the previous chunks are cached, I get the following exception:

Pweave -f texminted -c -d missing_chunk_test.texw
Traceback (most recent call last):
  File "/usr/local/bin/Pweave", line 9, in <module>
    load_entry_point('Pweave==0.23', 'console_scripts', 'Pweave')()
  File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/scripts.py", line 53, in weave
    pweave.weave(infile, **opts_dict)
  File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/__init__.py", line 69, in weave
    doc.weave(shell)
  File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/pweb.py", line 141, in weave
    self.run(shell)
  File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/pweb.py", line 109, in run
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/processors.py", line 53, in run
    success = self._getoldresults()
  File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/processors.py", line 260, in _getoldresults
    executed.append(self._oldresults[i].copy())
IndexError: list index out of range
Makefile:14: recipe for target 'missing_chunk_test.tex' failed
make: *** [missing_chunk_test.tex] Error 1

I was assuming that the caching mechanism would notice the missing chunk, evaluate and cache it, then proceed. Is that the intended functionality?
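The failure mode in the traceback can be illustrated in isolation. Cached results are looked up by chunk position, so a newly added chunk has no counterpart in the cache and the positional index runs past the end of the list. A minimal sketch (with hypothetical names, not Pweave's actual code) of a length-guarded lookup that would let the new chunk fall through to re-evaluation:

```python
# Hypothetical sketch of matching chunks to cached results by position.
# A guard on the cache length (and a source comparison) marks chunks
# that have no valid cached result instead of raising IndexError.
def get_old_results(chunks, cached):
    executed = []
    for i, chunk in enumerate(chunks):
        if i < len(cached) and cached[i]["source"] == chunk["source"]:
            executed.append(dict(cached[i]))   # reuse the cached output
        else:
            executed.append(None)              # mark for re-evaluation
    return executed

# Two chunks were cached, but the document now has three.
chunks = [{"source": "a = 1"}, {"source": "print(a)"}, {"source": "a + 1"}]
cached = [{"source": "a = 1", "out": ""}, {"source": "print(a)", "out": "1\n"}]
results = get_old_results(chunks, cached)
```

Here the third entry comes back as `None` rather than crashing, which is roughly the "notice the missing chunk and evaluate it" behavior described above.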

brandonwillard avatar Feb 04 '15 17:02 brandonwillard

I have the same problem; please fix this. As it works now, I have to re-cache all chunks with Pweave -f texminted -c %.texw whenever I add a new chunk.

sgi3 avatar Mar 09 '15 00:03 sgi3

I don't have time to work on this at the moment. I agree that the implementation is not ideal; you're welcome to submit a pull request if you have a suggestion for how to fix it.

Note that Pweave only caches input and output text and not Python objects, so if new chunks need the data from old ones there is no easy fix to this problem.

mpastell avatar Mar 31 '15 20:03 mpastell

Gotcha. I've been making some small changes toward those ends, so, hopefully, I'll have a pull request for you.


brandonwillard avatar Mar 31 '15 20:03 brandonwillard

Seems like one could simply bypass caching in documentation mode and use the caching magic in an IPython processor. A subclass of PwebIPythonProcessor that loads the extension and adds the magic before the self.IPy.run_* statements might do the trick.
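As a rough sketch of that idea: a processor could load a caching extension once and wrap each chunk in a cache magic before handing it to the shell. The %%cache magic (e.g. from the ipycache extension), the file naming, and the PwebIPythonProcessor interface are assumptions here; a stub shell stands in for the embedded IPython instance.

```python
# Hedged sketch of a caching IPython processor. A stub shell records the
# commands that a real embedded IPython shell would execute.
class StubShell:
    """Stands in for the embedded IPython shell; records what was run."""
    def __init__(self):
        self.commands = []

    def run_cell(self, code):
        self.commands.append(code)

class CachingProcessor:
    """Mimics a PwebIPythonProcessor subclass that adds a cache magic."""
    def __init__(self, shell):
        self.IPy = shell
        # Load the caching extension once, before any chunks run.
        self.IPy.run_cell("%load_ext ipycache")

    def run_chunk(self, name, code):
        # Prefix the chunk with the cache magic so the extension handles
        # (de)serializing the chunk's results to a pickle file.
        self.IPy.run_cell("%%cache {}.pkl\n{}".format(name, code))

shell = StubShell()
proc = CachingProcessor(shell)
proc.run_chunk("chunk1", "x = 1")
```

In a real subclass, the wrapping would happen just before the self.IPy.run_* calls mentioned above.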

brandonwillard avatar Mar 29 '16 23:03 brandonwillard

Has there been any activity on this?

I'd really appreciate chunk-level caching functionality, which seems like it would be closely related. My use case: I have an increasingly long document with more and more pweave-generated figures, where I'd like to only have to recompile the one I'm currently working on.

Thanks for creating pweave! It's encouraged me to plot more graphs, which is always good :-)

scfrank avatar Aug 24 '17 00:08 scfrank

I've been slowly taking a shot at improved caching (see here), but progress has been slow due to multiple competing interests. Namely, a desire to

  • fold inline chunks into the general chunk framework,
  • provide multi-line chunk options,
  • provide generalized caching
    • e.g. naive output-only caching that considers changes in buffer content/source and chunk settings,
  • make everything work almost entirely within the Jupyter ecosystem
    • every chunk evaluation engine is necessarily a Jupyter kernel
    • use of nbformat as the underlying parsed document format,
  • and provide precision Python-only caching
    • bytecode-aware caching, via the mechanics behind the with hack given here.
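The "naive output-only caching" bullet above can be sketched in a few lines: key each chunk's captured output by a hash of its source and options, so only new or changed chunks are re-evaluated. All names here are illustrative, not Pweave's.

```python
# Sketch of naive output-only caching: a chunk's output is reused when
# a hash of its source text and options matches a previous run.
import hashlib
import json

def chunk_key(source, options):
    # Options are serialized deterministically so equal settings hash equally.
    payload = source + json.dumps(options, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class OutputCache:
    def __init__(self):
        self.store = {}
        self.evaluations = 0

    def run(self, source, options, evaluate):
        key = chunk_key(source, options)
        if key not in self.store:       # new or changed chunk: evaluate it
            self.evaluations += 1
            self.store[key] = evaluate(source)
        return self.store[key]

cache = OutputCache()
out1 = cache.run("1 + 1", {"echo": True}, lambda s: str(eval(s)))
out2 = cache.run("1 + 1", {"echo": True}, lambda s: str(eval(s)))
```

Note this only validates against source and settings; it cannot know that a chunk depends on state produced by an earlier chunk, which is exactly the limitation discussed earlier in this thread.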

brandonwillard avatar Sep 09 '17 20:09 brandonwillard

@brandonwillard Those are multiple big changes that you are talking about. Please don't submit them as one pull request, but split it into separate ones.

Note:

  • Every chunk evaluation engine is already a Jupyter kernel
  • I don't see the benefit of using nbformat as the parsed document format; you can already use it for output.

I suggest you first do:

  • fold inline chunks into the general chunk framework
  • provide generalized caching e.g. naive output-only caching that considers changes in buffer content/source and chunk settings,

I have decided not to allow multi-line chunk options, as it breaks editor support and I haven't seen a compelling need for it. If you can come up with a proper implementation with tests, I can accept it, but submit it as a separate pull request.

mpastell avatar Sep 10 '17 10:09 mpastell

Oh, sorry, I hadn't done that work with a PR in mind; it was just a test branch that started with caching and turned into all sorts of stuff. If there's an interest in those latter two goals, I can separate them and make PRs. As for the nbformat idea, I can start an issue discussing my reasons.

brandonwillard avatar Sep 10 '17 18:09 brandonwillard

@brandonwillard, how were you thinking of implementing save_chunk_state? https://github.com/mpastell/Pweave/compare/master...brandonwillard:caching-changes#diff-2747ccbd23b5ea3c1c42eb01071e5a6eR166

fgregg avatar Apr 13 '18 03:04 fgregg

Ah, yeah, I left off with the idea of pickling the session in _[save|load]_chunk_state, but that isn't especially efficient or feasible without, perhaps, an incremental approach.

At around the same time, I was experimenting with a more granular, variable-level caching that uses code/ASTs extracted from with bodies and had intended to port this idea instead of using (incremental) session caching.
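The "code extracted from with bodies" idea can be illustrated with the standard ast module: parse the source, take the With node, and recover its body as code that could then be hashed or analyzed per variable. This is only a sketch of the extraction step; the caching itself is omitted, and cache('model') is a hypothetical context manager. (ast.unparse requires Python 3.9+.)

```python
# Extract the statements inside a `with` block as source text, using
# only the standard library. The caching layer that would consume this
# is not shown.
import ast
import textwrap

src = textwrap.dedent("""
    with cache('model'):
        a = 1
        b = a + 1
""")

tree = ast.parse(src)
with_node = tree.body[0]          # the With statement itself
body_src = "\n".join(ast.unparse(stmt) for stmt in with_node.body)
```

From here, body_src (or the body's AST) could be hashed to decide whether the block's cached results are still valid at the variable level.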

Regardless, I've gone full org-mode nowadays, so I don't know when I'll get time to jump back into this!

brandonwillard avatar Apr 13 '18 04:04 brandonwillard

Thanks @brandonwillard.

fgregg avatar Apr 13 '18 14:04 fgregg

@brandonwillard, both of the approaches you considered seem particular to Python. Currently, it looks like Pweave is trying not to be tied to Python by using Jupyter to allow different kernels. Do you know if Jupyter kernel managers have a language-independent means to serialize the state of a kernel?

Stack Overflow seems to suggest no

fgregg avatar Apr 13 '18 15:04 fgregg

Yeah, I think that any non-naive caching (e.g. more than just caching output and validating against source text differences) is necessarily language-specific.

However, it seems like more than a few popular languages have straightforward runtime bytecode tools, AST generation and, at the very least, introspection capabilities. As with Python, it's possible to implement less naive caching with those.

Regarding Jupyter, it would be fantastic to see an abstraction of bytecode and/or AST objects exposed by the client protocol. The project has a somewhat related idea in its introspection messages. Otherwise, one can always implement smart caching at the kernel level and use custom messages.

brandonwillard avatar Apr 14 '18 00:04 brandonwillard