Evaluate and Cache new Code Chunks in Documentation Mode
If I add a new chunk after the previous chunks are cached, I get the following exception:
Pweave -f texminted -c -d missing_chunk_test.texw
Traceback (most recent call last):
File "/usr/local/bin/Pweave", line 9, in <module>
load_entry_point('Pweave==0.23', 'console_scripts', 'Pweave')()
File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/scripts.py", line 53, in weave
pweave.weave(infile, **opts_dict)
File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/__init__.py", line 69, in weave
doc.weave(shell)
File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/pweb.py", line 141, in weave
self.run(shell)
File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/pweb.py", line 109, in run
runner.run()
File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/processors.py", line 53, in run
success = self._getoldresults()
File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/processors.py", line 260, in _getoldresults
executed.append(self._oldresults[i].copy())
IndexError: list index out of range
Makefile:14: recipe for target 'missing_chunk_test.tex' failed
make: *** [missing_chunk_test.tex] Error 1
I was assuming that the caching mechanism would notice the missing chunk, evaluate and cache it, then proceed. Is that the intended functionality?
I have the same problem; please fix this. As it works now, I have to re-cache all chunks with Pweave -f texminted -c %.texw whenever I add a new chunk.
I don't have time to work on this at the moment. I agree that the implementation is not ideal; you're welcome to submit a pull request if you have a suggestion on how to fix it.
Note that Pweave only caches input and output text and not Python objects, so if new chunks need the data from old ones there is no easy fix to this problem.
Gotcha. I’ve been making some small changes toward those ends, so—hopefully—I’ll have a pull request for you.
Seems like one could simply bypass caching in documentation mode and use the caching magic in an IPython processor. A subclass of PwebIPythonProcessor that loads the extension and adds the magic before the self.IPy.run_* statements might do the trick.
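For instance, a rough sketch of that subclass might look like the following. To be clear, this is hypothetical: it assumes PwebIPythonProcessor exposes the shell as self.IPy and routes chunk code through a loadstring hook, and that the ipycache extension supplies the caching magic; the class name and cache file name are made up.

```python
# Hypothetical sketch, not Pweave code: bypass Pweave's own cache and
# lean on ipycache's %%cache magic for chunk-level result caching.
from pweave.processors import PwebIPythonProcessor


class MagicCachingProcessor(PwebIPythonProcessor):
    def __init__(self, *args, **kwargs):
        super(MagicCachingProcessor, self).__init__(*args, **kwargs)
        # Load the caching extension once, before any chunk evaluates.
        self.IPy.run_line_magic("load_ext", "ipycache")

    def loadstring(self, code, *args, **kwargs):
        # Prefix each chunk with the %%cache cell magic so its results
        # are persisted to a pickle file (file name is illustrative).
        code = "%%cache pweave_chunks.pkl\n" + code
        return super(MagicCachingProcessor, self).loadstring(
            code, *args, **kwargs)
```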
Has there been any activity on this?
I'd really appreciate chunk-level caching functionality, which seems like it would be closely related. My use case: I have an increasingly long document with more and more pweave-generated figures, where I'd like to only have to recompile the one I'm currently working on.
Thanks for creating pweave! It's encouraged me to plot more graphs, which is always good :-)
I've been slowly taking a shot at improved caching (see here), but progress has been slow due to multiple competing interests. Namely, a desire to
- fold inline chunks into the general chunk framework,
- provide multi-line chunk options,
- provide generalized caching (a minimal version is sketched after this list)
  - e.g. naive output-only caching that considers changes in buffer content/source and chunk settings,
- make everything work almost entirely within the Jupyter ecosystem
  - every chunk evaluation engine is necessarily a Jupyter kernel
  - use of nbformat as the underlying parsed document format,
- and provide precision Python-only caching
  - bytecode-aware caching, via the mechanics behind the with-statement hack given here.
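To make the "naive output-only caching" bullet concrete, here's a minimal sketch; every name in it is hypothetical, not Pweave API. It just keys stored output on a hash of a chunk's source and options:

```python
# Sketch of naive output-only caching: a chunk is re-evaluated only
# when its source text or its options change.
import hashlib
import json
import os
import pickle

CACHE_DIR = ".chunk_cache"  # illustrative location


def chunk_key(source, options):
    """Hash the chunk source together with its (sorted) options."""
    payload = source + json.dumps(options, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def get_or_run(source, options, evaluate):
    """Return cached output for an unchanged chunk, else evaluate it."""
    path = os.path.join(CACHE_DIR, chunk_key(source, options) + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = evaluate(source)
    if not os.path.isdir(CACHE_DIR):
        os.makedirs(CACHE_DIR)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```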
@brandonwillard Those are several big changes you're talking about. Please don't submit them as one pull request; split them into separate ones.
Note:
- Every chunk evaluation engine is already a Jupyter kernel
- I don't see the benefit of using nbformat as the parsed document format; you can already use it for output.
I suggest you first do:
- fold inline chunks into the general chunk framework
- provide generalized caching, e.g. naive output-only caching that considers changes in buffer content/source and chunk settings
I have decided not to allow multi-line chunk options, as it breaks editor support and I haven't seen a compelling need for it. If you come up with a proper implementation with tests I can accept it, but submit it as a separate pull request.
Oh, sorry, I hadn't done that work with a PR in mind; it was just a test branch that started with caching and turned into all sorts of stuff. If there's an interest in those latter two goals, I can separate them and make PRs. As for the nbformat idea, I can start an issue discussing my reasons.
@brandonwillard how were you thinking of implementing save_chunk_state?
https://github.com/mpastell/Pweave/compare/master...brandonwillard:caching-changes#diff-2747ccbd23b5ea3c1c42eb01071e5a6eR166
Ah, yeah, I left off with the idea of pickling the session in _[save|load]_chunk_state. That idea isn't all that efficient/feasible without, perhaps, an incremental approach.
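For reference, the session-pickling idea roughly amounts to something like this, assuming the dill library; the function bodies here are illustrative, not what's in the branch:

```python
# Illustrative only: dump/restore the whole interpreter session around
# each chunk so a later-inserted chunk can resume from cached state.
import dill


def _save_chunk_state(chunk_number):
    # Serialize the entire session after the chunk executes.
    dill.dump_session("chunk_%03d.session" % chunk_number)


def _load_chunk_state(chunk_number):
    # Restore the session to its state after the given chunk.
    dill.load_session("chunk_%03d.session" % chunk_number)
```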
At around the same time, I was experimenting with a more granular, variable-level caching that uses code/ASTs extracted from with-statement bodies, and had intended to port this idea instead of using (incremental) session caching.
Regardless, I've gone full org-mode nowadays, so I don't know when I'll get time to jump back into this!
Thanks @brandonwillard.
@brandonwillard, both of the approaches you considered seem particular to Python. Currently, it looks like Pweave is trying not to be tied to Python by using Jupyter to allow different kernels. Do you know if Jupyter kernel managers have a language-independent means to serialize the state of a kernel?
Yeah, I think that any non-naive caching (e.g. more than just caching output and validating against source text differences) is necessarily language-specific.
However, it seems like more than a few popular languages have straightforward runtime bytecode tools, AST generation, and, at the very least, introspection capabilities. As with Python, it's possible to implement less naive caching with those.
Regarding Jupyter, it would be fantastic to see an abstraction of bytecode and/or AST objects exposed by the client protocol. The project has a somewhat related idea in its introspection messages. Otherwise, one can always implement smart caching at the kernel level and use custom messages.
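As a concrete (and purely illustrative) Python example of such less naive caching, a cache key derived from the parsed AST rather than the raw text would survive formatting- and comment-only edits:

```python
# Key chunks on a normalized AST so whitespace/comment changes don't
# invalidate the cache; parsing discards both.
import ast
import hashlib


def ast_cache_key(source):
    """Hash the dumped AST of the chunk source."""
    tree = ast.parse(source)
    return hashlib.sha1(ast.dump(tree).encode("utf-8")).hexdigest()


# A formatting-only edit yields the same key, so the chunk stays cached.
assert ast_cache_key("x = 1 + 2") == ast_cache_key("x = 1+2  # tweak")
```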