jupyter-cache
How should/would the cache be used remotely?
Originally posted by @choldgraf in https://github.com/ExecutableBookProject/jupyter-cache/pull/6#issuecomment-590100257
Maybe a use-case to consider here.
A team has a really big book, it takes 2 hours to complete. An author forks the book, clones it locally, edits one page. They want to contribute the page back. A few questions:
- Do they need to run the entire 2-hour build process locally before seeing what the page looks like? --> this seems like it could be handled by making the cache's execution step configurable to run only on specific files
- When they make a PR, does the entire book need to re-build top to bottom on the CI/CD job? --> here the cache could probably be stored as a build artifact in a CI/CD job, independent of the git repository
- Is there any way for a "master cache" to be bundled with the book?
- If so, then is that a pattern we want to encourage?
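The "only re-execute specific files" idea above comes down to keying the cache on notebook *content* rather than on paths or timestamps. A minimal sketch of that logic, with hypothetical helper names (`notebook_hash`, `needs_execution`) and a deliberately simplified hashing policy — jupyter-cache's actual implementation normalizes more than this:

```python
import hashlib
import json

def notebook_hash(path):
    """Hash only the code-cell sources, so pure prose edits don't
    invalidate the cache (one possible hashing policy)."""
    with open(path) as f:
        nb = json.load(f)
    code = "\n".join(
        "".join(cell["source"])
        for cell in nb["cells"]
        if cell["cell_type"] == "code"
    )
    return hashlib.sha256(code.encode()).hexdigest()

def needs_execution(path, cache):
    """`cache` maps notebook path -> hash recorded at last execution.
    Only notebooks whose hash has changed need the expensive re-run."""
    return cache.get(path) != notebook_hash(path)
```

Under a scheme like this, an author who edits one page out of a 2-hour book only pays for re-executing that one page, whether the cache lives locally, in a CI artifact, or on a shared server.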
 
I could see a benefit of committing the cache, in the sense that then git would keep track of changes to the cache and diffs to the pages would propagate through github, clones, etc. However, I worry about a few things:
- The cache would probably become gigantic for non-trivial projects, unless it could be incrementally-updated and have some kind of "shallow clone" behavior.
- It would require sub-moduling a book repository, so I think it would only work for fairly advanced power users.
- The cache diffs themselves would be binary (I think?) so they wouldn't make any sense in github which would make it hard to know what has changed in the cache.
Our Python lectures take around 1.5 hours to build from scratch, so this is our scenario.
For 99% of our PRs, we just make the edits in RST, generate the ipynb for that one page and then run it manually to see if it looks OK. This is fine for most edits, which typically adjust language or tweak code.
If we're concerned about how this looks in the PDF, say, we generate that one page locally. Sometimes RAs will include an image in the PR to show that the PDF looks fine.
These are imperfect systems but they work OK for the most part. So my vote would be for us to favor simplicity, at least initially, by not committing the cache. (Plus, I'm a reasonably sophisticated user, but submodules still confuse me. My instinct is to fear and distrust them.)
@jstac you can never trust two things: politicians, and sub-modules.
I wonder if one potential way to address this would involve meeting another use-case: building single-page documents. If we make the CLI easy for building the HTML or PDF of one page and letting users quickly preview what it looks like, the same machinery could be re-used for people that only want to build a single page and not an entire book...
Yep, that seems like a good idea. Two birds with one stone, etc. And the single-page use case is certainly important.
Such tools are available in jupinx for reviewing edits to QE lectures. I suppose cross references involving other pages won't work. But, for 99% of cases, it's perfectly fine.
Glad you mentioned this @jstac. It will be really important to support rendering of single pages for usability. We currently do this using an environment variable FILES= and passing that through to Sphinx. I agree the CLI tool needs to cater to this and make it easier :-)
An approach I was playing around with for the jupyter book CLI was to use jupyter-book page: https://jupyterbook.org/features/page.html
perhaps we could use the same pattern, but also allow for PDF output with a kwarg or something?
As discussed with @mmcky, jupinx currently uses a static cache, housed in the Sphinx _build folder on an Amazon server. The build is persisted for all execution triggers (cron-jobbed every hour), which run a 'git pull' then sphinx-build. For this use case, the (just merged) hash implementation of jupyter-cache should work fine.
@mmcky also noted that their current (Sphinx-based) cache implementation doesn't work on Travis CI, presumably because the cache is compressed/un-compressed, changing the file mtimes that Sphinx uses to determine re-builds (matching against a dictionary stored in the pickled environment object). This wouldn't be an issue for jupyter-cache, since it is hash based.
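To illustrate why the hash-based approach survives CI round-trips where the mtime-based one doesn't, here is a small self-contained demonstration (the file names and `file_hash` helper are illustrative, not jupyter-cache's API):

```python
import hashlib
import os
import shutil
import tempfile
import time

def file_hash(path):
    """Content hash: stable across copies and compression round-trips."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Write a stand-in "notebook" file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "page.ipynb")
with open(src, "w") as f:
    f.write('{"cells": []}')

# Simulate a CI cache restore: identical bytes, but a fresh mtime.
dst = os.path.join(workdir, "restored.ipynb")
shutil.copyfile(src, dst)
os.utime(dst, (time.time() + 100, time.time() + 100))

# An mtime comparison (Sphinx's strategy) thinks the file changed;
# a hash comparison (jupyter-cache's strategy) knows it did not.
assert os.path.getmtime(src) != os.path.getmtime(dst)
assert file_hash(src) == file_hash(dst)
```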
It would also be interesting to think how it might work with GitHub actions, CircleCI and ReadTheDocs builds.
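For GitHub Actions specifically, one pattern might be to persist the cache directory between runs with the `actions/cache` action, keyed on the notebook sources. A hedged sketch (the `_build/.jupyter_cache` path is an assumption about where the cache lives in a given project):

```yaml
# Restore/save the execution cache between CI runs, so only
# notebooks whose sources changed are re-executed.
- uses: actions/cache@v4
  with:
    path: _build/.jupyter_cache
    key: notebook-cache-${{ hashFiles('**/*.ipynb', '**/*.md') }}
    restore-keys: |
      notebook-cache-
```

Because jupyter-cache's invalidation is hash based, the compression/decompression that the cache action performs wouldn't cause spurious re-builds the way an mtime-based scheme does.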
Just a note to self, in case this issue is encountered (SQLite on NFS): jupyter/notebook#1782
Another related note: for jupyter book I was starting to collect a repository with several CI/CD patterns that could be used to deploy books: https://github.com/choldgraf/jupyter-book-deploy-demo
I think it'd be helpful if we replicated that repository for the new build system, ideally with multiple levels of complexity that users may want (e.g. vanilla build w/o execute then host online, execute and build, and execute+cache and build).