nbval
Getting errors for timestamps, matplotlib figures and pandas DataFrames
This is a great plugin. I'm getting unexpected errors for the following types of cells:
- timestamps

- matplotlib figures

- pandas dfs

Are there workarounds for these cell types, such as an "ignore flag", and are there potential code integrations for any of the above types in the future?
Many thanks.
Hi par,
If you add the comment:
#PYTEST_VALIDATE_IGNORE_OUTPUT
at the beginning of a cell, it will not run the validation for that cell - so this is good for things like timestamps.
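For example, a cell whose output contains the current time could be marked like this (the cell contents are just a made-up illustration; only the comment is the nbval feature):
#PYTEST_VALIDATE_IGNORE_OUTPUT
import datetime
print(datetime.datetime.now())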
With matplotlib, it's failing because nbval only compares the text output, and since that output includes a memory location here, we'd expect it to differ between two separate runs. A further problem is that matplotlib figures can change even with the same input - they're not easy to compare directly. For example, the default colormaps are changing between the current and next versions of Matplotlib, which would make the tests fail despite nothing really being wrong. At the moment, it's better to just compare the data that makes the plot, rather than the plot itself.
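To illustrate what comparing the data rather than the plot could look like, a notebook could print deterministic values derived from the plotted arrays in a separate cell - a rough sketch, with numpy used purely as an example:
import numpy as np
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)
# A plotting call here would produce output containing a memory address,
# which differs between runs; printing values derived from the data gives
# nbval stable text to compare instead.
print(round(float(y.min()), 6), round(float(y.max()), 6))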
Do you have an example notebook with a dataframe we can test out and investigate? It's odd that that one failed...
Pandas dataframes are definitely not something we had considered. Anything other than unicode text is going to be hard to compare and will likely need coding on a case-by-case basis.
Comparing matplotlib plots based on their raw data is a nice idea @ryanpepper, and we could implement this in the future.
Yeah, the problem is that we can't even do something like hash the image that was produced, because any change in the Matplotlib version, even a minor one, would likely change the image enough that the test would fail.
Agreed. Comparing bitmaps is not robust enough.
I'll try and take a look at the Pandas dataframe stuff today; it might be simpler than it seems - it looks like something weird is happening with the comparison.
Can you give me write access to the repository Oli?
Comparison of matplotlib figures is certainly a feature that would be incredibly useful because it's such a common use case. I'm not sure of the best approach, though.
At the risk of slightly going off-topic I'm just going to do some braindumping here of various things that I have been thinking about in the past. I'd be curious to hear other people's opinions.
First of all, it would be nice to have at least the option of bitmap comparison in nbval (potentially using something like perceptualdiff with a given threshold).
If a matplotlib figure is stored in a suitable format internally (e.g. SVG) this might allow for more powerful comparison because it may be possible to report the difference in a more conceptual way, e.g. "the y-axis shifted by 3 pixels but everything else stayed the same". Also, I have written (and seen) tests before that compare matplotlib.Figure objects directly instead of exported images (using the internal structure of the figure rather than any output, i.e. how it is composed of axes, labels, etc.). Obviously this isn't possible in a notebook because the matplotlib object is gone, but I wonder if longer-term something like this could be a better approach for robust comparison of plots. All of these approaches (apart from perhaps perceptualdiff) are definitely outside the scope of nbval, though, and should potentially be discussed with the matplotlib developers instead.
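To make the Figure-object idea a bit more concrete, a comparison along those lines might look roughly like the sketch below. This is only an illustration of the general approach and, as said, not something nbval can do from a saved notebook, since the live Figure object is gone by then:
import matplotlib.pyplot as plt

def figure_summary(fig):
    # Reduce a Figure to a small comparable structure: one tuple per Axes,
    # recording its axis labels and the number of plotted lines.
    return [(ax.get_xlabel(), ax.get_ylabel(), len(ax.get_lines()))
            for ax in fig.get_axes()]

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
ax.set_xlabel("x")
ax.set_ylabel("y")
assert figure_summary(fig) == [("x", "y", 1)]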
A related idea is what the (highly recommended) Holoviews project is doing. One of its key ideas is that the visualisation is conceptually separate from the underlying data. In particular, the data itself is always stored inside the class that performs the visualisation and is therefore still accessible. For testing in nbval this could allow us to (i) compare the underlying data and, in addition, (ii) check that the type of visualisation used is still the same. This would lead to a successful comparison even if something minor changed in the output. Btw, the Holoviews developers use Jupyter notebooks for their tests, and I know that in the past they had some trouble with robust comparisons and making the workflow smooth. It might be useful to drop into their Gitter channel and discuss things with them; they are a very nice and responsive bunch.
Finally, it would be fantastic to integrate nbval with something like ApprovalTests (see here for the Python version). The workflow I have in mind is that approvaltests calls nbval to compare a notebook and if a comparison fails (e.g. because an output image changed) then it would fire up something like nbdime to allow for a quick check of whether the change should be rejected as a failure or should be approved as the new "master" version to compare against in future tests. I'm not actually sure whether this would require much change in nbval at all - and it may just be a matter of hooking it up with approvaltests in the right way. Also, this approach could be nice for easy visual comparison of changes in pandas.DataFrames and other complicated structures.
Anyway, lots of random thoughts. Bottom line is: I think that focusing on the underlying data for comparison instead of any output (e.g. bitmaps) generated from the data is probably the most robust route in the long run.
I've been playing around with testing and DataFrames for the last hour, and I can't reproduce an error anything like yours @par2 - the only times I've had failing tests with Pandas are where a df function also spits out a plot. If you could send me an example, it would be really helpful. I'm wondering if it's something to do with the styling options, which I don't have experience of using myself.
For checking mpl plots, Jess Hamrick wrote a tool called plotchecker. That needs access to the mpl plot object, which is tricky since that exists in the kernel process, not the test process, but it might still be possible to use it somehow.
@par2 there is also the --sanitize-with option, which you can use to provide a file with regular expressions, such as:
[regex1]
regex: \d{1,2}/\d{1,2}/\d{2,4}
replace: DATE-STAMP
[regex2]
regex: \d{2}:\d{2}:\d{2}
replace: TIME-STAMP
This allows you to tell nbval to ignore certain patterns such as dates and times (and you could use the same mechanism for memory addresses as well).
You may prefer this over the #PYTEST_VALIDATE_IGNORE_OUTPUT option as it still checks the rest of the cell output.
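For reference, assuming the snippet above is saved to a file called sanitize.cfg (the filenames here are just examples), the invocation would look something like:
$ py.test --nbval your_notebook.ipynb --sanitize-with sanitize.cfg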
See also the (sparse) documentation in https://github.com/computationalmodelling/nbval/blob/master/documentation.ipynb
Thanks everyone for the fast responses. Much of your feedback has confirmed my suspicions, particularly with the mpl errors. Thanks also for the workarounds for timestamp cells @ryanpepper and @fangohr.
@ryanpepper I can give you the exact notebook I used to generate this error. If you clone the master branch from my repo (currently lamana v0.4.11) into a conda test env, you can run nbval on the docs/showcase.ipynb notebook.
For anyone interested, I've included the shell commands for the virtual environment I used to reproduce these errors (Windows 7, 64-bit machine).
$ git clone -b master https://github.com/par2/lamana.git
$ cd lamana
$ conda create -n nbvaltest python=3.5 numpy pandas matplotlib
$ activate nbvaltest
$ pip install nbval pytest
$ pip install -e .
$ cd docs
$ py.test --nbval showcase.ipynb
^ Ping @ryanpepper - were you able to reproduce the error for DataFrames?
I just recreated an nbvaltest env using all your above commands, and checked out your tag 0.4.11, and oh boy does that throw errors on me, like hundreds (possibly looped?):
I don't know what it all means. I'm just checking out the available tools for notebook testing, but it all still seems too vague and/or incomplete at the moment, also with regard to coverage calculations.
Hi @Michaelaye - the commands above for the nbvaltest environment are a user's module installation and test that was having problems. If you want to try out nbval, I'd advise cloning this repository and having a look at the notebooks inside it, which should work as expected.
Best wishes, Ryan
Sure, the notebooks you ship work as expected, but when evaluating new packages for their usefulness I also look at how easily (if at all) problems can be fixed. ;)
The "Kernel died before replying to kernel_info" error looks like there's a problem with the Jupyter setup you're using.
Maybe because the test env suggested above only includes this?
conda create -n nbvaltest python=3.5 numpy pandas matplotlib
No, my fully set-up conda env shows the same errors, and I don't have any other Jupyter problems there.
It will need jupyter_client to test with, but I think you'd get a different error if that wasn't installed. Do you have a kernelspec set up? Can you check what happens when you run this code:
from jupyter_client.manager import start_new_kernel
start_new_kernel(extra_arguments=['--matplotlib=inline'])
I do indeed get the same error:
In [1]: from jupyter_client.manager import start_new_kernel
In [2]: start_new_kernel(extra_arguments=['--matplotlib=inline'])
/Users/klay6683/miniconda3/bin/python: Error while finding spec for 'IPython.kernel' (ImportError: No module named 'IPython')
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-2-0f2916c54c07> in <module>()
----> 1 start_new_kernel(extra_arguments=['--matplotlib=inline'])
/Users/klay6683/miniconda3/envs/stable/lib/python3.5/site-packages/jupyter_client/manager.py in start_new_kernel(startup_timeout, kernel_name, **kwargs)
431 kc.start_channels()
432 try:
--> 433 kc.wait_for_ready(timeout=startup_timeout)
434 except RuntimeError:
435 kc.stop_channels()
/Users/klay6683/miniconda3/envs/stable/lib/python3.5/site-packages/jupyter_client/blocking/client.py in wait_for_ready(self, timeout)
57
58 if not self.is_alive():
---> 59 raise RuntimeError('Kernel died before replying to kernel_info')
60
61 # Check if current time is ready check time plus timeout
RuntimeError: Kernel died before replying to kernel_info
What do you get from jupyter kernelspec list?
Wow, some stone-age settings that I haven't used in a long time, as I'm using nb_conda_kernels for managing those...
$ jupyter kernelspec list
Available kernels:
python2 /Users/klay6683/Library/Jupyter/kernels/python2
python3 /Users/klay6683/Library/Jupyter/kernels/python3
If you remove those (in particular the python3 one), does it work? I don't think nbval is affected by nb_conda_kernels.
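If it helps, a kernelspec is just a directory on disk, so removing the stale one can be as simple as deleting it (the path below is taken from your listing above):
$ rm -rf /Users/klay6683/Library/Jupyter/kernels/python3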
However, both kernels actually point to working conda envs (because I always use the same env names). The python3 one points to the conda root, though.
Yep, both the start_new_kernel call and the nbval test now go through. Thanks! But what happened? The path to the python3 kernel was correct - was there maybe an old setting in the kernelspec structure that crashed it?
Possibly. Or maybe it didn't have matplotlib installed in that env - the --matplotlib=inline option makes it try to load matplotlib on start.
The latter - I don't have a full env in the conda root, as I only use it for package management. Thanks for your debugging help!
I've opened #20 to avoid relying on matplotlib.