pandera
Optional import hypotheses doesn't install hypothesis
Location of the documentation
https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html#usage-in-unit-tests
https://pandera.readthedocs.io/en/stable/index.html#extras
Documentation problem
When reading about how to create example dataframes from schemas on this page it mentions that you need hypothesis library.
Skipping to the index page/README to check whether it's an optional dependency suggests that it is, though the spelling is confusing: the extra is called hypotheses, while the library is hypothesis.
This led me to get ModuleNotFoundError: No module named 'hypothesis' errors until I dug further and discovered that
calling pip install pandera[hypotheses] actually installs only scipy, and you have to install pandera[strategies] to get hypothesis.
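A minimal sketch for checking which of the two packages an environment actually has, assuming only the mapping described in this thread (pandera[hypotheses] pulls in scipy, pandera[strategies] pulls in hypothesis). EXTRA_TO_PACKAGE and missing_extras() are hypothetical names for illustration, not part of pandera.

```python
import importlib.util

# Hypothetical table, based on this thread: each pandera extra and the
# third-party package it is reported to install.
EXTRA_TO_PACKAGE = {
    "hypotheses": "scipy",       # statistical Hypothesis checks
    "strategies": "hypothesis",  # data synthesis strategies
}


def missing_extras() -> list[str]:
    """Return the extras whose backing package cannot be imported."""
    return [
        extra
        for extra, package in EXTRA_TO_PACKAGE.items()
        # find_spec returns None when the top-level package is not installed
        if importlib.util.find_spec(package) is None
    ]


if __name__ == "__main__":
    # Print an install command for every extra that looks missing.
    for extra in missing_extras():
        print(f"pip install 'pandera[{extra}]'")
```

Seeing "strategies" in the output while scipy is present reproduces the situation above: the hypotheses extra is installed, but data synthesis still fails.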
Suggested fix for documentation
I would suggest either including the hypothesis library in the pandera[hypotheses] optional install and fixing the spelling discrepancy, or stating on the strategies page that you need to install pandera[strategies] for this functionality.
This is an unfortunate naming collision.
When I added Hypothesis checks to the codebase, I didn't anticipate ever using the hypothesis library for data synthesis.
pip install pandera[hypotheses] unlocks hypothesis checks.
pip install pandera[strategies] unlocks data synthesis strategies, which uses hypothesis.
To clarify this, it would make sense to add the appropriate pip install commands to the corresponding pages:
- https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html#usage-in-unit-tests: pip install pandera[strategies]
- https://pandera.readthedocs.io/en/stable/hypothesis.html: pip install pandera[hypotheses]
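As an illustration of what the strategies page could show, here is a hedged sketch that fails with an install hint instead of a bare ModuleNotFoundError. synthesize_example() is a hypothetical helper and the single-column schema is made up for the example; DataFrameSchema, Column, Check.gt and schema.example(size=...) are pandera's schema and data synthesis API.

```python
import importlib.util


def synthesize_example(size: int = 3):
    """Generate example rows from a schema, or raise an ImportError that
    names the extra to install."""
    # Data synthesis needs hypothesis, which comes with pandera[strategies],
    # not with pandera[hypotheses] (that extra installs scipy).
    for package in ("pandera", "hypothesis"):
        if importlib.util.find_spec(package) is None:
            raise ImportError(
                f"{package!r} is not installed; run: pip install 'pandera[strategies]'"
            )

    # Deferred import so this module loads even when pandera is absent.
    import pandera as pa

    # Hypothetical schema: one integer column whose values must be > 0.
    schema = pa.DataFrameSchema({"column": pa.Column(int, pa.Check.gt(0))})
    return schema.example(size=size)
```

Checking with find_spec before importing keeps the error message in the helper's control, so it can point at the extra's name rather than the library's.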
Would you be able to make a PR for the docs updates?
@rmetcalfe-msp any thoughts on the docs solution to clarify this behavior?
@cosmicBboy thanks for the explanation, sounds reasonable. I'll try to address this in a PR when I have some time.