Update pantab to be compatible with HyperAPI v0.0.14567
The interface of the hyper_rowset_chunk_field_values function changed
in the latest version of HyperAPI. Update pantab to be compatible with
that HyperAPI version. While at it, also improve the error message to
include the exact version of HyperAPI
Fixes #145
Hello @vogelsgesang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
- In the file
pantab/__init__.py:
Line 114:89: E501 line too long (96 > 88 characters)
Comment last updated at 2022-09-06 17:21:40 UTC
@vogelsgesang this is huge - thanks! We probably need to configure CI to have testing of both older versions and the latest version
For instance this may fix it for the latest version but then wouldn't work with older versions (at least I think from looking at it)
currently, this commit is written without backwards compatibility. Do we care about it enough to complicate the code?
The error message already states
pantab is incompatible with version {} of Tableau Hyper API. Please upgrade both
tableauhyperapiandpantabto the latest version. If doing so does not fix this issue, please file an issue at https://github.com/innobi/pantab/issues mentioning the exact pantab and HyperAPI versions which triggered this error.
Afaict, following that guidance should unblock all users
It's a little thorny. The problem is definitely going to be on the packaging side to enforce this without really muddying pip installs.
One last option is instead of dynamic runtime link to distribute the hyper lib / executable as part of pantab, which prevents the need for any runtime check. This is what I've done in HyperArrow as the tooling there improved a lot since this package was originally incepeted
https://github.com/innobi/hyperarrow/blob/main/.github/workflows/build_python_wheels.yml
Yep also a great callout. I think using docker-compose and creating a lint rule will help a lot for that
One last option is instead of dynamic runtime link to distribute the hyper lib / executable as part of pantab
That approach won't work for pantab given that pantab needs to use the exact same version of HyperAPI as also installed via pip. Otherwise, passing a Connection from Python to the C extension (e.g. to frame_from_hyper_query) would not work reliably. Afaict, the current architecture of HyperArrow won't allow for such "connection passing" functionality.
It's a little thorny. The problem is definitely going to be on the packaging side to enforce this without really muddying pip installs.
Good point. I changed changed the setup.py to restrict the HyperAPI version
- install_requires=["pandas", "tableauhyperapi", "numpy"],
+ install_requires=["pandas", "tableauhyperapi>=0.0.14567", "numpy"],
According to the Python packaging user guide, this should ensure that HyperAPI is recent enough. Not sure if it works on all package managers (conda, anaconda, ...), though. I have no experience with those package managers
That's a good point for re-used connections. One way to offer the same functionality (in a breaking manner) would be to have read / write function that can optionally return a connection. That way everything at the binary level is totally managed within pantab and you don't even necessarily need tableauhyperapi or runtime linkage.
Hard to say what is the least evil going forward. Dynamic linkage over a C or C++ API that isn't stable can cause a lot of headaches with version compatibility that I think most Python users aren't going to really get
Thankfully, I know how to unblock myself
Scratch that. It seems I am unable to get past
ERROR: /home/runner/work/pantab/pantab/pantab/__init__.py Imports are incorrectly sorted and/or formatted. ERROR: /home/runner/work/pantab/pantab/pantab/_reader.py Imports are incorrectly sorted and/or formatted. ERROR: /home/runner/work/pantab/pantab/pantab/_writer.py Imports are incorrectly sorted and/or formatted.
I can't reproduce this issue locally when running isort **/*.py -c and have no glue what the CI is complaining about
Ok, I am giving up on the formatting issues...
Either disable the isort check temporarily and merge this commit as is or feel free to modify the commit however you see fit to fulfill your formatting requirements
also, you might want to consider to make formatting an independent step in your pipeline, such that the CI at least still provides feedback wether the functional test cases pass, even if the formatting is incorrect
According to the Python packaging user guide, this should ensure that HyperAPI is recent enough. Not sure if it works on all package managers (conda, anaconda, ...), though. I have no experience with those package managers
Just for your awareness setup.py is used only by setuptools. For quite some time anaconda and setuptools were the two main methods for building and in effect distributing, with tools like flint and poetry amongst others having come around more recently. The growth of these tools and desires to clean up distribution / installation has yielded way to non-setup.py specifications like pyproject.toml, which was introduced in PEP 518 and extended to a bit in PEP 621 to cover what we might need here, like install dependencies
With that said, I know for a fact that pip and setuptools in coordination never had a real dependency resolver. Adding a version pin in setup.py for this project makes it so when you install this tool it will check your other dependencies, but never really after unless you try to reinstall the tool. Nothing with setup.py can truly resolve dependencies in a robust manner
conda goes about this in a different way by isolating an environment and having projects be much more explicit about their dependencies, while also adding in a real dependency solver that makes sure all conditions can be satisfied when installin into an environment. Unfortunately this project is not installable via conda as it requires in its current state the tableauhyperapi, which itself is not in conda. When declaring dependencies in conda your upstream dependencies also need to be available in that ecosystem, so that's unfortuantely not an option
We could do pyproject.toml and see if that helps for build systems, but I'm not if that ever really prevents this. Historically with binary dependencies you would either using conda (not an option) or build a wheel that includes a copy of those libraries. I think the latter of those two is still going to be our best bet
One other solution here I can think of is to make this a runtime check. We would want a compat object at both the Python and C layers and can branch in the code depending on version of the tableauhyperapi is being passed. That requires the least amount of architectural change, with a risk of having a bunch of spaghetti compat code built out over time. Definitely worth exploring
If you want to rebase / merge on #147 should help with the linting. Broke out into a separate job as you suggested. Also you can locally now just run docker compose run --rm lint and it will do what needs to be done from a black / isort perspective
I've tried doing the compatability approach in #152 which compiles but fails at runtime. I am not sure if that will be a feasible approach but I've opened an SO article out there to see if someone with more expertise in C can help
https://stackoverflow.com/questions/71630729/provide-compatability-for-various-function-signatures-from-shared-library
Can you add a note in the documentation with a matrix of version support? We can release pantab 3.0 with this, so something where pantab 3.0 works for 0.0.14567 and greater and 2.0 supports only prior.
That should help clarify, and we can even point to that in the error message when incompatabilities are detected
Thanks @vogelsgesang
Thanks for pushing this over the finish line, @WillAyd 👍