pantab icon indicating copy to clipboard operation
pantab copied to clipboard

Update pantab to be compatible with HyperAPI v0.0.14567

Open vogelsgesang opened this issue 3 years ago • 18 comments

The interface of the hyper_rowset_chunk_field_values function changed in the latest version of HyperAPI. Update pantab to be compatible with that HyperAPI version. While at it, also improve the error message to include the exact version of HyperAPI

Fixes #145

vogelsgesang avatar Mar 25 '22 17:03 vogelsgesang

Hello @vogelsgesang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 114:89: E501 line too long (96 > 88 characters)

Comment last updated at 2022-09-06 17:21:40 UTC

pep8speaks avatar Mar 25 '22 17:03 pep8speaks

@vogelsgesang this is huge - thanks! We probably need to configure CI to have testing of both older versions and the latest version

WillAyd avatar Mar 25 '22 17:03 WillAyd

For instance this may fix it for the latest version but then wouldn't work with older versions (at least I think from looking at it)

WillAyd avatar Mar 25 '22 17:03 WillAyd

currently, this commit is written without backwards compatibility. Do we care about it enough to complicate the code?

The error message already states

pantab is incompatible with version {} of Tableau Hyper API. Please upgrade both tableauhyperapi and pantab to the latest version. If doing so does not fix this issue, please file an issue at https://github.com/innobi/pantab/issues mentioning the exact pantab and HyperAPI versions which triggered this error.

Afaict, following that guidance should unblock all users

vogelsgesang avatar Mar 25 '22 17:03 vogelsgesang

It's a little thorny. The problem is definitely going to be on the packaging side to enforce this without really muddying pip installs.

One last option is instead of dynamic runtime link to distribute the hyper lib / executable as part of pantab, which prevents the need for any runtime check. This is what I've done in HyperArrow as the tooling there improved a lot since this package was originally incepeted

https://github.com/innobi/hyperarrow/blob/main/.github/workflows/build_python_wheels.yml

WillAyd avatar Mar 25 '22 17:03 WillAyd

can we please pin `black` and `isort` to exact versions in the `environment.yml`? It's really annoying that I have to reformat many files which my commit didn't even touch, just because black & isort changed their formatting rules. Pretty bad pull request experience. Thankfully, I know how to unblock myself, but this experience could easily demotivate other contributors...

vogelsgesang avatar Mar 25 '22 17:03 vogelsgesang

Yep also a great callout. I think using docker-compose and creating a lint rule will help a lot for that

WillAyd avatar Mar 25 '22 17:03 WillAyd

One last option is instead of dynamic runtime link to distribute the hyper lib / executable as part of pantab

That approach won't work for pantab given that pantab needs to use the exact same version of HyperAPI as also installed via pip. Otherwise, passing a Connection from Python to the C extension (e.g. to frame_from_hyper_query) would not work reliably. Afaict, the current architecture of HyperArrow won't allow for such "connection passing" functionality.

vogelsgesang avatar Mar 25 '22 17:03 vogelsgesang

It's a little thorny. The problem is definitely going to be on the packaging side to enforce this without really muddying pip installs.

Good point. I changed changed the setup.py to restrict the HyperAPI version

-    install_requires=["pandas", "tableauhyperapi", "numpy"],
+    install_requires=["pandas", "tableauhyperapi>=0.0.14567", "numpy"],

According to the Python packaging user guide, this should ensure that HyperAPI is recent enough. Not sure if it works on all package managers (conda, anaconda, ...), though. I have no experience with those package managers

vogelsgesang avatar Mar 25 '22 17:03 vogelsgesang

That's a good point for re-used connections. One way to offer the same functionality (in a breaking manner) would be to have read / write function that can optionally return a connection. That way everything at the binary level is totally managed within pantab and you don't even necessarily need tableauhyperapi or runtime linkage.

Hard to say what is the least evil going forward. Dynamic linkage over a C or C++ API that isn't stable can cause a lot of headaches with version compatibility that I think most Python users aren't going to really get

WillAyd avatar Mar 25 '22 17:03 WillAyd

Thankfully, I know how to unblock myself

Scratch that. It seems I am unable to get past

ERROR: /home/runner/work/pantab/pantab/pantab/__init__.py Imports are incorrectly sorted and/or formatted. ERROR: /home/runner/work/pantab/pantab/pantab/_reader.py Imports are incorrectly sorted and/or formatted. ERROR: /home/runner/work/pantab/pantab/pantab/_writer.py Imports are incorrectly sorted and/or formatted.

I can't reproduce this issue locally when running isort **/*.py -c and have no glue what the CI is complaining about

vogelsgesang avatar Mar 25 '22 17:03 vogelsgesang

Ok, I am giving up on the formatting issues...

Either disable the isort check temporarily and merge this commit as is or feel free to modify the commit however you see fit to fulfill your formatting requirements

vogelsgesang avatar Mar 25 '22 17:03 vogelsgesang

also, you might want to consider to make formatting an independent step in your pipeline, such that the CI at least still provides feedback wether the functional test cases pass, even if the formatting is incorrect

vogelsgesang avatar Mar 25 '22 17:03 vogelsgesang

According to the Python packaging user guide, this should ensure that HyperAPI is recent enough. Not sure if it works on all package managers (conda, anaconda, ...), though. I have no experience with those package managers

Just for your awareness setup.py is used only by setuptools. For quite some time anaconda and setuptools were the two main methods for building and in effect distributing, with tools like flint and poetry amongst others having come around more recently. The growth of these tools and desires to clean up distribution / installation has yielded way to non-setup.py specifications like pyproject.toml, which was introduced in PEP 518 and extended to a bit in PEP 621 to cover what we might need here, like install dependencies

With that said, I know for a fact that pip and setuptools in coordination never had a real dependency resolver. Adding a version pin in setup.py for this project makes it so when you install this tool it will check your other dependencies, but never really after unless you try to reinstall the tool. Nothing with setup.py can truly resolve dependencies in a robust manner

conda goes about this in a different way by isolating an environment and having projects be much more explicit about their dependencies, while also adding in a real dependency solver that makes sure all conditions can be satisfied when installin into an environment. Unfortunately this project is not installable via conda as it requires in its current state the tableauhyperapi, which itself is not in conda. When declaring dependencies in conda your upstream dependencies also need to be available in that ecosystem, so that's unfortuantely not an option

We could do pyproject.toml and see if that helps for build systems, but I'm not if that ever really prevents this. Historically with binary dependencies you would either using conda (not an option) or build a wheel that includes a copy of those libraries. I think the latter of those two is still going to be our best bet

WillAyd avatar Mar 25 '22 20:03 WillAyd

One other solution here I can think of is to make this a runtime check. We would want a compat object at both the Python and C layers and can branch in the code depending on version of the tableauhyperapi is being passed. That requires the least amount of architectural change, with a risk of having a bunch of spaghetti compat code built out over time. Definitely worth exploring

WillAyd avatar Mar 25 '22 23:03 WillAyd

If you want to rebase / merge on #147 should help with the linting. Broke out into a separate job as you suggested. Also you can locally now just run docker compose run --rm lint and it will do what needs to be done from a black / isort perspective

WillAyd avatar Mar 26 '22 00:03 WillAyd

I've tried doing the compatability approach in #152 which compiles but fails at runtime. I am not sure if that will be a feasible approach but I've opened an SO article out there to see if someone with more expertise in C can help

https://stackoverflow.com/questions/71630729/provide-compatability-for-various-function-signatures-from-shared-library

WillAyd avatar Mar 26 '22 18:03 WillAyd

Can you add a note in the documentation with a matrix of version support? We can release pantab 3.0 with this, so something where pantab 3.0 works for 0.0.14567 and greater and 2.0 supports only prior.

That should help clarify, and we can even point to that in the error message when incompatabilities are detected

WillAyd avatar Mar 29 '22 19:03 WillAyd

Thanks @vogelsgesang

WillAyd avatar Sep 06 '22 18:09 WillAyd

Thanks for pushing this over the finish line, @WillAyd 👍

vogelsgesang avatar Sep 06 '22 18:09 vogelsgesang