python-zstandard
python-zstandard copied to clipboard
Intent to rewrite in Rust
After learning Rust and seeing its obvious benefits for safety/security and ease of maintenance over C, I would prefer to not author any new C code wherever possible.
I believe implementing this Python extension in Rust instead of C has long-term benefits to the maintainability of the project.
Once #109 lands and Python 2.7 is no longer supported, my intent is to start rewriting C code in this project in Rust. (Getting Rust to work with Python 2.7 on Windows is a massive plain and effectively blocks writing Python extensions in Rust.)
I'm filing this issue toprovide a sounding board for people to raise concerns/support for this initiative.
I hereby raise concern that Rust is PITA to many people, starting with the fact that it isn't supported on many architectures (coverage is much smaller than zstandard use on Gentoo right now), and ending with the fact that its upstream still haven't managed to reach an agreement with LLVM upstream and instead keep using outdated forks of LLVM.
I'm working under the assumption that most people will install this Python package via pip install zstandard
. And the overwhelming majority of those people will install a pre-built binary wheel published in PyPI (https://pypi.org/project/zstandard/#files). People installing Python binary wheels will have a pre-built shared library / extension and Rust doesn't enter the picture at all.
The only people that need to be burdened by the existence of Rust are:
- Developers of this project
- Packagers of this project
- People who can't install a pre-built binary wheel (perhaps they are running a more esoteric OS/platform)
1 is basically just me.
I'm very empathetic towards end-users who can't install pre-built binary wheels. For them, I'll need to make the code in setup.py
as solid as possible. Or maybe we give up on Rust and fall back to the CFFI backend if Rust isn't "obviously" available.
I have mixed feelings towards packagers. I want to support them fully. My assumption is the state of Rust packaging is solid enough in 2020 that this shouldn't be a problem. (Firefox is using Rust heavily and nearly everyone can package Firefox...) But packagers irk me in that they often make their job 10x harder by imposing arbitrary rules on themselves. So I anticipate packagers potentially taking issue with this change because it makes their lives harder. I dunno.
Do you have a specific objection, @mgorny?
You're entirely missing group 4., users of source-based distributions that get all their packages built locally.
As for binary wheels, I have very bad experience with them on non-x86 architectures. I can't imagine anyone using binary wheels, say, on ARM where you have to support a lot of different processors. Yeah, building C without a second machine to cross-compile is not easy but it's possible. I've seen pip install packages (I think it was on SPARC) that segfaulted right away — and I honestly doubt somebody built binary packages for that architecture, so either pip installed something wrong entirely or failed to build it properly from source.
Oh, and CFFI would certainly be a better solution. Probably fast enough too.
IMHO this project has a high degree of completion, if make some changes, maybe it can be included into Python's stdlib, which is written by C.
Some people hope that Python's stdlib includes a zstd module, see issue37095. (I'm not CPython's core developer)
its obvious benefits for safety/security and ease of maintenance over C
IMHO, In the long run, this project will be stable at some point, then these will not be a problem.
The main
branch now has a Rust backend passing all tests! You can build + test it by doing something like:
python setup.py --rust-backend build_ext -i
PYTHON_ZSTANDARD_IMPORT_POLICY=rust pytest
There are currently a number of caveats.
The Rust backend was mostly implemented using the CFFI backend as a source, so certain patterns from the C backend may not be preserved. This includes releasing of the GIL.
The Rust backend doesn't yet perform 0-copy in many call sites, so performance is likely slower than the C and even CFFI backends.
There are intermittent fuzzing test failures in CI due to timeouts (possibly due to current performance overhead of Rust backend).
The multi_compress_to_buffer
and multi_decompress_to_buffer
implementations are grossly inefficient and likely much slower than the C implementation.
We do not yet include the Rust backend in built wheels.
There hasn't been an extensive auditing of the code: there are likely bugs lingering.
Before I call the Rust backend stable, I plan to extensively audit the Rust code against the C code and fix any performance/correctness gaps.
So we've been revisiting Rust recently because of cryptography, and it's still a horrible choice. It's still far from mature, officially supports only x86 and arm64. A few of the platforms that python-zstandard currently runs on are not supported at all (and would require tons of work which the few volunteers simply can't afford to do), the majority that are unofficially supported are very hard to bootstrap and upstream simply ignores requests to help with that.
So we'd really appreciate some formal backup plan for us. Something better than 'remove python-zstandard from most of your supported platforms' or 'keep patching the old version yourself'.