hpack icon indicating copy to clipboard operation
hpack copied to clipboard

Stateless Huffman

Open jimcarreer opened this issue 10 years ago • 12 comments

As discussed with @Lukasa the Huffman encoder / decoder is currently initialized using two static structures defined in huffman_constants.py. We could potentially remove the need for this pseudo-stateful nature and instead have a completely stateless Huffman encoder / decoder with the addition of another static structure related to decoding (encoding is currently trivial to make stateless). This may also offer a minor performance enhancement.

jimcarreer avatar Jul 08 '15 21:07 jimcarreer

I think I'm going to take a look at this, this weekend.

jimcarreer avatar Oct 13 '15 14:10 jimcarreer

An FYI, I've migrated to an essentially stateless Huffman decoding implementation in #35.

Lukasa avatar Mar 21 '16 12:03 Lukasa

So this should be closed?

thedrow avatar Dec 12 '16 09:12 thedrow

Nope, we still have a stateful encoder.

Lukasa avatar Dec 12 '16 09:12 Lukasa

I can try to write it in C if that helps.

thedrow avatar Dec 12 '16 10:12 thedrow

I would like to avoid us having any native extensions in this module at this time.

Lukasa avatar Dec 12 '16 10:12 Lukasa

Even optionally?

thedrow avatar Dec 12 '16 10:12 thedrow

Tentatively, yes, I'm opposed to having an optional native extension.

My thinking on this is a bit complex, so let me lay it out.

  1. Generally speaking I have a policy regarding C code, which is that wherever possible I'd like to avoid having untrusted input reach C code. The risks of running C code on untrusted input are too high, and I find that generally speaking I and most software developers I work with are bad at spotting places where assumptions are made that data will be "reasonable". That makes C a great vector for bugs, and in particular for the kind of bug that does not affect a pure-Python program.

    For that reason, if we were going to write a native extension I'd want it to be in Rust, not C. However,

  2. The wheel ecosystem is not quite there yet. While it's now mostly possible to ship binary wheels for most platforms, there are a number of Linux distributions that cannot use manylinux1 wheels. This is a problem if you want to ship a Rust extension, because it's very difficult to indicate the need for a Rust compile chain on a random Linux system.

    This problem also affects C extensions, by the way: one of the systems that doesn't support a manylinux1 wheel is Alpine, which uses musl libc. That means that if we shipped a C extension on that platform, we'd require the installation of a C toolchain for users to install the module: not great.

  3. So the only option is, as you said, to make the module optional: we'd ideally want to check whether we could compile it and, if we couldn't, we'd simply fall back to pure-Python code.

    I have extremely mixed feelings about this, most of them not good. In particular, it's a bit of a hairy debugging issue: what code the user is executing becomes conditional on difficult-to-introspect features of their platform; we double the number of code paths that need to be tested and debugged; and we double the workload for feature addition and API changes.

The TL;DR here, I think, is that right now I'm about -0.25 on having a native extension. If we could come up with a nice Rust extension then I'd reconsider, but I'd really rather not maintain a C extension if it can possibly be avoided.

Lukasa avatar Dec 12 '16 11:12 Lukasa

In fact, let state an outline of what we'd need to make me happy here.

If any contributor can write a Rust extension that can be optionally compiled if a buildchain is present, that flags its presence or absence clearly in a module-scoped dunder variable, that uses CFFI, and that doesn't vomit a load of output into the terminal on install on systems that do not have a Rust toolchain present, I'd be willing to consider merging such an extension.

Lukasa avatar Dec 12 '16 11:12 Lukasa

It shouldn't be much of a problem with https://github.com/novocaine/rust-python-ext

thedrow avatar Dec 12 '16 11:12 thedrow

I've already written a Rust extension module in Python, that's not the hard bit. The harder bits are a) making it fail gracefully, b) keeping it maintained, and c) actually writing the extension. ;)

Lukasa avatar Dec 12 '16 12:12 Lukasa

Right, this might take some effort since I'm not very familiar with Rust's lifelines and borrow semantics. I will give it a shot though.

thedrow avatar Dec 12 '16 12:12 thedrow