cobs-python icon indicating copy to clipboard operation
cobs-python copied to clipboard

Don't build C source if running on PYPY

Open imwhocodes opened this issue 1 year ago • 4 comments

Avoid building C optimised version if running on PYPY and use the plain python implementation

  1. pypy have a penalty cost at the python to C interface
  2. I found this memory leak: https://github.com/pypy/pypy/issues/5100

As implementation I took inspiration from msgpack https://github.com/msgpack/msgpack-python/blob/main/setup.py

Alternative is to use import cobs.cobs._cobs_py as cobs instead of from cobs import cobs but this move the burden to the userspace and it is pretty ugly

Thanks, Luca

imwhocodes avatar Nov 04 '24 11:11 imwhocodes

Unfortunately the performance hit from using pure-Python version is massive:

import timeit
import os
if "PURE_PYTHON" in os.environ:
    from cobs.cobs import _cobs_py as cobs
else:
    from cobs import cobs

with open("/vmlinuz", "rb") as f:
    ddata = f.read()
    edata = cobs.encode(ddata)

number = int(os.environ.get("ITERS", "100"))
dtime = timeit.timeit(lambda: cobs.encode(ddata), number=number) / number / len(ddata)
etime = timeit.timeit(lambda: cobs.decode(edata), number=number) / number / len(edata)

print(f"encoding: {1 / (dtime * 1e9)} GB/s")
print(f"decoding: {1 / (etime * 1e9)} GB/s")
$ pdm run python --version
Python 3.13.3
$ pdm run benchmark.py 
encoding: 1.1086298500843723 GB/s
decoding: 2.100975460564308 GB/s
$ ITERS=10 PURE_PYTHON=1 pdm run benchmark.py 
encoding: 0.015219794224444787 GB/s
decoding: 0.04777418311304811 GB/s
$ pdm run --venv pypy python --version
Python 3.11.11 (7.3.19+dfsg-2, Apr 11 2025, 02:57:20)
[PyPy 7.3.19 with GCC 14.2.0]
$ pdm run --venv pypy benchmark.py 
encoding: 0.6770020925247543 GB/s
decoding: 0.8494924667477181 GB/s
$ ITERS=10 PURE_PYTHON=1 pdm run --venv pypy benchmark.py 
encoding: 0.013618624366727946 GB/s
decoding: 0.013457611599891738 GB/s

For many applications, such as mine, this would make the package effectively useless.

whitequark avatar May 10 '25 05:05 whitequark

I wonder if it would be possible to set up the package so that the C extension is an installation option.

I'm only vaguely aware of a possible mechanism: in the qrcode package, it mentions the ability to install it with an optional dependency:

For more image functionality, install qrcode with the pil dependency so that pillow is installed and can be used for generating images:

pip install "qrcode[pil]"

cmcqueen avatar Jul 18 '25 04:07 cmcqueen

I wonder if it would be possible to set up the package so that the C extension is an installation option.

I'm only vaguely aware of a possible mechanism: in the qrcode package, it mentions the ability to install it with an optional dependency:

As far as I'm aware, this mechanism can't be used in the way you suggest.

Ultimately, I think it's OK to leave the C extension to be built when installing on PyPy: it does improve performance (if I recall correctly by a factor of about 2x) despite the fact that on PyPy, the Python C API bridge is relatively slow and the use of CFFI is preferred.

whitequark avatar Jul 18 '25 17:07 whitequark

I would prefer to solve this the "proper way", but I don't know what the proper way is. I've looked at this several times over the years. It is hard to come up with a solution that satisfies all use-cases automatically. Every use-case can be worked out manually, but automatic is preferred.

Some projects in the past have written the setup.py to attempt building a C extension, and fall back to installing the pure Python version. But these days it seems Python is preferring to move away from bespoke setup.py installation methods.

I could split the cobs package into two separate packages, something like cobs_pure and cobs_ext. But that still doesn't provide a mechanism for automatically selecting the ideal package when installing dependencies.

cmcqueen avatar Jul 20 '25 01:07 cmcqueen