pyopenjtalk
pyopenjtalk copied to clipboard
Speed up by releasing gil and using openmp with zerocopy view
This pr did:
- absolute import
- update cython language level to 3
- zerocopy view with more typing to speedup
- cython inline suport to speedup
- cython gc-free class to reduce gc pressure (obviously it's not possible for those types to participate in cycles)
- use
nogil
to allow run in threadpool and bypass gil - use openmp for parallel compute
- add ci to build wheels
- inject data files to wheels so there's no need to download it when import
- reformat using isort and black
This may fix #30 #33 #34 for removing necessity to build it natively.
but I think there is too much for a single PR
The change seems huge but can be concluded with optimize cython and packaging
.
what problems you are going to solve
They are in the first comment.
how much performance gain is obtained
@synodriver will give you a result.
how much performance gain is obtained
Since there is no gil required for those functions, it depends on how many cores your cpu have. And for asyncio apps, just use loop.run_in_executor
is okey.
BTW, I think cython should be an optional requirement for this project, just including those cython-generated files is ok if a user don't have cython installed.
I used to include cython-generated files for cython-dependent projects, but it indeed causes problems in some environments due to a mismatch between Python/Cython/compiler versions. I wish cython-generated code was portable and worked fine on any platform, but it wasn't unfortunately. In addition, to be specific to this repo, users need to generate config.h depending on their platform (ref https://github.com/r9y9/pyopenjtalk/issues/21), so it is not easy to make cython dependency optional for any platform.
One possible solution would be to release both pre-built packages for specific platforms (cython dep: optional) and non-built packages for general purposes (cython dep: mandatory), respectively.
One possible solution would be to release both pre-built packages for specific platforms (cython dep: optional) and non-built packages for general purposes (cython dep: mandatory), respectively.
This is surely what we would like to do. We can both reduce the installing complexity and remain its portability by that.
Actually I have already build wheels for almost all platforms here, it took a long time to debug the CI.
Ok, let's add some checks before inject datafiles.
This is surely what we would like to do. We can both reduce the installing complexity and remain its portability by that.
Agreed. I'll upload wheels to pypi for the next release so that users can install pyopenjtalk more easily. Also thanks for debugging through CI. I understand it takes time.