opteryx
opteryx copied to clipboard
Phase out numpy
trafficstars
numpy was great as a way to get a lot of good performance code quickly. It allowed us to write an okay performance engine in Python relatively quickly.
But we're now paying for this,
- We have to keep checking if we have pyarrow or numpy arrays, add complexity and weird edge cases.
- We have to convert between the two formats, at about 10m rows per second, it's measurable in some instances.
- Numpy has a confusing type system, which uses
objectfar too often.
We should use arrow arrays as our memory format, simplifying logic and handling to improve performance.
There will be some actions we cannot compete with numpy for it's performance, and we may still use for specific functions, but not as an interchange format.