opteryx

Phase out numpy

Open joocer opened this issue 3 months ago • 0 comments

numpy was a great way to get a lot of well-performing code written quickly. It let us build a reasonably fast engine in Python in a relatively short time.

But we're now paying for this:

  • We have to keep checking whether we have pyarrow or numpy arrays, which adds complexity and weird edge cases.
  • We have to convert between the two formats; at about 10m rows per second, the cost is measurable in some instances.
  • Numpy has a confusing type system, which falls back to the object dtype far too often.

We should use Arrow arrays as our memory format, simplifying logic and data handling and improving performance.

There will be some operations where we cannot match numpy's performance, and we may still use numpy for those specific functions, but not as an interchange format.

joocer · Aug 05 '25 17:08