pandapy
pandapy copied to clipboard
Question: Why faster until 50k to 500k rows?
For most pandas functions, I expected numpy to outperform regardless of data size. I'm curious about the technical details behind this observation. Any information would be appreciated.
Thanks!
@bscully27 > For most pandas functions, I expected numpy to outperform regardless of data size.
I'm curious about the technical details behind this observation. Any information would be appreciated.
Thanks!
Numpy is the fastest because it is C-compiled and stores data of same datatype (homogeneous arrays) and you get the benefits of principle of locality i.e., tendency of a processor to access the same set of memory locations repetitively over a short period of time. On the other hand pandas are flexible to store data of many datatypes which in turn decrease its performance.