pandapy

Question: Why is it faster only up to 50k to 500k rows?

Open • bscully27 opened this issue 4 years ago • 1 comment

For most pandas functions, I expected numpy to outperform regardless of data size. I'm curious about the technical details behind this observation. Any information would be appreciated.

Thanks!

bscully27 • Mar 25 '20 13:03

@bscully27

> For most pandas functions, I expected numpy to outperform regardless of data size. I'm curious about the technical details behind this observation. Any information would be appreciated.
>
> Thanks!

NumPy is fast because its core routines are compiled C and its arrays hold elements of a single datatype (homogeneous arrays) in contiguous memory. That layout benefits from the principle of locality, i.e., the tendency of a processor to access the same set of memory locations repeatedly over a short period of time. pandas, on the other hand, is flexible enough to store data of many datatypes, which in turn decreases its performance.
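To make the homogeneous-versus-mixed point concrete, here is a rough sketch that times the same sum on a `float64` array and on an `object`-dtype copy of it. The helper names `make_arrays` and `time_sum` are made up for this illustration, and the `object` array is only a stand-in for "storage that can hold any Python type", not a claim about how pandas or pandapy store a given column. Timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np


def make_arrays(n=100_000, seed=0):
    """Return the same values as a float64 array and as an object array."""
    rng = np.random.default_rng(seed)
    homogeneous = rng.random(n)          # one contiguous buffer of float64
    boxed = homogeneous.astype(object)   # array of references to Python floats
    return homogeneous, boxed


def time_sum(arr, repeats=20):
    """Best-of-N wall time (seconds) for a single arr.sum() call."""
    return min(timeit.repeat(arr.sum, number=1, repeat=repeats))


homogeneous, boxed = make_arrays()
t_homogeneous = time_sum(homogeneous)
t_boxed = time_sum(boxed)

print(f"float64 sum: {t_homogeneous:.6f} s")
print(f"object  sum: {t_boxed:.6f} s")
```

On a typical machine the `float64` sum should come out well ahead, since it runs as a compiled loop over adjacent memory while the `object` version has to follow a reference and dispatch a Python-level addition for every element.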

yash-clear • Dec 28 '20 20:12