multielo
Performance issues
Hey @djcunningham0, first of all, congrats on the amazing repo/project!
I'm opening this issue to discuss possible performance improvements to the process_data algorithm.
I ran a dataset of ~70k rows (matches) through it, and it took more than 140 minutes to finish. Could we make some changes to speed things up, perhaps by using Numba?
Some related reading:
https://python.plainenglish.io/a-solution-to-boost-python-speed-1000x-times-c9e7d5be2f40
https://towardsdatascience.com/how-to-make-your-pandas-operation-100x-faster-81ebcd09265c
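Here is a minimal sketch of one common speed-up. It assumes, without having confirmed it, that most of the time in process_data goes to per-row pandas operations. The idea is to keep ratings in a plain dict and collect the history in a list, then build a DataFrame only once at the end. The function names (process_matches, expected_scores) and the input format are hypothetical. Each match is a list of player IDs ordered by finishing place. The rating formula uses pairwise-average expected scores, linear actual scores, and K scaled by n - 1. All of these are illustrative assumptions, not multielo's actual implementation.

```python
from collections import defaultdict


def expected_scores(ratings, d=400.0):
    """Expected score per player: pairwise Elo win probabilities, normalized to sum to 1."""
    n = len(ratings)
    n_pairs = n * (n - 1) / 2
    return [
        sum(
            1.0 / (1.0 + 10 ** ((r_j - r_i) / d))
            for j, r_j in enumerate(ratings)
            if j != i
        )
        / n_pairs
        for i, r_i in enumerate(ratings)
    ]


def process_matches(matches, k=32.0, initial=1000.0, d=400.0):
    """Hypothetical fast loop: plain-Python state, no per-row DataFrame writes.

    `matches` is an iterable of player-ID lists ordered by finishing place
    (winner first); every match is assumed to have at least two players.
    Returns (final_ratings, history), where history is a list of
    (match_id, player, rating_before, rating_after) tuples.
    """
    ratings = defaultdict(lambda: initial)  # unseen players start at `initial`
    history = []  # cheap appends; convert to a DataFrame once at the end
    for match_id, players in enumerate(matches):
        n = len(players)
        n_pairs = n * (n - 1) / 2
        before = [ratings[p] for p in players]  # snapshot so updates don't interact
        expected = expected_scores(before, d)
        for place, (player, r_old, e) in enumerate(zip(players, before, expected)):
            actual = (n - 1 - place) / n_pairs  # linear score; scores sum to 1
            r_new = r_old + k * (n - 1) * (actual - e)
            ratings[player] = r_new
            history.append((match_id, player, r_old, r_new))
    return dict(ratings), history


# Example usage with made-up player IDs.
final, history = process_matches([["alice", "bob"], ["carol", "alice", "bob"]])
```

If a DataFrame is needed afterwards, the history list can be converted in a single call, e.g. `pd.DataFrame(history, columns=["match_id", "player", "rating_before", "rating_after"])`. Profiling process_data first (for example with cProfile) would show whether the per-row pandas work is really the bottleneck before adding a dependency like Numba.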