JUNE

Memory optimization: consider using Python array rather than list in some places

Open valeriupredoi opened this issue 4 years ago • 14 comments

Hey guys! So I went through a lot of code recently, running all sorts of ideas and trial-and-error tryouts, and I am happy to say that I think there really isn't much more room for speedup in serial mode. Take for instance this tree I looked at yesterday - all the elementary computations I looked at are of O(1e-5 s), which I think is the best one can get from something that's not a builtin Python function; in fact, a lot of the builtin functions are slower than that, depending on what you use. There are some things that @sadielbartholomew and myself are still a bit quizzical about, like statistics, but I didn't see those being used in the main time loop. So, unless we manage to parallelize it with mpi or Pool, I honestly can't see anything else major to speed up in the serial run. I'm sure @sadielbartholomew will find another ace, but I can't think of anything else unless I understand and change the workflow in detail (which I can't and don't want to, since that would be silly :grin: )

Having said that, I think there is room to improve the memory consumption. One thing I can think of off the top of my head is to use Python arrays instead of lists when you have long lists and keep appending to them - Python arrays are not as memory-efficient as Numpy arrays, but they are heaps faster to append to (about 75% slower to append to than lists, but orders of magnitude faster than np.append or np.concatenate), and they are 4-5 times lighter on memory than lists. Do you think this would be something good to do? If so, would it be possible to point me to bits of the code where this can be done so I can start testing? Cheers :beer:
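To illustrate what I mean, here's a minimal sketch comparing the footprint of a float list built by appending against the equivalent `array.array`. Exact byte counts depend on the CPython build, and `sys.getsizeof` on a list only counts the pointer table, so I add the sizes of the float objects it points to for a fair comparison:

```python
import sys
from array import array

n = 1_000_000

# Build a plain Python list of floats by appending (the current pattern).
as_list = []
for i in range(n):
    as_list.append(float(i))

# Build the equivalent typed array ('d' = C double) the same way.
as_array = array("d")
for i in range(n):
    as_array.append(float(i))

# A list stores pointers to boxed float objects, so include the objects
# themselves; an array stores raw doubles contiguously.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
array_bytes = sys.getsizeof(as_array)

print(f"list : {list_bytes / 1e6:.1f} MB")
print(f"array: {array_bytes / 1e6:.1f} MB")
```

On my CPython this shows roughly the 4x saving I mentioned: the list pays 8 bytes per pointer plus ~24 bytes per boxed float, while the array pays just 8 bytes per double. The catch is that `array.array` only holds a single numeric type, so it only applies where a list genuinely contains homogeneous numbers.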

valeriupredoi avatar Aug 13 '20 12:08 valeriupredoi