
NE3: Try BLOCKSIZE in bytes rather than elements

Open robbmcleod opened this issue 7 years ago • 0 comments

In NE2, loops were unrolled, which resulted in a strong preference for a BLOCK_SIZE expressed in elements rather than bytes. This meant that BLOCK_SIZE was generally tuned to the L1 cache for float64. Because NE3 now relies on vectorization, we see no performance difference between fixed-length loops and variable-length ones, so the fixed-length loop functionality has already been commented out (which reduces compilation time significantly). It may therefore make sense to go a step further and refactor BLOCK_SIZE in terms of bytes.

Steps to complete:

1. Ideally, the item size can be embedded in the function itself by the code_generator? Or,
2. Insert a section above the #include interp_body.cpp macros that calculates the appropriate block size in elements from the CACHE_SIZE.
3. The ability to change the CACHE_SIZE should ideally be included as an argument to setup.py.
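As a rough illustration of the calculation in step 2, here is a minimal Python sketch of deriving a per-dtype block size in elements from a byte budget. The function name `block_size_in_elements`, the `n_operands` parameter, and the 32 KiB `CACHE_SIZE` value are all assumptions made for this example; none of them are taken from the NE3 source.

```python
def block_size_in_elements(cache_size_bytes, itemsize, n_operands=1):
    """Return how many elements of `itemsize` bytes fit in the byte budget.

    Hypothetical helper: `n_operands` splits the budget evenly between
    several same-sized arrays that are processed in the same block.
    """
    if cache_size_bytes <= 0 or itemsize <= 0 or n_operands <= 0:
        raise ValueError("all arguments must be positive")
    # Integer division, and never return fewer than one element per block.
    return max(1, cache_size_bytes // (itemsize * n_operands))


# Assumed byte budget for illustration only (32 KiB).
CACHE_SIZE = 32 * 1024

if __name__ == "__main__":
    for name, itemsize in (("float32", 4), ("float64", 8), ("complex128", 16)):
        print(name, block_size_in_elements(CACHE_SIZE, itemsize))
```

With a fixed byte budget, narrower types get proportionally more elements per block, whereas a fixed element count would occupy a different number of bytes for each dtype.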

robbmcleod, Mar 14 '17 03:03