stdpain
Patch [d29228f](https://github.com/kiyo-masui/bitshuffle/pull/140/commits/d29228fae82b51dec5bddc92c9b2c8ac1a0e402d) also works.
@sebpop It seems we could have a better implementation of neonmovemask_bulk: https://gist.github.com/geofflangdale/99393863c8cae3e83195a5e592e7dc82

```
uint8x16_t t0 = vbslq_u8(vdupq_n_u8(0x55), p0, p1); // 01010101...
uint8x16_t t1 = vbslq_u8(vdupq_n_u8(0x55), p2, p3); // 23232323...
uint8x16_t...
```
> @sebpop It seems we could have a better implementation of neonmovemask_bulk: https://gist.github.com/geofflangdale/99393863c8cae3e83195a5e592e7dc82
>
> uint8x16_t t0 = vbslq_u8(vdupq_n_u8(0x55), p0, p1); // 01010101...
> uint8x16_t t1 = vbslq_u8(vdupq_n_u8(0x55), p2, p3); // 23232323...
> uint8x16_t...
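For readers unfamiliar with `vbslq_u8`, here is a minimal scalar sketch of the bitwise select it performs on each byte. The helper name `bit_select_u8` is hypothetical and is not taken from the gist or from bitshuffle. With a mask of `0x55`, bits 0, 2, 4 and 6 come from the first data operand and bits 1, 3, 5 and 7 from the second.

```c
#include <stdint.h>

/* Scalar model of a per-byte bitwise select (hypothetical helper):
 * each result bit is taken from `a` where the mask bit is 1
 * and from `b` where the mask bit is 0. */
static uint8_t bit_select_u8(uint8_t mask, uint8_t a, uint8_t b) {
    return (uint8_t)((a & mask) | (b & (uint8_t)~mask));
}
```

For example, `bit_select_u8(0x55, 0xF0, 0x0F)` keeps the even-numbered bits of `0xF0` and the odd-numbered bits of `0x0F`. The NEON intrinsic applies the same rule to all 16 lanes at once.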
@miguelgilmartinez https://github.com/StarRocks/starrocks/pull/32754
> > @miguelgilmartinez #32754
>
> None of our columns is so big.

Hash join will splice a binary column into a large binary column. If the input rows are...
Is there a stable, reproducible case? I need to confirm the real performance improvement.
Are there any performance numbers for SSB/TPC-H/TPC-DS at the 100G to 1000G scale? For operations like move_mask, ARM performance is poor. We have adapted our hotspot code to the NEON instruction set, and in Graviton...
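As a point of reference for the move_mask operation mentioned above, here is a minimal portable sketch of what a 16-byte movemask computes: the most significant bit of each byte is gathered into one 16-bit mask. The function name `movemask_u8x16_scalar` is hypothetical. This is a scalar reference for checking results, not the NEON code used in the project.

```c
#include <stdint.h>

/* Scalar reference for a 16-lane byte movemask (hypothetical helper):
 * bit i of the result is the top bit of bytes[i]. */
static uint16_t movemask_u8x16_scalar(const uint8_t bytes[16]) {
    uint16_t mask = 0;
    for (int i = 0; i < 16; ++i) {
        /* Extract the sign bit of lane i and place it at bit position i. */
        mask |= (uint16_t)(((bytes[i] >> 7) & 1u) << i);
    }
    return mask;
}
```

x86 has a single instruction for this (`pmovmskb`, exposed as `_mm_movemask_epi8`), whereas NEON has no direct equivalent, so it is usually emulated with a short instruction sequence. A scalar version like this one can be used to validate any such emulation.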
These hotspot code paths are heavily inlined. Calling them through a .so breaks inlining and may degrade performance.
> At this point, yield_ctx.need_yield=true, the defer defined in L226 will not be executed, and _running_restore_tasks will never decrease.

In the next scheduling turn, _running_restore_tasks will be decreased.

> In the...
https://github.com/Mergifyio rebase main