Cuda Chen comments

Results: 63 comments of Cuda Chen

Support for ARMv8 in 32-bit execution mode

> > Thanks for the information, I think the so-called "ARMv8 in 32-bit execution mode" should refer to A32 instruction set.
> >
> > Then, you can add A32 specific items...

Support for ARMv8 in 32-bit execution mode

Closing this, as it was completed in https://github.com/DLTcollab/sse2neon/pull/620.

Running on gpu

To my knowledge, it runs on CPU only no matter what.

Running on gpu

@gd1925 It is possible to run dlib and OpenCV functions on the GPU. However, the bus to the GPU will be the bottleneck. Unless you have a particular reason to run...

gcc sanitizer fails on _mm_loadu_si128

Hi @romange,

> does vld1q_s32 require alignment? if yes, then this seem to contradict semantics of _mm_loadu_si128.

According to the [documentation](https://developer.arm.com/architectures/instruction-sets/intrinsics/vld1q_s32), `vld1q_s32` may generate `LD1 {Vt.4S},[Xn]`. What's more, GCC...

gcc sanitizer fails on _mm_loadu_si128

Hi @aqrit,

Based on my experiment, `vld1q_u8` runs slightly slower.

## Test Code

```c
FORCE_INLINE __m128i old_mm_loadu_si128(const __m128i *p)
{
    return vreinterpretq_m128i_s32(vld1q_s32((const int32_t *) p));
}

FORCE_INLINE __m128i new_mm_loadu_si128(const...
```
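To make the trade-off in this thread concrete, here is a minimal sketch of a byte-wise unaligned 128-bit load. It is not sse2neon's implementation: `vec128_sketch`, `load_unaligned_sketch`, and `check_unaligned_load` are hypothetical names introduced only for illustration, and the idea that a sanitizer would object to the `int32_t *` cast because of that type's 4-byte alignment is an assumption, not something the comments above confirm. On NEON targets the sketch uses `vld1q_u8`, whose pointer type carries no alignment requirement beyond one byte; elsewhere it falls back to `memcpy` so it still compiles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical 16-byte container used only in this sketch. */
typedef struct {
    uint8_t bytes[16];
} vec128_sketch;

/* Load 16 bytes from an address with no alignment assumption.
 * The pointer is only ever viewed as bytes, so no cast to a
 * wider element type (such as int32_t) is involved. */
static inline vec128_sketch load_unaligned_sketch(const void *p)
{
    vec128_sketch v;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1q_u8(v.bytes, vld1q_u8((const uint8_t *) p));
#else
    /* Portable fallback so the sketch builds on non-NEON hosts. */
    memcpy(v.bytes, p, sizeof(v.bytes));
#endif
    return v;
}

/* Self-check: read from an address that is deliberately odd. */
static void check_unaligned_load(void)
{
    _Alignas(16) uint8_t buf[32];
    for (int i = 0; i < 32; i++)
        buf[i] = (uint8_t) i;

    vec128_sketch v = load_unaligned_sketch(buf + 1);
    for (int i = 0; i < 16; i++)
        assert(v.bytes[i] == (uint8_t) (i + 1));
}
```

Whether the byte-wise form is worth the slight slowdown reported above depends on how much weight a clean sanitizer run carries for the project; the sketch only shows the shape of the alternative.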
