Cuda Chen

Results 63 comments of Cuda Chen

> > Thanks for the information, I think the so-called "ARMv8 in 32-bit execution mode" should refer to A32 instruction set. > > Then, you can add A32 specific items...

Close this as it is completed in https://github.com/DLTcollab/sse2neon/pull/620.

To my knowledge, it runs on CPU only no matter what.

@gd1925 It is possible to run dlib and OpenCV functions on the GPU. However, the bus to the GPU will be the bottleneck. Unless you have a particular reason to run...

Hi @romange,

> does vld1q_s32 require alignment? if yes, then this seem to contradict semantics of _mm_loadu_si128.

According to the [documentation](https://developer.arm.com/architectures/instruction-sets/intrinsics/vld1q_s32), `vld1q_s32` may generate `LD1 {Vt.4S},[Xn]`. What's more, GCC...
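A minimal sketch of the unaligned-load semantics under discussion, not the sse2neon implementation. The helper names `load16_unaligned` and `check_unaligned_load` are hypothetical. On NEON targets the sketch uses `vld1q_u8`, which takes a byte pointer and so avoids casting a possibly misaligned address to `int32_t *`. On SSE2 targets it uses `_mm_loadu_si128`, and elsewhere it falls back to `memcpy`. It assumes a typical configuration in which alignment checking is not enabled.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
/* NEON path: load 16 bytes from a possibly unaligned address. */
static void load16_unaligned(const void *src, uint8_t out[16])
{
    uint8x16_t v = vld1q_u8((const uint8_t *) src);
    vst1q_u8(out, v);
}
#elif defined(__SSE2__)
#include <emmintrin.h>
/* SSE2 path: _mm_loadu_si128 has no alignment requirement. */
static void load16_unaligned(const void *src, uint8_t out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *) src);
    _mm_storeu_si128((__m128i *) out, v);
}
#else
/* Portable fallback for targets without either instruction set. */
static void load16_unaligned(const void *src, uint8_t out[16])
{
    memcpy(out, src, 16);
}
#endif

/* Returns 1 if a 16-byte load from buf + offset reproduces those bytes.
 * offset must be at most 48 so the load stays inside the 64-byte buffer. */
static int check_unaligned_load(size_t offset)
{
    uint8_t buf[64];
    uint8_t out[16];
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (uint8_t) i;
    load16_unaligned(buf + offset, out);
    return memcmp(out, buf + offset, 16) == 0;
}
```

Calling `check_unaligned_load` with odd offsets such as 1 or 3 exercises addresses that are not 4-byte or 16-byte aligned relative to the buffer start.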

Hi @aqrit,

Based on my experiment, `vld1q_u8` runs slightly slower.

## Test Code

```c
FORCE_INLINE __m128i old_mm_loadu_si128(const __m128i *p)
{
    return vreinterpretq_m128i_s32(vld1q_s32((const int32_t *) p));
}

FORCE_INLINE __m128i new_mm_loadu_si128(const...
```

@johnnychen94 I think I can give it a try :) By the way, would [prune_segments](https://github.com/JuliaImages/ImageSegmentation.jl/blob/b4ea277ce0988bb9b7108fd082c15eb2cfe1bfb9/src/core.jl#L196) be a good starting point for enforcing connectivity?

OK, I will look into how the original implementation enforces connectivity. We can also look at how [PSMM](https://github.com/PSMM/SLIC-Superpixels/blob/master/slic.cpp#L216) enforces connectivity.

After reading your comment, I have to admit that my SLIC implementation in Julia **references** laixintao's [work](https://github.com/laixintao/slic-python-implementation). I have added the reference in my Julia SLIC repo as a good...

> should be fixed by #632 ?

Hi @aqrit,

Although I am busy with work these weeks, I will set aside time to test #632.