Andrew

Results 724 comments of Andrew

There is no specific code for Apple's upcoming desktop processor there. As far as the internet can tell, an ISA profile is already present. ``` Instruction set A64 – ARMv8.4-A ```...

That's not prototype code, nor an intrinsics header of sorts; it is an early attempt to document an undocumented co-processor.

Could anyone roughly benchmark Rosetta 2 and report what works best from the x86 world? https://developer.apple.com/documentation/apple_silicon/about_the_rosetta_translation_environment#3616843

That emulated x86 will be around for 3-5 years (judging by the "smooth" PPC-to-x86 transition years ago).

How much of the dlib call is consumed by sgemm? What integer parameters get passed to it? For the NDK, do you use the clang-based or the gcc-based toolchain? Are you certain you build...

callgrind shows lots of pthread activity stemming from dlib.

At least since 0.3.5 the compiler flags have been adjusted by a senior ARM employee; I'd trust him to know the best ways around their CPU designs. Another option is to run "make"...

There is heavy thread manipulation on the dlib side. It does not happen on Android. You can use 'perf' as a quick non-intrusive profiler, with strace, ltrace and gprof as the next steps.

There is also clang (with gfortran for Fortran), which would bring the environment closer to a known-good one.

... which more or less comes from unnecessary parallelism for small samples given to BLAS functions ... @tjoli can you run `perf record sample` and `perf report`, with and without `OPENBLAS_NUM_THREADS=1...
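
For a rough first number before digging into `perf` output, the with/without comparison can be scripted. This is only a sketch: the function names `time_command` and `compare_thread_settings` are made up for illustration, the binary to run is whatever you pass on the command line, and it measures wall-clock time only, so it complements the profiler rather than replacing it. `OPENBLAS_NUM_THREADS` is the only OpenBLAS-specific piece.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, num_threads=None, repeats=3):
    """Run cmd `repeats` times; return the best wall-clock time in seconds.

    num_threads=None removes OPENBLAS_NUM_THREADS from the child environment,
    so OpenBLAS picks its own default; otherwise the variable is set to it.
    """
    env = dict(os.environ)
    env.pop("OPENBLAS_NUM_THREADS", None)
    if num_threads is not None:
        env["OPENBLAS_NUM_THREADS"] = str(num_threads)

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)
        best = min(best, time.perf_counter() - start)
    return best


def compare_thread_settings(cmd, repeats=3):
    """Time cmd with OpenBLAS default threading and with a single thread."""
    return {
        "default": time_command(cmd, None, repeats),
        "single_thread": time_command(cmd, 1, repeats),
    }


if __name__ == "__main__":
    # Hypothetical usage: python compare_threads.py ./sample
    command = sys.argv[1:]
    if not command:
        sys.exit("usage: compare_threads.py <command> [args...]")
    for label, seconds in compare_thread_settings(command).items():
        print(f"{label}: {seconds:.3f} s")
```

If the single-threaded run is not slower (or is even faster) on small inputs, that would be consistent with the threading overhead described above.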