minimp3
Further optimize L3_huffman and L3_imdct36
ARM Instructions profile:
Total executed instructions: 2164536044
L3_huffman.isra.2 685600678 31.674%
mp3d_synth 546698880 25.257%
L3_imdct36 251612240 11.624%
L3_dct3_9 176638976 8.161%
mp3d_DCT_II 165811968 7.660%
mp3d_synth_pair 61793280 2.855%
L3_antialias 48054640 2.220%
L3_change_sign 36160512 1.671%
L3_midside_stereo 27845120 1.286%
get_bits 27395265 1.266%
memset 26390774 1.219%
mp3d_scale_pcm 21970944 1.015%
__memcpy_neon 19825566 0.916%
L3_ldexp_q2 17661260 0.816%
L3_read_scalefactors 14160764 0.654%
L3_decode_scalefactors 10988038 0.508%
L3_huffman (31.7%) and L3_imdct36 + L3_dct3_9 (19.8% combined) need optimization. (Vectorize two L3_dct3_9 calls together?)
After some experiments:
- No ARM compiler is good enough yet to rely entirely on intrinsics and avoid assembler.
- Best known compiler: armcc, then clang and gcc8 (gcc8 gets close to clang with -flto; clang crashes with -flto).
- Main compiler problem found: poor use of post-increment addressing. See https://reviews.llvm.org/D39415
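
To illustrate what "post-increment usage" refers to, here is a minimal sketch in C. ARM has post-indexed addressing, where a load or store also advances its base register in the same instruction, so a loop that walks pointers can in principle avoid separate address arithmetic. The two functions below are hypothetical helpers, not minimp3 code; they compute the same result, and whether a given compiler emits post-indexed loads and stores for either form depends on the compiler, version and flags, so the generated assembly is worth checking.

```c
#include <stddef.h>

/* Hypothetical example, not part of minimp3: scale n floats by gain.
 * Indexed form: the address is derived from a loop counter. */
void scale_indexed(float *dst, const float *src, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * gain;
}

/* Pointer post-increment form: each access advances its own pointer,
 * which maps naturally onto ARM post-indexed loads and stores
 * if the compiler chooses to use them. */
void scale_postinc(float *dst, const float *src, size_t n, float gain)
{
    while (n--)
        *dst++ = *src++ * gain;
}
```

Compiling both with optimization enabled and comparing the inner loops (for example with -S) shows whether the compiler folds the pointer update into the memory access or emits a separate add.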