David Lecomber

Results: 41 comments by David Lecomber

@mr-c - we may not be using your PR correctly. Whereas with my sse2neon, we see .. ``` Important parameter settings: BATCH_SIZE: 512 MAX_SEQ_LEN_REF: 256 MAX_SEQ_LEN_QER: 128 MAX_SEQ_LEN8: 128 SEEDS_PER_READ:...

showing we're taking the wrong branch: dslarm/bwa-mem2: ![Image](https://github.com/user-attachments/assets/7224b20f-6f4a-4cc6-aa40-0d4885882842) mr-c/bwamem-2: ![Image](https://github.com/user-attachments/assets/cb36aea7-5088-4919-83ff-f51a99e92199) It turns out that we need to add '-D__SSE2__' to CXXFLAGS to use the vectorized code rather than the scalar path. that...
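To illustrate why the define matters, here is a minimal sketch of compile-time path selection. It is not taken from bwa-mem2: `compiled_path` and `add_scores` are hypothetical names, and the sketch assumes that sse2neon is on the include path when building for aarch64. Compilers targeting aarch64 do not define `__SSE2__` themselves, so without `-D__SSE2__` a guard like this silently falls back to the scalar loop.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__)
#  if defined(__aarch64__)
#    include "sse2neon.h"   // assumption: sse2neon supplies the SSE2 intrinsics on Arm
#  else
#    include <emmintrin.h>  // native SSE2 intrinsics on x86
#  endif
#endif

// Hypothetical helper: reports which branch the preprocessor selected.
inline const char* compiled_path() {
#if defined(__SSE2__)
    return "vectorized";
#else
    return "scalar";
#endif
}

// Hypothetical kernel: element-wise sum of two arrays of 16-bit scores.
inline void add_scores(const std::int16_t* a, const std::int16_t* b,
                       std::int16_t* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__)
    // Vectorized path: process 8 lanes of 16 bits per iteration.
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i),
                         _mm_add_epi16(va, vb));
    }
#endif
    // Scalar path: handles the tail, or everything when __SSE2__ is undefined.
    for (; i < n; ++i) {
        out[i] = static_cast<std::int16_t>(a[i] + b[i]);
    }
}
```

Both builds produce the same results, which is why a missing define tends to show up only in a profile or in timings rather than as a test failure.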

@BiocondaBot please fetch artifacts

* build is now merged and running with sse2neon, conda packages are available.
* correctness: linux-aarch64, osx-arm64 and linux-x86 all get exact same checksum for creating an index file and...

Performance data per actual bioconda built packages and test case per above comment, on a c8g.24xlarge instance. At 16 worker threads: * SIMDe: 112 seconds ; 89 seconds [* see...

Also: I considered trying LLVM or using Arm's ACfL compiler (24.10.1) which is an LLVM derivative. Building outside of Conda, relative runtimes (lower is better) for ACfL vs GCC11 and...

Some comparative data for AMD Genoa (c7a.24xlarge) vs AWS Graviton4 (c8g.24xlarge).
* AWS Graviton4
  * conda + gcc14:
    * 16 threads: 85 seconds
    * 32 threads: 48.2 seconds
* AMD...

GCC 10.x in bioconda is only used for x86; it's GCC 14 (?) for aarch64. The reason for that x86 problem is that GCC > 10 has its own definitions...

Thanks @matsen - awesomely quick work. The '387' branch link above couldn't be reached - have you pushed it to GitHub or is it hidden somehow? I don't know how...

The CI failure is not viewable by ordinary GitHub users.