David Lecomber
You're right about the difference between read and fread, but fread still has overhead. On some platforms that's partially due to locks (arbitrating between multiple threads that may be reading the buffer)...
Good idea - but it doesn't change the performance. You should be able to see this with this quick benchmark: compile it and just run it (it reads /dev/zero). ` #include...
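The benchmark in that comment is truncated here. What follows is a minimal sketch of the same kind of comparison, assuming a POSIX system. It is not the original benchmark. The function names `drain_with_read` and `drain_with_fread` are hypothetical. The sketch reads the same number of bytes from a file such as /dev/zero, once with raw `read(2)` calls and once through stdio's `fread(3)`, and times each loop with `clock_gettime(CLOCK_MONOTONIC, ...)`.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Elapsed wall-clock seconds between two CLOCK_MONOTONIC samples. */
static double elapsed_secs(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Hypothetical helper: read `total` bytes from `path` using raw read(2)
 * calls of at most `bufsize` bytes. Returns bytes actually read and
 * stores the elapsed time in *secs (0 on open/alloc failure). */
size_t drain_with_read(const char *path, size_t total, size_t bufsize, double *secs)
{
    char *buf = malloc(bufsize);
    int fd = open(path, O_RDONLY);
    size_t done = 0;
    struct timespec t0, t1;

    if (buf == NULL || fd < 0) {
        free(buf);
        if (fd >= 0)
            close(fd);
        *secs = 0.0;
        return 0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (done < total) {
        size_t want = (total - done < bufsize) ? total - done : bufsize;
        ssize_t n = read(fd, buf, want);
        if (n <= 0)
            break; /* EOF or error */
        done += (size_t)n;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *secs = elapsed_secs(t0, t1);
    close(fd);
    free(buf);
    return done;
}

/* Hypothetical helper: the same loop through stdio's fread(3). fread adds
 * a user-space buffer, and a thread-safe libc may also lock the stream
 * on each call, which is the kind of overhead discussed above. */
size_t drain_with_fread(const char *path, size_t total, size_t bufsize, double *secs)
{
    char *buf = malloc(bufsize);
    FILE *fp = fopen(path, "rb");
    size_t done = 0;
    struct timespec t0, t1;

    if (buf == NULL || fp == NULL) {
        free(buf);
        if (fp != NULL)
            fclose(fp);
        *secs = 0.0;
        return 0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (done < total) {
        size_t want = (total - done < bufsize) ? total - done : bufsize;
        size_t n = fread(buf, 1, want, fp);
        if (n == 0)
            break; /* EOF or error */
        done += n;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *secs = elapsed_secs(t0, t1);
    fclose(fp);
    free(buf);
    return done;
}
```

To use it, call both helpers from a small `main` with `"/dev/zero"`, a large `total` (for example `(size_t)1 << 30`) and the same `bufsize`, then compare the two `secs` values. Any difference you see will depend on the platform, the libc, and the buffer size.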
Do you need anything more from me? As this is my first PR to the project, I need a maintainer to approve it.
Hi @lh3 - are you able to merge this, or give me commit rights so I can merge it?
We can get good performance on AWS Graviton3 using just sse2neon (https://github.com/DLTcollab/sse2neon) or SIMDe (https://github.com/simd-everywhere/simde) translations of the x86 intrinsics (https://community.arm.com/arm-community-blogs/b/high-performance-computing-blog/posts/aws-graviton3-reduces-time-and-cost-for-genomics). So, at least initially that's a good outcome without...
Looks good to me from a code-changes perspective, but we should still verify what the benefit is, and that it's no worse on x86_64, as bwa-mem2 is such a significant CPU time...
Here are the aarch64 numbers from the existing conda package and the artefacts of this PR. These are best-of-4 runs at each thread count (8, 16, 32, 64). There...
This change looks good - I'd say it's worth merging as is; it's a real improvement. Complete data follows for a test based on short-read human g1k_v37 and a...
(Edited) Scratch that - I thought the new artefacts were built with zlib-ng. I see it was just making sure you had the last commit in the build. Platform | 8...
Super - I see the update for 1.1.1 is now in bioconda - and many package dependencies are now nicer (unrestricted or updated). Where is the tensorflow 2.16 limit coming...