noodles
noodles copied to clipboard
noodles implementation is slower than rust-htslib, did I do something suboptimal?
rust-htslib:
cat 0.00s user 0.01s system 1% cpu 0.568 total
sudo cargo run --release -- > tandem_reads.bam 0.43s user 0.04s system 82% cpu 0.570 total
noodles:
cat 0.00s user 0.01s system 0% cpu 1.192 total
sudo cargo run --release -- > tandem_reads.bam 1.04s user 0.05s system 91% cpu 1.195 total
i first implemented with htslib, then tried switching to noodles with no architectural changes.
for noodles i'm opening like this:
let mut bam_reader = bam::reader::Builder.build_from_path(&opt.input_bam_file)?;
let header = bam_reader.read_header()?;
let mut bam_writer = bam::writer::Builder.build_from_path(&opt.bam_output_file)?;
for r in bam_reader.records(&header) {
...
}
for htlslib like this:
let mut bam = bam::Reader::from_path(&opt.input_bam_file).unwrap();
let header = bam::Header::from_template(bam.header());
let mut bam_out = bam::Writer::from_path(&opt.bam_output_file, &header, bam::Format::Bam).unwrap();
for r in bam.records() {
...
}
The internal filtering logic is the same , reasoning about flags insert size , etc - writing out some records.
Any suggestions?
this is on a macbook arm64 computer.
noodles tends to do more record validation and tries to ensure data is spec compliant. However, in your flamegraphs, it seems the DEFLATE implementation htslib linked to is using SIMD intrinsics (see inflate_fast_neon
and longest_match_neon
) on your CPU. By default, noodles uses the pure Rust implementation miniz_oxide, which can't compete in this case.
Can you retime your tests using libdeflate for both? I.e., cargo add noodles-bgzf --features libdeflate
for noodles and cargo add rust-htslib --features libdeflate
for rust-htslib. libdeflate is much faster for this application, regardless, and it would be better than comparing differing DEFLATE implementations.
with htslib libdeflate:
cat 0.00s user 0.01s system 1% cpu 0.574 total
sudo cargo run --release -- > tandem_reads.bam 0.45s user 0.03s system 83% cpu 0.577 total
with noodles
cat 0.00s user 0.01s system 1% cpu 0.740 total
sudo cargo run --release -- > tandem_reads.bam 0.61s user 0.05s system 89% cpu 0.743 total
much closer... thanks for the suggestion! Anything else I should try?
If you don't need owned records, you can reuse the record buffer.
let mut record = sam::alignment::Record::default();
while bam_reader.read_record(&header, &mut record)? != 0 {
// ...
}
When doing this, you may also want to use Reader::rc_records
in your rust-htslib test for a more fair comparison.
noodles 0.61.0 now allows alignment format records to read and written, but I'm not sure if this will help in a passthrough case like in your tests. Feel free to open a new discussion if you have any further questions.