Add BBTools Java implementation for fqcnt benchmark
This PR adds BBTools FastqScan as a Java implementation for the fqcnt benchmark.
Implementation Details
- Uses BBTools' FastqScan tool with multithreaded SIMD-accelerated parsing
- Wrapper script: fqcnt_java_bbtools.sh
- Output format matches biofast specification: <records>\t<bases>\t<qualities>
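For reviewers unfamiliar with the output format, the following is a minimal single-threaded sketch of what the three tab-separated fields mean. It is not how FastqScan is implemented (FastqScan is multithreaded and SIMD-accelerated). The class name `FqCountSketch` and the embedded two-record sample are hypothetical, and the sketch assumes strict four-line FASTQ records.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration only; not part of BBTools.
public class FqCountSketch {

    /** Returns {records, bases, qualities}, assuming 4-line FASTQ records. */
    static long[] count(BufferedReader in) throws IOException {
        long records = 0, bases = 0, quals = 0;
        String header;
        while ((header = in.readLine()) != null) {
            if (header.isEmpty()) continue;   // tolerate blank lines between records
            String seq = in.readLine();       // sequence line
            String plus = in.readLine();      // '+' separator line
            String qual = in.readLine();      // quality line
            if (seq == null || plus == null || qual == null) {
                throw new IOException("truncated record after " + records + " records");
            }
            records++;
            bases += seq.length();
            quals += qual.length();
        }
        return new long[] {records, bases, quals};
    }

    public static void main(String[] args) throws IOException {
        // With no argument, fall back to a tiny in-memory sample (assumption for the demo).
        Reader src = args.length > 0
                ? new FileReader(args[0])
                : new StringReader("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\nII\n");
        try (BufferedReader in = new BufferedReader(src)) {
            long[] c = count(in);
            // <records>\t<bases>\t<qualities>
            System.out.println(c[0] + "\t" + c[1] + "\t" + c[2]);
        }
    }
}
```

For well-formed input the second and third fields are equal, since every base has one quality score.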
Testing
Tested with M_abscessus_HiSeq.fq (5,682,010 reads):
5682010 568201000 568201000
Requirements
- Java 18+ required (for jdk.incubator.vector SIMD support)
- Java 25 recommended for optimal performance
- BBTools: git clone --depth=1 https://github.com/bbushnell/BBTools
About BBTools
BBTools is a comprehensive suite of bioinformatics tools developed at the Joint Genome Institute (JGI). FastqScan provides high-performance FASTQ parsing optimized for modern hardware.
Repository: https://github.com/bbushnell/BBTools
Performance Note: FastqScan is fastest with larger files and BGZF compression
JVM Startup Overhead
Java has ~0.25s of startup and JIT compilation overhead, which dominates benchmarks on small files (like the 5.7M-read test case). This overhead is:
- Amortized on production-scale files (100M+ reads)
- Irrelevant when called from Java code (JVM already running)
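The amortization argument can be made concrete with a back-of-the-envelope calculation. The sketch below uses the ~0.25s startup figure and the ~4.2 GB/s plaintext throughput quoted in this PR; the file sizes are illustrative assumptions, not measurements, and the class name `StartupOverhead` is hypothetical.

```java
// Hypothetical illustration of how a fixed startup cost shrinks as input grows.
public class StartupOverhead {

    /** Fraction of wall time spent on a fixed startup cost. */
    static double overheadFraction(double startupSec, double fileBytes, double bytesPerSec) {
        double parseSec = fileBytes / bytesPerSec;
        return startupSec / (startupSec + parseSec);
    }

    public static void main(String[] args) {
        double startup = 0.25;      // seconds, figure quoted in this PR
        double throughput = 4.2e9;  // bytes/s, plaintext figure quoted in this PR
        // File sizes below are assumptions chosen for illustration only.
        double[] sizes = {1.4e9, 25e9};
        for (double size : sizes) {
            System.out.printf("%.1f GB: startup is %.1f%% of wall time%n",
                    size / 1e9, 100 * overheadFraction(startup, size, throughput));
        }
    }
}
```

The fixed cost is the same in both cases; only its share of the total changes with input size.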
BGZF Multithreaded Decompression
FastqScan is faster on BGZF-compressed files than on plaintext because BGZF blocks can be decompressed in parallel, provided enough cores are available (~20). On 80M reads:
- Plaintext: ~4.2 GB/s (single-threaded)
- BGZF compressed: ~6.2 GB/s (multithreaded decompression)
- FastqScanMT (with t=2): ~9.5 GB/s on BGZF

Figure: performance comparison showing FastqScanMT 13.5x faster than Rust needletail on BGZF files.