htslib icon indicating copy to clipboard operation
htslib copied to clipboard

Add a flexible array member to the bam1_t struct to facilitate a BAM …

Open nathanweeks opened this issue 8 years ago • 1 comments

Add a flexible array member to the bam1_t struct to allow the variable-length data to be colocated in memory with the rest of the bam1_t. This should be backwards compatible, as the addition of the flexible array member shouldn't alter sizeof(bam1_t) in this case.

This is used by the samtools sort optimization described in https://github.com/samtools/samtools/pull/593

nathanweeks avatar Jun 21 '16 02:06 nathanweeks

I've done this sort of thing before, it's how "io_lib" works infact. You don't really need a separate variable as b.data = ((char *)&b) + sizeof(b); would work too, although I like the empty array idea better. The bigger problem is the all pervasive alloc/dealloc code of bam1_t structs that assumes it can realloc or free b.data. That's not just htslib, but samtools etc too as there's never really been an API to access these fields. As such it can only be used here and there when we want better optimisation, rather than a generic fix. :(

I fully support making use of reducing malloc overheads. I think we discussed sort in the office once and concluded that it could also be fixed by having one large single malloc block of memory that the data pointers are set to and using a pool-allocator (eg see htslib/cram/pooled_alloc.[ch]) for obtaining the bam1_t structs, turning it into just a handful of mallocs. I don't think anyone coded it up yet though so credit for tackling this one.

jkbonfield avatar Jun 21 '16 06:06 jkbonfield