
cram_fd shrinkage and ditching of cram v1.0.

Open jkbonfield opened this issue 7 years ago • 0 comments

To-do when dropping CRAM v1.x support. Realistically it should have been ditched ages ago as I don't believe C and Java implementations were particularly interchangeable due to spec problems. I doubt any 1.x files exist in the wild.

cram_fd has 32k of pointless data, but I suspect changing it now is an API breakage (I'd need to check the scope of that struct; a change in its size may be a problem). In cram_structs.h we have:

    // lookup tables, stored here so we can be trivially multi-threaded
    unsigned int bam_flag_swap[0x1000];  // cram -> bam flags
    unsigned int cram_flag_swap[0x1000]; // bam -> cram flags

These are initialised in cram_io.c. For CRAM v1.0 they were translation tables from BAM flags to CRAM flags and vice versa, because the CRAM flags were the same flags but in a different bit order. This was one of the things we sanitised when creating CRAM v2.0, so the lookup is now essentially flag[i]=i and a nop. If we ditched support for v1.0 we could simplify a lot of this code base, including this chunk.
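To illustrate why the tables are dead weight for v2.0 and later, here is a minimal sketch of an identity table and a check that looking a flag up in it is a nop. The helper names (`init_identity_table`, `swap_flag`, `table_is_nop`) and the `FLAG_TABLE_SIZE` macro are made up for this example and are not htslib functions; only the table length of 0x1000 comes from the struct above.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_TABLE_SIZE 0x1000  /* same length as the arrays in cram_fd */

/* Hypothetical helper: fill a table so that every flag maps to itself,
 * which is what the v2.0+ case amounts to (flag[i] = i). */
static void init_identity_table(unsigned int table[FLAG_TABLE_SIZE])
{
    for (size_t i = 0; i < FLAG_TABLE_SIZE; i++)
        table[i] = (unsigned int)i;
}

/* Hypothetical helper: translate one flag value through a table. */
static unsigned int swap_flag(const unsigned int table[FLAG_TABLE_SIZE],
                              unsigned int flag)
{
    assert(flag < FLAG_TABLE_SIZE);
    return table[flag];
}

/* Returns 1 if the lookup leaves every 12-bit flag value unchanged. */
static int table_is_nop(const unsigned int table[FLAG_TABLE_SIZE])
{
    for (unsigned int f = 0; f < FLAG_TABLE_SIZE; f++)
        if (swap_flag(table, f) != f)
            return 0;
    return 1;
}
```

If `table_is_nop()` holds for both directions, every `table[flag]` lookup could be replaced by plain `flag` and the arrays dropped from the struct.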

Why does it matter? If we're doing a merge and sharing headers between CRAM streams, then up to 15% of memory usage may be those lookup tables!
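For a rough feel of where the 32k figure comes from and how it scales with the number of open streams, a small sketch follows. `struct flag_tables` and `flag_table_bytes` are hypothetical stand-ins holding only the two arrays, not the real `cram_fd`; the 32k figure assumes a 4-byte `unsigned int`.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_TABLE_SIZE 0x1000

/* Hypothetical stand-in for just the two lookup tables in cram_fd. */
struct flag_tables {
    unsigned int bam_flag_swap[FLAG_TABLE_SIZE];   /* cram -> bam flags */
    unsigned int cram_flag_swap[FLAG_TABLE_SIZE];  /* bam -> cram flags */
};

/* Two tables of 0x1000 entries each; with a 4-byte unsigned int
 * this is 2 * 4096 * 4 = 32768 bytes per stream. */
static_assert(sizeof(struct flag_tables) ==
              2 * FLAG_TABLE_SIZE * sizeof(unsigned int),
              "tables are expected to be laid out without padding");

/* Bytes spent on the tables across n_streams open CRAM streams. */
static size_t flag_table_bytes(size_t n_streams)
{
    return n_streams * sizeof(struct flag_tables);
}
```

So a merge with many input files pays this cost once per file descriptor, regardless of whether any input is actually v1.0.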

jkbonfield, Apr 20 '17 14:04