
Possible core dumps due to Intel's GKL

Open · nh13 opened this issue on Oct 02 '19 · 10 comments

I was wondering if other folks (@tfenne @lbergelson @yfarjoun) are seeing frequent core dumps with the latest Picard/fgbio, both of which rely on htsjdk. I have exactly five users over here, all compiling both tools from source, who are getting frequent (roughly 1 in 10 runs) core dumps.

Stack: [0x000070000da04000,0x000070000db04000],  sp=0x000070000db029f0,  free space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libgkl_compression1961443782838211236.dylib+0x6ea7]  deflate_medium+0x867
C  [libgkl_compression1961443782838211236.dylib+0x508b]  deflate+0xf1b
C  [libgkl_compression1961443782838211236.dylib+0x1bac]  Java_com_intel_gkl_compression_IntelDeflater_deflateNative+0x1bc
j  com.intel.gkl.compression.IntelDeflater.deflateNative([BI)I+0
j  com.intel.gkl.compression.IntelDeflater.deflate([BII)I+3
j  htsjdk.samtools.util.BlockCompressedOutputStream.deflateBlock()I+55
j  htsjdk.samtools.util.BlockCompressedOutputStream.write([BII)V+113
j  htsjdk.samtools.util.BinaryCodec.writeBytes([BII)V+24
j  htsjdk.samtools.util.BinaryCodec.writeByteBuffer(I)V+35
J 3957 C1 htsjdk.samtools.BinaryTagCodec.writeArray(Ljava/lang/Object;Z)V (362 bytes) @ 0x0000000109ea2e04 [0x0000000109ea1a60+0x13a4]
J 3965 C1 htsjdk.samtools.BinaryTagCodec.writeTag(SLjava/lang/Object;Z)V (311 bytes) @ 0x0000000109eab9e4 [0x0000000109ea9880+0x2164]

nh13 · Oct 02 '19

@nh13 I've heard a rumor that there is a bug in the Intel deflater. From what I've seen, it only affects users on macOS Mojave (but not all of them).

fleharty · Oct 02 '19

@fleharty we are all OSX users (10.14.xx)

nh13 · Oct 02 '19

@nh13 You're probably hitting the same issue as this one: https://github.com/broadinstitute/picard/issues/1383. It's a real pain. I have been getting pretty consistent segfaults on OSX 10.14.xx when using the Intel deflater on some files. It seems to be specific to certain inputs, but I don't understand what the error condition is.

Try disabling the Intel deflater. It's unfortunate, since disabling it will slow everything down. Intel is aware of the issue, but they don't currently have any engineers who are able to work on the problem. I've been told that they have two people who are getting up to speed, but I don't know what timeline we're looking at. See https://github.com/Intel-HLS/GKL/issues/101.

I don't think I'm going to be able to fix the deflater bug myself without a significant time investment that I'm not currently able to make, and I don't think anyone else on our team is going to be able to do it anytime soon. If you or @tfenne are interested in taking a crack at it, I'm sure Intel would be willing to accept a PR.

lbergelson · Oct 02 '19
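A minimal sketch of the workaround above, assuming htsjdk's pluggable DeflaterFactory hook: the base DeflaterFactory produces the JDK's java.util.zip.Deflater, so installing it as the default should keep BGZF block writing off the GKL native library entirely. (Picard and GATK also expose USE_JDK_DEFLATER / --use-jdk-deflater style options for the same purpose from the command line.) The class name below is made up for illustration.

import htsjdk.samtools.util.BlockCompressedOutputStream;
import htsjdk.samtools.util.zip.DeflaterFactory;

public class UseJdkDeflater {
    public static void main(final String[] args) {
        // The base DeflaterFactory hands back java.util.zip.Deflater, so making it
        // the default means deflateBlock() never calls into the Intel/GKL native code.
        BlockCompressedOutputStream.setDefaultDeflaterFactory(new DeflaterFactory());

        // ... open a SAMFileWriter / BlockCompressedOutputStream and write BAM output as usual ...
    }
}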

@lbergelson we may want to look into adding support for other deflaters, as some of them seem just as good: http://jkbonfield.github.io/www.htslib.org/benchmarks/zlib.html

nh13 · Oct 07 '19

Interesting.

@jkbonfield Is the Intel-optimized zlib that you tested the same as the one here: https://github.com/Intel-HLS/GKL/tree/master/src/main/native/compression ?

lbergelson · Oct 10 '19

No, it was from another Intel developer: https://github.com/jtkukunas/zlib

I wasn't aware of this other one. I don't know how they differ.

jkbonfield · Oct 11 '19

The one I linked is the one that GATK and Picard use. I'm curious how it compares.

lbergelson · Oct 11 '19

I did some testing and it came out at 1m17s decode (1 thread) and 2m13s encode (elapsed time with 4 threads; 8m29s CPU). File size was 6,580,893,700 bytes.

This corresponds almost perfectly with the "Intel" line in my chart, which had 1m15s decode and 2m11s encode, with an identical file size. So my guess is that the GKL incorporates the same jtkukunas zlib code.

I would therefore recommend you try integrating the libdeflate version instead and giving it a whirl to see how it performs inside Java. Note it doesn't have a zlib-compatible API and it doesn't have the same streaming nature, so it'll require quite a bit of interface wrapping. If you're doing that, it's also worth looking at slz (http://www.libslz.org/) for super-fast deflate at level 1. It's not that good on ratio, but it is designed to be as fast as possible at compression (much like igzip at level 1, I guess). I haven't benchmarked it myself on BAM though.

jkbonfield · Oct 14 '19
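A rough illustration of the interface wrapping jkbonfield mentions, not anything from the thread: because BGZF compresses one block of at most ~64 KiB at a time, a block-oriented (non-streaming) library such as libdeflate could sit behind htsjdk's Deflater/DeflaterFactory contract by buffering each block and compressing it in one shot. The class names and the nativeCompress helper below are hypothetical; nativeCompress stands in for a JNI binding (here it simply delegates to the JDK deflater so the sketch stays self-contained), and the override set mirrors the calls BlockCompressedOutputStream is expected to make (reset, setInput, finish, deflate, finished).

import htsjdk.samtools.util.zip.DeflaterFactory;
import java.util.Arrays;
import java.util.zip.Deflater;

// Hypothetical adapter: presents a one-shot block compressor through the
// streaming Deflater API that BlockCompressedOutputStream expects.
class OneShotBlockDeflater extends Deflater {
    private final int level;
    private final boolean nowrap;
    private byte[] block = new byte[0];
    private boolean done = false;

    OneShotBlockDeflater(final int level, final boolean nowrap) {
        super(level, nowrap);   // BGZF needs raw deflate (nowrap), no zlib header
        this.level = level;
        this.nowrap = nowrap;
    }

    @Override
    public void setInput(final byte[] b, final int off, final int len) {
        block = Arrays.copyOfRange(b, off, off + len);   // one BGZF block, <= ~64 KiB
    }

    @Override
    public int deflate(final byte[] out, final int off, final int len) {
        final int written = nativeCompress(block, out, off, len, level, nowrap);
        done = written > 0;   // report "finished" only if the block fit in the output buffer
        return written;
    }

    @Override
    public boolean finished() {
        return done;
    }

    @Override
    public void reset() {
        block = new byte[0];
        done = false;
        super.reset();
    }

    // Placeholder for a JNI call into a block compressor (e.g. libdeflate's
    // libdeflate_deflate_compress). To keep this sketch runnable it simply
    // delegates to the JDK deflater.
    private static int nativeCompress(final byte[] src, final byte[] dst, final int dstOff,
                                      final int dstLen, final int level, final boolean nowrap) {
        final Deflater d = new Deflater(level, nowrap);
        try {
            d.setInput(src);
            d.finish();
            final int n = d.deflate(dst, dstOff, dstLen);
            return d.finished() ? n : 0;   // 0 => output didn't fit; caller can fall back
        } finally {
            d.end();
        }
    }
}

// Factory that could be installed via BlockCompressedOutputStream.setDefaultDeflaterFactory(...).
class OneShotBlockDeflaterFactory extends DeflaterFactory {
    @Override
    public Deflater makeDeflater(final int compressionLevel, final boolean nowrap) {
        return new OneShotBlockDeflater(compressionLevel, nowrap);
    }
}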

FYI: a quick hacky test with slz's "enc" program, swallowing the same uncompressed BAM (around 16 GB) at level 1, took 97 s (NB: 1 thread) and compressed it to ~9 GB. Not good compression, and my test used shed-loads of memory as it slurped the entire file into RAM, but it gives an indication of level 1 compression performance. That's around twice as fast as libdeflate at level 1, albeit around 30% larger.

So it's maybe worth investigating for temporary files, but tbh we could also use e.g. zstd or lz4 for those, as they're never going to be ingested by anything other than our own code.

jkbonfield · Oct 14 '19

I believe the core dumps are fixed in the 0.8.8 release of GKL.

lbergelson · Nov 02 '21