Open chromatin predict
Thanks for this great tool. I am trying to use open chromatin predict on our data. I am running on a GPU using the candle version.
I first tried running it on the BAM (40G) directly, but it got stuck and didn't write anything - I waited more than 48 hours. I also tried specifying just a chromosome, and then a small BED, with the same issue each time. I then downsampled to a smaller BAM (2.7G) and was able to run on a chromosome-by-chromosome basis, though some chromosomes also got stuck. This does not seem to depend on the size of the chromosome.
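For context, the downsampling was along these lines - a rough sketch with samtools, where the seed, fraction, and file names are placeholders rather than the exact values used:
# subsample the 40G BAM to a fraction of the reads (42 is the seed, .10 the fraction)
samtools view -s 42.10 -b -o downsampled.bam full.bam
samtools index downsampled.bam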
I am trying to see which reads cause it to hang, but it is not proceeding in sequential order through the BAM, at least in terms of what is written out to the bedGraph.
We are also interested in unmapped reads, but the tool finished and returned nothing for them even though there are 6mA calls on those reads - is there a flag I need to pass to make sure it includes unmapped reads?
Hello @nchernia,
Sorry that the command isn't working.
Regarding the program getting stuck: could you run it with --log <log-filename> and attach the log here (or send it via email)? From that I may be able to tell where it's getting stuck. Does the program consume RAM or GPU resources while it appears hung?
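Something along these lines should do it - everything below except --log is a placeholder for whatever you're currently running:
# re-run your existing command with --log added
modkit <your open chromatin predict subcommand and usual arguments> --log open_chromatin.log
# in a second terminal, watch host and GPU memory while it appears stuck
top
nvidia-smi -l 5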
I have one other example internally that sounds like it causes a similar problem. Do you think it's possible to subset the BAM to a single small-ish region that reproduces the problem? If so, could you send it to me? Feel free to email me at art.rand [at] nanoporetech.com and we can arrange how to transfer the BAM.
Regarding the unmapped reads, the open chromatin algorithm requires that the reads are mapped to the reference to determine if that region is accessible to the MTase or not. I'm not sure how you would use unmapped reads in this case. Maybe if you elaborate on what you're trying to learn from these reads I can help you more.
Thanks.
Thanks for your response!
I'm now trying on just chr1 with a file subsetted using samtools (it's 3.2GB). It also appears stuck:
using device Cuda(CudaDevice { device: CudaDevice(DeviceId(1)), index: 0 })
loaded model config { "num_features": 12, "num_classes": 2, "hidden_size": 256, "chunk_size": 100, "modified_bases": { "A": [ "a" ] } }
loading weights from "/home/neva/dist_modkit_v0.5.0_5120ef7/models/[email protected]/model.mpk"
collecting regions of 25675bp (100 bp chunks), super batches of 100 (2567500bp). Stepping 25 bp at a time.
0 records written
The log is also not being written to anymore. Usually when it works, I see records being written. Attached is the log file.
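For reference, the chr1 subset used in the run above was made roughly like this (file names are placeholders):
samtools view -b -o chr1.bam full.bam chr1
samtools index chr1.bam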
Regarding the unmapped reads, we are interested in tandem repeats, which often differ between individuals and do not map well to the reference.
Hello @nchernia
Would you be willing to share the BAM that causes the problem with me? I've been testing with 30-40x coverage on the whole genome and can't reproduce this problem. You can email me at art.rand[at]nanoporetech.com and I can set up a way to share if you don't have one already.
@nchernia
I may have found where the problem is. Could you try adding the following options to your command:
--super-batch-size 10 --batch-size 64
If that works, you can also try dropping --batch-size 64 and running with just --super-batch-size 10. I should have a fix/warning soon.
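For example, keeping everything else in your command the same (the subcommand and arguments below are placeholders; only the two flags are the actual suggestion):
modkit <your open chromatin predict subcommand and usual arguments> \
    --super-batch-size 10 \
    --batch-size 64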
Thank you - I ran it with these parameters and it seemed to work. I will try without --batch-size and report back. For this run, there's an unusual message at the end:
collecting regions of 1675bp (100 bp chunks), super batches of 10 (16750bp). Stepping 25 bp at a time.
9349538 records written
> model received receiving on an empty and disconnected channel
> write handle got receiving on an empty and disconnected channel
Using just --super-batch-size worked as well. The results are slightly different: there are 8 more lines written in the version with --batch-size 64, and its log file is much bigger (95M vs 6.2M). I got the same message at the end:
> model received receiving on an empty and disconnected channel
> write handle got receiving on an empty and disconnected channel
Head of the original bedGraph (run with --batch-size 64):
chr1 3400 3425 0.6411874
chr1 3425 3450 0.363223
chr1 3450 3475 0.26683313
chr1 3475 3500 0.22840261
chr1 3500 3525 0.095497906
chr1 3525 3550 0.26477274
chr1 3550 3575 0.49079007
chr1 3575 3600 0.708526
chr1 3600 3625 0.92767143
Head of the version without --batch-size:
chr1 3400 3425 0.63424516
chr1 3425 3450 0.37120336
chr1 3450 3475 0.27269077
chr1 3475 3500 0.2238231
chr1 3500 3525 0.0817063
chr1 3525 3550 0.23102966
chr1 3550 3575 0.45643026
chr1 3575 3600 0.68326914
chr1 3600 3625 0.913419
Hello @nchernia,
Ok good, glad it seems to have worked.
there are 8 more lines written in the version with --batch-size 64
Could you expand on this? Are they at the end, beginning, or interspersed?
Could you give me an idea of what all the extra log lines are?
Hi,
They are interspersed; attached are the first 100 lines of running diff on the first 3 fields (diffs.txt). I'm also attaching the two log files, with the bigger one cut to the first 100K lines.
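The comparison was done roughly like this - a sketch with placeholder file names:
# compare only the first 3 (coordinate) fields of the two bedGraphs, keep the first 100 diff lines
diff <(cut -f1-3 with_batch_size.bedGraph) <(cut -f1-3 without_batch_size.bedGraph) | head -n 100 > diffs.txt
# trim the larger log to its first 100K lines before attaching
head -n 100000 with_batch_size.log > with_batch_size_first100k.log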
Hi @ArtRand
This is happening again on a different file, even with the --super-batch-size 10 --batch-size 64 flags. I will email you to see about sending the file for testing.
Thanks Neva
Sorry, one update. I tried running it on a chromosome-by-chromosome basis to isolate the error and that didn't work, but then I tried only the flag --super-batch-size 10 (without --batch-size) and it seems to be working.