modkit icon indicating copy to clipboard operation
modkit copied to clipboard

bedmethyl to bigWig: thread 'main' panicked at

Open Chuang1118 opened this issue 4 months ago • 8 comments

Hello,

modkit version: v0.4.4

I'm trying to do a Bigwig conversion, but I got an error message.

thread 'main' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bigtools-0.5.5/src/bbi/bbiwrite.rs:819:14: Expected to always send.: TrySendError { kind: Disconnected } note: run with RUST_BACKTRACE=1 environment variable to display a backtrace thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bigtools-0.5.5/src/bbi/bbiwrite.rs:1321:88: called Result::unwrap() on an Err value: JoinError::Cancelled(Id(68711)) thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bigtools-0.5.5/src/bbi/bbiwrite.rs:1321:88: called Result::unwrap() on an Err value: JoinError::Cancelled(Id(68713))

CMD

modkit bedmethyl tobigwig --negative-strand-values output/06_modification_analysis_output/00_converted_bigwig/mock_2_modified_base_count_sort.bed --mod-codes a -g ref/homo_sapiens.GRCh38.109.cdna.fasta.fai output/06_modification_analysis_output/00_converted_bigwig/mock_2_modified_base_count_sort.bigWig

More details, the process is not initially broken, I obtained the intermediate output file of 19M, then the file gone ! 2.9G Aug 8 14:11 mock_1_modified_base_count_sort.bed 19M Aug 8 15:23 mock_1_modified_base_count_sort.bigWig

Thank you for your help. Chuang

Chuang1118 avatar Aug 08 '25 14:08 Chuang1118

Hello @Chuang1118,

Sorry about the crash. Could you run the commend with --log $log_filename and attach it here?

ArtRand avatar Aug 11 '25 18:08 ArtRand

Hi ArtRand,

Please find attached.

mock_1_bigWig.log

Chuang1118 avatar Aug 12 '25 14:08 Chuang1118

I have other pipeline using modkit bedmethyl tobigwig to get genomic coordinates bigwig instead of transcriptome bigwig 1/ conversion transcript coordinates to genomics 2/ sorting 3/ to bigwig

I got the error message:

thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bigtools-0.5.5/src/bbi/bbiwrite.rs:1334:14:
Couldn't send section.: "SendError(..)"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> Error! duplicated record at 1 944219  944220  a       15      -       944219  944220  255,0,0 15      26.67   4       11      0       0       6       0       0

There are many transcript coordinates have same genomic coordinates e.g.

ENST00000327044 2741 2742  a 116  + 2741 2742 255,0,0 116 29.31  34  82   0   1  82   0   0  
ENST00000477976 4187 4188  a  15  + 4187 4188 255,0,0  15 26.67   4  11   0   0   6   0   0  

both transcripts map to chr1 944219 -

log file/ CMD

[src/logging.rs::60][2025-08-12 17:09:46][DEBUG] command line: modkit bedmethyl tobigwig --negative-strand-values output/08_methylbed2bigwig_analysis_output/03_reformated_modified_sites/mock_1_modified_base_count_filtered_genomic_reformated_sorted.bed --mod-codes a -g ref/homo_sapiens.GRCh38.109.fasta.fai output/08_methylbed2bigwig_analysis_output/04_converted_bigwig/mock_1_modified_base_count_filtered_genomic_reformated_sorted.bigWig --log-filepath output/08_methylbed2bigwig_analysis_output/04_converted_bigwig/mock_1_bigWig.log            

Thank you for your time.

Chuang1118 avatar Aug 12 '25 15:08 Chuang1118

Hello @Chuang1118,

I think the problems you're seeing here are due to the duplicated bedMethyl records. Modkit shouldn't crash the way it is now, it should handle this case more gracefully. Do you think it would be possible to send me a small bedMethyl that reproduces the crash?

To the larger problem at hand, you're trying to do something valid that's a little tricky (and correct me if I'm misunderstanding): map your reads to the transcriptome, then visualize the pileup on the genome. You could map the reads directly to the genome, but I'm assuming you'd rather have the transcript assignments from the transcriptome-mapping (correct?). When you take the transcriptome pileup and lift-over the records to genome coordinates, you get multiple records for a given genomic position, which isn't really a valid bedMethyl. A couple ideas come to mind: (1) you could sum the overlapping records together, (2) you could lift-over the reads in the BAM and assign a tag to the separate transcripts that overlap in genome-coordinates, then do a pileup for each tag (or use --partition-tag).

I don't know of a tool that does either of these things very easily. Maybe Modkit should be that tool? It doesn't have that functionality right now. Could you describe your use case/pipeline and maybe I can come up with some features to add?

Thanks.

ArtRand avatar Aug 13 '25 01:08 ArtRand

Hello ArtRand,

For reminder, I used modkit version: v0.4.4

Yes, I am agree with you. From a biological perspective, 1 gene can produce multiple different transcripts, hence the duplicated bedMethyl records.

In my case, see attached mock_1_modified_base_count_filtered_genomic_reformatted_sorted.txt

cat mock_1_modified_base_count_filtered_genomic_reformatted_sorted.bed | grep 944219

1	944219	944220	a	116	-	2741	2742	255,0,0	116	29.31	34	82	0	1	82	0	0
1	944219	944220	a	15	-	4187	4188	255,0,0	15	26.67	4	11	0	0	6	0	0

Modkit shouldn't crash the way it is now, it should handle this case more gracefully.

Current version modkit (v0.5.0) can handle this issue ?

My goal is to visualize the transcriptome pileup on the genome. because there is no transcription browser available like igv and my input is naive RNA.

You could map the reads directly to the genome, but I'm assuming you'd rather have the transcript assignments from the transcriptome-mapping

Do you have expertise in the field of dRNA mapping on the reference genome with model DNA, or vice versa ?

(2) you could lift-over the reads in the BAM and assign a tag to the separate transcripts that overlap in genome-coordinates, then do a pileup for each tag (or use --partition-tag).

There is one line of my bam. I'm not sure I understood --partition-tag

6e47ac3a-af4a-4cd5-b59e-b75f62b75b99	0	ENST00000488147	38	1	37S61M1D86M13I69M1D467M1I88M1I30M1D20M4I18M2D62M2I12M1D85M3D23M170S	*	0	0	AGCTGGCGGCGGGAGCTTCCGGGAGGGCGGCTCGCAGGCACCATGACTCCTGTGAGGATGCAGCACTCCCTGGCAGGTCAGACCTATGCCGTGCCCTTACCCAGCCAGACCTGCGGCGAGAGGAGGCCGTCCAGCAGATGGCGGATGCCCTGCAGTACCTGCAGAAGGTCTCTGGAGACATCTTCAGCAGGATCTCCCAGCAGGTAGAGCAGAGCCGGAGCCAGGTGCAGGCCATTGGAGAGAAGGTCTCCTTGGCCCAGGCCAAGTTGAGAAGATCAAGGGCAGCAAGAAGGCCATCAAGGTGTTCTCCAGTGCCAAGTACCCTGCTCCAGAGCGCCTGCAGGAATATGGCTCCATCTTCACCGGCGCCCAGGACCCTGGCCTGCAGAGACGCCCCCGCCACAGGATCCAGAGCAAGCACCGCCCCCTGGACGAGCGGGCCCTGCAGGAGAAGCTGAAGGACTTTCCTGTGTGCGTGAGCACCAAGCCGGAGCCCGAGGACGATGCAGAAGAGGGACTTGGGGGTCTTCCCAGCAACATCAGCTCTGTCAGCTCCTTGCTGCTCTTCAACACCACCGAGAACCTGTACAAGAAGTATGTCTTCCTGGACCCCCTGGCTGGTGCTGTAACAAAGACCCATGTGATGCTGGGGGCAGAGACAGAGGAGAAGCTGTTTGATGCCCCCTTGTCCATCAGCAAGAGAGAGCAGCTGGAACAGCAGGTCCCAGAGAACTTACTTCTATGTGCCAGACCTGGGCCAGGTGCCTGAGATTGATGTTCCATCCTACCTGCCTGACCTGTCCGGCATTGCCAATGACCTCACTGTACATTGCCGACCTGGGCCCCGGCATTGCCCCTCTGCCCCTGGCACCATTCCTTCCAGAACTGCCCACCTCACACTGAGGTAGCCGAGCCTCTCAAGACCTACAAGATGGGGTACTAACACCACCCCCACCGCCCCCCCACCACCACCCCAGCTCCTGAGGTGCTGGCCAGTGCACCCCCACTCCCACCCTCAACCGCGGCCCCTGTAGGCCAAGGCGCCAGGCAGGACGACAGCAGCAGCGCGTCTCCTTCAGGTGGGAGCAGCTCTTTGAGGCCACCTGATTTCTGGCGTGCTCAGTGCACTTCGGTGGATTTCTGTGGGTTTGTTAAGTGGTCAGAAATTCTCAATTTTTTTGAATAGTTTCGATTTCAAATATCTTGTTCTACTTGGTCCATAAAATAGTGGTTTTCAAAAAAAAAAAAT	&&&(4:;999:>>=<<<:;;;;=@8***+5568=>AAAB>==<>><<<;<<>@A@>?==???@@>>@>>>>?@AAA@?=<<;:9:;7688A;2,($$(*-;?>;BBCC@=<:;;;=>=BBA@@A?>===;:<=?>??>;77768BB@@CBBB@;:70;<7777DA@?<<<=@AAAA@??>?AABCCA@@?><;=>=?DDEBAABCCDCA@??=<<<?AAAAA???@BCDFECCB@@@@@AA@C9''())/..5668>ABB>858<6(''(67@=;;<?A?@BBBABCDBB@@>???BBA???AB@?@@A@?>?@@A@???>@@?><<<>?@?>==>?@@?@AB?=;;;;<:808635:;@B=><:::<@@D???>BBA??>=>>>??><:;;?BA<;;<===>=>>>>?====<:<<=<<=>>ABA??@A@A==?>?@BBBAA>?>>@ADDDCCDED@7E@?>@?>@AB>@<;;;==>=???AAA@?>??BA??>?@A??@>@?>?DBAA<;=<;93/678@>>?AA>>=>BA??>>?>>???>?=<==AAABBBABBCCB50(((-/23333A:99:CCB@ABBABBBBBBA@>@>=8=?A??A??>9888=?>>@?>>>?@@?>>>@>@>>:::9;;=@AB@@667<A==?A@??>>@@BCDFABBBCA><;<=?>>?BBBB=?AAA@@@@BBCCECDAAAABBCABBBCCAACCB?>>=???AABE00(''*'&(%'7>ACB996)%%;?@==?AAAABABCDCCDCCDBC..,,56;;?>@?@A@?ACDAAAA@?:777:@BBA?620-+,,/0'&&&/))))(&&'')0712062;985550/334557:8:;4.3679?A@?<80/.//*(+8;;;:;<==?@@=2)&&;9<BCDC@<<=9999?@=:=<;=><<861/++++>=:<=<;;;<>?BB?656.0225555/348=@=879:;=<07@ABCCDDBC<:::?@ACCCCCCCBC@???AA@?>>=<5566732..///568:ACC@><==ADDFFG@====@A@A@@@<999;?AACDB>>96669<===/--7;>>=>>>?@??0///879BBA2.,)'-993-'&&'-9===<47:<>?::7-,.-';@ABADE?;;??>>>AAAA@@>>?????@;D?AD@B99==@989?C>==>?>>>@E8876579>?B@?43;/9=4444:::=<;<==@EHIFA=;3364//.'$&*./////,&-('(	qs:f:17.413	du:f:12.867	ns:i:51468	ts:i:2600	mx:i:1	ch:i:768	st:Z:2025-03-06T18:48:54.254+00:00	rn:i:624	fn:Z:PBC13265_e32414dd_60f9ce1c_3.pod5	sm:f:830.16	sd:f:118.119	sv:Z:pa	dx:i:0	RG:Z:60f9ce1cf4df970330efd0925891a30e3d74bf7e_rna004_130bps_hac@v5.0.0	MN:i:1249	MM:Z:A+a.,0,2,1,5,0,2,0,0,1,2,3,1,0,4,0,1,0,2,2,1,0,0,2,0,0,0,0,0,1,2,0,2,0,1,0,0,1,1,2,0,1,1,5,4,1,2,0,0,0,0,1,3,1,1,2,3,0,0,0,1,0,2,0,0,2,1,1,0,2,6,0,1,0,0,0,0,1,1,0,0,0,0,0,3,0,0,1,0,2,0,2,0,0,0,1,1,1,0,0,1,0,0,6,0,0,2,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,4,4,1,2,1,3,2,0,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;	ML:B:C,106,180,151,124,22,100,112,169,114,20,58,27,169,29,62,22,19,109,24,108,129,13,30,213,171,38,250,214,49,16,27,16,21,71,66,88,155,120,17,28,247,243,247,56,252,32,18,72,255,24,107,104,251,18,172,36,108,54,88,16,57,93,182,178,46,21,74,34,172,121,13,12,12,137,158,35,18,241,213,133,85,52,129,35,186,69,41,59,26,31,62,27,39,22,13,148,19,221,18,61,119,182,26,160,223,69,62,37,159,154,122,81,59,20,78,156,136,101,53,39,191,189,100,255,160,235,171,12,18,135,79,24,166,16,13,47,124,106,124,143,111,103,148,153,163,169,169,165,160,161	NM:i:36	ms:i:1272	AS:i:1902	nn:i:0	de:f:0.0164729	tp:A:P	cm:i:63	s1:i:770	s2:i:793	MD:Z:61^C1T153^A66G30G224G210C13C37^C38^TC74^C85^CAG23	rl:i:0

The bam file is so large, tagging bam file isn't easy staff.

Firstly, the point of view of single molecule profiling, I would like to product your plot at https://github.com/nanoporetech/modkit/issues/367#issuecomment-2641095440 in my case. Secondly, binning single molecule profiling with some value to get the peaks more continuous (region bed), then dmr. My goal is the visual gene level. In other word, ignore heterogeneous profile.

Thanks

Chuang1118 avatar Aug 13 '25 14:08 Chuang1118

Hello @Chuang1118,

Current version modkit (v0.5.0) can handle this issue ?

It cannot. If there are two records reporting on the same genomic position and strand for a modification code, which one should it use for the BigWig record? What you could do for transcripts that overlap in genomic coordinates, is make multiple bedMethyl files for each original transcript and covert each into BigWig (you can skip the intermediate file and pipe into modkit bedmethyl tobigwig). I don't think this is an ideal workflow, so I'm working on better solutions.

Do you have expertise in the field of dRNA mapping on the reference genome with model DNA, or vice versa ?

You can map your dRNA reads to the genome using dorado aligner, documentation can be found here, specifically -x splice:hq is what we usually recommend.

ArtRand avatar Aug 14 '25 14:08 ArtRand

Hello @Chuang1118,

In case you run into errors such as

> Error! Input bedGraph not sorted by chromosome. Sort with sort -k1,1 -k2,2n.

I've pushed some updates to master and attached a build here, if you want to try it. It will still fail on duplicated records since I don't think it is completely valid to have them.

Thanks.

modkit_dev442e6ba_u16_x86_64.tar.gz

ArtRand avatar Aug 15 '25 01:08 ArtRand

Hello @ArtRand,

Thank you for your time and support.

Error! Input bedGraph not sorted by chromosome. Sort with sort -k1,1 -k2,2n.

I have sorted bed file mock_1_modified_base_count_filtered_genomic_reformatted_sorted.txt

This error does not make sense to me. One possible, when you handle duplicated records, you store them in "dictionary like structure" e.g. hashmap, as result it destroy the initial order.

For the release modkit_dev442e6ba_u16_x86_64.tar.gz I can't build it.

/opt/dist_modkit_v0.5.1_442e6ba` does not contain a Cargo.toml file. --path must point to a directory containing a Cargo.toml file.

As you previously mentioned

(1) you could sum the overlapping records together.

For a given modification site with the same genomic location, I re-defined the methylation percentage using the average value. Input: Image ouput: Image

then run modkit bedmethyl tobigwig

IGV

Image

chr1 1014209 %m : (11.71 + 0 + 10.83) / 3 = 7.513

Could you tell me, what's it mean ? Value: 11.328976 in bigWig format. It's possible to plot average %m ?

Thanks.

Chuang1118 avatar Aug 18 '25 15:08 Chuang1118