modkit
modkit copied to clipboard
find-motifs --contig failed to parse any bedmethyl records
trafficstars
Hey @ArtRand I'm having some issue running finds-motifs after merge bed. I get:
_My script: Finding enriched motifs for contig contig_100009 in metagenome_assembly_
> loading references from "mags//metagenome_assembly.fna"
> loaded 106135 sequence(s)
> using tabix/bgzf reader
> Error! failed to parse any bedmethyl records
When I look at the merged BED that is input to find-motifs I can do grep "contig_100009" all_samples.bed I get many lines of output such as:
contig_100009 5411 5412 a 1 + 5411 5412 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5413 5414 a 1 + 5413 5414 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5414 5415 a 1 + 5414 5415 255,0,0 1 100.00 1 0 0 0 0 0 0
contig_100009 5416 5417 a 1 + 5416 5417 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5417 5418 a 1 + 5417 5418 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5418 5419 a 1 + 5418 5419 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5419 5420 a 1 + 5419 5420 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5420 5421 m 1 + 5420 5421 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5420 5421 21839 1 + 5420 5421 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5421 5422 m 1 + 5421 5422 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5421 5422 21839 1 + 5421 5422 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5422 5423 a 1 + 5422 5423 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5423 5424 a 1 + 5423 5424 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5424 5425 m 1 + 5424 5425 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5424 5425 21839 1 + 5424 5425 255,0,0 1 0.00 0 1 0 0 0 0 contig_100009 5426 5427 a 1 + 5426 5427 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_100009 5427 5428 a 1 + 5427 5428 255,0,0 1 0.00 0 1 0 0 0 0 0
The script is doing:
modkit bedmethyl merge -o "$merged_bed" -g "$outdir/sizes.tsv" "$outdir"/*.bed.gz --threads "$num_threads" --io-threads 10 --log-filepath $merged_bed.log
bgzip -k --threads "$num_threads" "$merged_bed"
tabix -p bed "$merged_bed.gz"
then
modkit find-motifs \
--min-coverage 5 \
-i "$merged_bed.gz" \
-r "$genome" \
--contig "$contig" \
-o "$contig_motif_tsv" \
--log-filepath $contig_motif_tsv.log
--threads 20 || echo "Find-motifs failed for contig $contig."
In sizes.tsv if I do grep "contig_100009 sizes.tsv I get: contig_100009 5531. Finally, just as a sanity check I did find >contig_100009 in the metagenome_assembly.fna file.
Here's the log from merge bed, I removed a lot of similar contig not present statements and made sure my contig wasn't in there:
[src/logging.rs::60][2025-01-16 14:26:28][DEBUG] command line: modkit bedmethyl merge -o methylation_data//metagenome_assembly/all_samples.bed -g methylation_data//metagenome_assembly/sizes.tsv methylation_data//metagenome_assembly/barcode01.bed.gz methylation_data//metagenome_assembly/barcode02.bed.gz methylation_data//metagenome_assembly/barcode03.bed.gz methylation_data//metagenome_assembly/barcode05.bed.gz methylation_data//metagenome_assembly/barcode06.bed.gz methylation_data//metagenome_assembly/barcode07.bed.gz methylation_data//metagenome_assembly/barcode08.bed.gz methylation_data//metagenome_assembly/barcode09.bed.gz methylation_data//metagenome_assembly/barcode10.bed.gz methylation_data//metagenome_assembly/barcode11.bed.gz methylation_data//metagenome_assembly/barcode12.bed.gz methylation_data//metagenome_assembly/barcode13.bed.gz methylation_data//metagenome_assembly/barcode14.bed.gz --threads 20 --io-threads 10 --log-filepath methylation_data//metagenome_assembly/all_samples.bed.log
[src/bedmethyl_util/subcommand.rs::239][2025-01-16 14:26:30][INFO] Collected 104643 contigs from 13 readers
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_10005 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_10006 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_100064 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_1001 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_100162 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_10040 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_10058 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_100660 is not present in tabix headers and will be skipped
.
.
.
.
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_9936 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_9946 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_99490 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_9957 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_99635 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_99794 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_99911 is not present in tabix headers and will be skipped
[src/command_utils.rs::295][2025-01-16 14:26:30][INFO] calculated chunk size: 30, interval size 100000, processing 3000000 positions concurrently
[src/interval_chunks.rs::512][2025-01-16 14:26:30][DEBUG] there are 104643 contig(s) to work on (104643 parts)
[src/interval_chunks.rs::547][2025-01-16 20:15:55][DEBUG] no more records to process
The log from find-motifs:
[src/logging.rs::60][2025-01-16 21:26:33][DEBUG] command line: modkit find-motifs --min-coverage 5 -i methylation_data//metagenome_assembly/all_samples.bed.g$
[src/find_motifs/mod.rs::971][2025-01-16 21:26:33][INFO] loading references from "mags//metagenome_assembly.fna"
[src/find_motifs/mod.rs::1002][2025-01-16 21:26:40][INFO] loaded 106135 sequence(s)
[src/find_motifs/mod.rs::1132][2025-01-16 21:26:40][INFO] using tabix/bgzf reader