modkit icon indicating copy to clipboard operation
modkit copied to clipboard

find-motifs --contig failed to parse any bedmethyl records

Open Ge0rges opened this issue 10 months ago • 5 comments
trafficstars

Hey @ArtRand I'm having some issue running finds-motifs after merge bed. I get:

_My script: Finding enriched motifs for contig contig_100009 in metagenome_assembly_
> loading references from "mags//metagenome_assembly.fna"  
> loaded 106135 sequence(s)                                    
> using tabix/bgzf reader                                 
> Error! failed to parse any bedmethyl records

When I look at the merged BED that is input to find-motifs I can do grep "contig_100009" all_samples.bed I get many lines of output such as:

contig_100009   5411    5412    a       1       +       5411    5412   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5413    5414    a       1       +       5413    5414   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5414    5415    a       1       +       5414    5415   255,0,0  1       100.00  1       0       0       0       0       0      0
contig_100009   5416    5417    a       1       +       5416    5417   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5417    5418    a       1       +       5417    5418   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5418    5419    a       1       +       5418    5419   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5419    5420    a       1       +       5419    5420   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5420    5421    m       1       +       5420    5421   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5420    5421    21839   1       +       5420    5421   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5421    5422    m       1       +       5421    5422   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5421    5422    21839   1       +       5421    5422   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5422    5423    a       1       +       5422    5423   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5423    5424    a       1       +       5423    5424   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5424    5425    m       1       +       5424    5425   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5424    5425    21839   1       +       5424    5425   255,0,0  1       0.00    0       1       0       0       0       0      contig_100009   5426    5427    a       1       +       5426    5427   255,0,0  1       0.00    0       1       0       0       0       0      0
contig_100009   5427    5428    a       1       +       5427    5428   255,0,0  1       0.00    0       1       0       0       0       0      0

The script is doing:

modkit bedmethyl merge -o "$merged_bed" -g "$outdir/sizes.tsv" "$outdir"/*.bed.gz --threads "$num_threads" --io-threads 10 --log-filepath $merged_bed.log
bgzip -k --threads "$num_threads" "$merged_bed"
tabix -p bed "$merged_bed.gz"

then

  modkit find-motifs \
--min-coverage 5 \
-i "$merged_bed.gz" \
 -r "$genome" \
--contig "$contig" \
-o "$contig_motif_tsv" \
--log-filepath $contig_motif_tsv.log
--threads 20 || echo "Find-motifs failed for contig $contig."

In sizes.tsv if I do grep "contig_100009 sizes.tsv I get: contig_100009 5531. Finally, just as a sanity check I did find >contig_100009 in the metagenome_assembly.fna file.

Here's the log from merge bed, I removed a lot of similar contig not present statements and made sure my contig wasn't in there:

[src/logging.rs::60][2025-01-16 14:26:28][DEBUG] command line: modkit bedmethyl merge -o methylation_data//metagenome_assembly/all_samples.bed -g methylation_data//metagenome_assembly/sizes.tsv methylation_data//metagenome_assembly/barcode01.bed.gz methylation_data//metagenome_assembly/barcode02.bed.gz methylation_data//metagenome_assembly/barcode03.bed.gz methylation_data//metagenome_assembly/barcode05.bed.gz methylation_data//metagenome_assembly/barcode06.bed.gz methylation_data//metagenome_assembly/barcode07.bed.gz methylation_data//metagenome_assembly/barcode08.bed.gz methylation_data//metagenome_assembly/barcode09.bed.gz methylation_data//metagenome_assembly/barcode10.bed.gz methylation_data//metagenome_assembly/barcode11.bed.gz methylation_data//metagenome_assembly/barcode12.bed.gz methylation_data//metagenome_assembly/barcode13.bed.gz methylation_data//metagenome_assembly/barcode14.bed.gz --threads 20 --io-threads 10 --log-filepath methylation_data//metagenome_assembly/all_samples.bed.log
[src/bedmethyl_util/subcommand.rs::239][2025-01-16 14:26:30][INFO] Collected 104643 contigs from 13 readers
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_10005 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_10006 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_100064 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_1001 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_100162 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_10040 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_10058 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_100660 is not present in tabix headers and will be skipped
.
.
.
.
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_9936 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_9946 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_99490 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_9957 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_99635 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_99794 is not present in tabix headers and will be skipped
[src/bedmethyl_util/subcommand.rs::253][2025-01-16 14:26:30][DEBUG] contig_99911 is not present in tabix headers and will be skipped
[src/command_utils.rs::295][2025-01-16 14:26:30][INFO] calculated chunk size: 30, interval size 100000, processing 3000000 positions concurrently
[src/interval_chunks.rs::512][2025-01-16 14:26:30][DEBUG] there are 104643 contig(s) to work on (104643 parts)
[src/interval_chunks.rs::547][2025-01-16 20:15:55][DEBUG] no more records to process

The log from find-motifs:

[src/logging.rs::60][2025-01-16 21:26:33][DEBUG] command line: modkit find-motifs --min-coverage 5 -i methylation_data//metagenome_assembly/all_samples.bed.g$
[src/find_motifs/mod.rs::971][2025-01-16 21:26:33][INFO] loading references from "mags//metagenome_assembly.fna"
[src/find_motifs/mod.rs::1002][2025-01-16 21:26:40][INFO] loaded 106135 sequence(s)
[src/find_motifs/mod.rs::1132][2025-01-16 21:26:40][INFO] using tabix/bgzf reader

Ge0rges avatar Jan 17 '25 07:01 Ge0rges