failed to get modbase info for record ..., Skipped: AUX data not found
I am getting this error while running modkit pileup: modkit pileup <bam> <bed>
failed to get modbase info for record <record>, Skipped: AUX data not found
Procedure
- nanopolish index -d
- minimap2 -a -x map-ont
- samtools sort
- samtools index
I also ran modbam2bed. It produced a file, but the methylation values were all nan or 0.
What could be the reason?
Hello @ArnavBharti,
You must be losing the MM/ML/MN tags somewhere along the way. Could you check that the records have these tags after each step? If you can tell me the file formats of the inputs and outputs of each step, that would help me debug the problem.
- Initially FAST5
- After basecalling, FASTQ
- Nanopolish index: (input) FAST5, FASTQ; (output) index files .fai, etc.
- minimap2: (input) FASTA, FASTQ; (output) SAM
- samtools: (input) SAM; (output) BAM
Hello @ArnavBharti,
Could you check that the reads have the MM/ML/MN tags at the end of each step? I have a feeling that your basecalled FASTQ does not have them. For your reference the tags I'm talking about are described in the SAM tags specification in section 1.7.
Hi, do you check for these tags via text search, or is there a tool? A simple text search finds 'MM' in the FASTQ file, but I feel like I am probably missing something.
Hi, I was also facing the same issue. Is there a way to check the tags in the FASTQ file?
Hello @ArtRand
It would be nice if you could give insights on the same.
SAM/BAM tags in the FASTQ file format are not officially supported. The "support" for this is simply the -y argument to minimap2, which will "Copy input FASTA/Q comments to output." (see the minimap2 manual). Thus the only way to check for the tags in a FASTQ is text based. I would suggest something like the following awk line to print any header rows of the file that carry the MM tag: awk 'NR %4 == 1 && $0 ~ /MM:Z:/'
If you must use minimap2 ensure that you have the -y option specified to ensure the tags are indeed copied to the output mappings. This is the most likely step where the tags are dropped as long as modified base calling was on at basecalling time (though I think this might require unmapped BAM output).
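For anyone who prefers Python over awk, here is a minimal sketch of the same header check. It assumes a plain (uncompressed) four-line-per-record FASTQ; the function names `headers_with_mm` and `check_fastq` are made up for illustration and are not part of modkit, minimap2, or samtools. Restricting the search to header lines matters because a plain search for 'MM' can also match quality strings, where 'M' is a valid quality character.

```python
from typing import Iterable, List


def headers_with_mm(lines: Iterable[str]) -> List[str]:
    """Return the FASTQ header lines that contain an MM:Z: tag.

    Mirrors the awk one-liner: only lines 1, 5, 9, ... (the headers of
    4-line FASTQ records) are inspected, so sequence and quality lines
    cannot produce false hits.
    """
    hits = []
    for index, line in enumerate(lines):
        # index is 0-based, so headers sit at index 0, 4, 8, ...
        if index % 4 == 0 and "MM:Z:" in line:
            hits.append(line.rstrip("\n"))
    return hits


def check_fastq(path: str) -> List[str]:
    """Hypothetical convenience wrapper: run the check on a FASTQ file on disk."""
    with open(path) as handle:
        return headers_with_mm(handle)
```

An empty result from `check_fastq("<fastq>")` would mean the same thing as no output from the awk line: the headers carry no MM tags, so there is nothing for minimap2 -y to copy into the alignments.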
Hi,
- I used the y flag with minimap2 (minimap2 -aY -x map-ont) and it still gave the same error.
- The awk line awk 'NR %4 == 1 && $0 ~ /MM:Z:/' gave no output.
Given the second point, is there a problem with the basecalled files?
I'll try with dorado aligner.
-Y and -y are different options in minimap2. In this case you want the lowercase one, not the uppercase one shown in your command.
Even with -y same issue
Given that the awk line gives no results the issue is that you do not have modified base calls in your basecalls. Modified base calls are not available for FASTQ output from Dorado. If you output the default unmapped BAM the modified base calls will be there. You can also have Dorado perform the mapping while basecalling to avoid any of the minimap issues.
I'm not sure what your downstream goals are with running nanopolish etc, but you can convert your unmapped BAM file to FASTQ with the samtools fastq -T "*" command. Let me know if you have any further issues.
Hello @marcus1487
The basecalling was done with guppy v3.5.1 using a trained RNN model, res_dna_r941_min_modbases-all-context_v001.
Hello @marcus1487 @ArtRand, using guppy v3.5.1, I attempted basecalling with the trained RNN model res_dna_r941_min_modbases-all-context_v001. After basecalling completed and produced the FASTQ files, minimap2 was used to create the BAM file. I then ran modkit pileup, but it failed with the error "failed to get modbase info for record <record>, Skipped: AUX data not found".
Note: Since the data came from the R9 flowcell, guppy was run. Therefore, in order to do the m6A modified basecalling, I utilised the following configuration file: res_dna_r941_min_modbases-all-context_v001
Your opinions on the same would be greatly appreciated.
Thanks
I don't believe modified base output in FASTQ format is supported. You'll have to specify BAM output format in order to proceed from guppy basecalling.
Hello @marcus1487, in this case there is no option to specify BAM output; by default it only gives FASTQ. Note: with a plain text search I have observed MM tags in the FASTQ file.
Hello @marcus1487 @ArtRand
It would be nice if you could help me with the above issue.
Thanks
Hello @PRIYANKA-22091995 and @ArnavBharti,
You need to run the basecalling such that you get an unaligned BAM file as output (not FASTQ output). I believe that the --bam_out flag will do this. From there I would align the reads with dorado aligner 0.5.3. Could you tell me when you're able to successfully generate a basecall BAM file with the MM/ML tags? You can check the tags with modkit summary.
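If you want a quick look at the tags without modkit, a rough sketch like the one below counts how many records in SAM-formatted text carry MM and ML tags. It assumes tab-separated SAM records such as those printed by samtools view; the function name `count_modbase_tags` is hypothetical and only for illustration.

```python
from typing import Dict, Iterable


def count_modbase_tags(sam_lines: Iterable[str]) -> Dict[str, int]:
    """Count SAM records and how many of them carry MM and ML tags."""
    counts = {"records": 0, "with_MM": 0, "with_ML": 0}
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # The first 11 columns are mandatory; optional TAG:TYPE:VALUE
        # fields start at column 12.
        tags = {field[:2] for field in fields[11:]}
        counts["records"] += 1
        if "MM" in tags:
            counts["with_MM"] += 1
        if "ML" in tags:
            counts["with_ML"] += 1
    return counts
```

Running this over the text output of samtools view after each step (basecalling, alignment, sorting) should show the step at which with_MM drops to zero, which is where the tags are being lost.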
@ArnavBharti @PRIYANKA-22091995,
Any update on this?
There is no option for --bam_out, and even after adding --bam_out it only gave FASTQ output. Any suggestions or input would be appreciated.
Hello @ArtRand The following command was used for modified basecalling for R9 flowcell data: -i <FAST5> -s guppy_output_ed_cc_cq_m6a/ -c ~/workspace/rerio/basecall_models/res_dna_r941_min_modbases-all-context_v001.cfg --recursive --calib_reference ./PlasmoDB-61_Genome.fasta -x cuda:0 -m ~/workspace/rerio/basecall_models/res_dna_r941_min_modbases-all-context_v001.jsn --barcode_kits "EXP-NBD104"
@ArnavBharti @PRIYANKA-22091995,
Any chance you could update guppy to at least 6.3.2? I downloaded v3.5.1 and don't see the options you'll need. If you're using R9 data, you could probably use dorado also.
Earlier I was trying guppy v0.6.2, but it failed with an error: [guppy/error] The pipeline has shut down prematurely due to an error condition. So I moved to the previous version to do modified basecalling for m6A, but again it does not provide an option for BAM output. As for modified basecalling with R9 data, dorado does not have an option for m6A.