Parsing input error
Hi @ArtRand,
The following command, which I believe I have run on identical files in the past (perhaps with 0.3.0), now seems to produce the error below:
modkit dmr multi \
-s methylation_10/brevundimonas_r-contigs/barcode01.bed.gz top \
-s methylation_10/brevundimonas_r-contigs/barcode02.bed.gz middle \
-s methylation_10/brevundimonas_r-contigs/barcode03.bed.gz bottom \
-s methylation_10/brevundimonas_r-contigs/barcode05.bed.gz top \
-s methylation_10/brevundimonas_r-contigs/barcode06.bed.gz middle \
-s methylation_10/brevundimonas_r-contigs/barcode07.bed.gz bottom \
-s methylation_10/brevundimonas_r-contigs/barcode08.bed.gz top \
-s methylation_10/brevundimonas_r-contigs/barcode09.bed.gz middle \
-s methylation_10/brevundimonas_r-contigs/barcode10.bed.gz bottom \
-s methylation_10/brevundimonas_r-contigs/barcode11.bed.gz barcode11 \
-s methylation_10/brevundimonas_r-contigs/barcode12.bed.gz barcode12 \
-s methylation_10/brevundimonas_r-contigs/barcode13.bed.gz barcode13 \
-s methylation_10/brevundimonas_r-contigs/barcode14.bed.gz barcode14 \
-r methylation_10/brevundimonas_r-contigs/gene-coordinates.txt \
-o methylation_10/brevundimonas_r-contigs/dmr_by_gene/ \
-t 20 \
--ref mags/brevundimonas_r-contigs.fna \
--base C \
--base A \
--min-valid-coverage 10
Error: > Error! Parsing Error: Error { input: "\t\t", code: Many1 }
Is this due to a change or formatting problem in my input files that I might have missed, or does it look like a bug in modkit? The error is a bit mysterious.
@Ge0rges,
I agree, the parsing errors should be more informative. I'll fix that.
Could you tell me which version of modkit you used to generate the input data (the pileups)? Also, could you attach or paste the gene-coordinates.txt file? (Email is also fine.)
I used 0.3.1. Also, the gene-coordinates file is the problem: I just looked at it and it's not formatted normally. I guess that was the issue! I'll fix it and confirm.
That seems to have fixed it, @ArtRand. Next time I'll review my input files instead of trusting the script! Sneaky updates sneak past me...
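For anyone hitting the same `Error { input: "\t\t", code: Many1 }` message: the input shown suggests the parser ran into two consecutive tabs, i.e. an empty field, although that reading of the error is an assumption. A minimal sketch of a pre-flight check follows; `find_empty_fields` is a hypothetical helper written for illustration, not part of modkit. It lists blank lines and lines with an empty tab-separated field so a malformed coordinates file can be spotted before running `modkit dmr multi`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of modkit): report blank lines and lines that
# contain an empty tab-separated field (consecutive, leading, or trailing tabs).
# Prints "line_number: content" for each problem; returns 1 if any were found.
find_empty_fields() {
    awk -F'\t' '
        NF == 0 { printf "%d: <blank line>\n", NR; bad = 1; next }
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "") { printf "%d: %s\n", NR, $0; bad = 1; next }
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, `find_empty_fields methylation_10/brevundimonas_r-contigs/gene-coordinates.txt` would print each offending line with its line number and return a non-zero status if any are found.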
@Ge0rges I'm going to re-open this issue to track work for better error messages when input fails to parse. Some other users have encountered the same error and it's not clear enough what the problem is.
Hi @ArtRand,
I've also encountered a parsing error. I'm trying to run the script below using the regions.bed.gz files output by the --mod option of wf_human_variation. I have also tried with wf_mods.bedmethyl.gz.
For -r/--regions-bed, I downloaded the NCBI RefSeq track in BED format.
# Define variables for paths
REF="/projects/health_sciences/oms/pathology/powry48p/202404ONT/reference/ref_genome/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"
OUT_DIR="/weka/powry48p/results/modkit_output/"
# Run modkit dmr
./modkit dmr multi \
-s barcode17.regions.bed.gz Tri102_1 \
-s barcode19.regions.bed.gz Tri102_2 \
-s barcode21.regions.bed.gz Tri103_1 \
-s barcode23.regions.bed.gz Tri103_2 \
-o "$OUT_DIR" \
-r refseq.bed \
--ref "$REF" \
-m C \
--log-filepath dmr_multi.log
Error:
error fetching line from regions BED, stream did not contain valid UTF-8
error fetching line from regions BED, stream did not contain valid UTF-8
Error! Parsing Error: Error { input: "= {", code: Digit }
Any tips would be appreciated, thanks!
Hello @Rpowellnz,
Could you tell me what
$ head -n 5 refseq.bed
prints?
Hi @ArtRand,
The output of $ head -n 5 refseq.bed is below; I'm guessing it is not correctly formatted. Could you provide some guidance on how to generate an appropriate BED file for -r for a genome-wide differential methylation analysis of protein-coding genes?
bplist00�_WebMainResource�
_ebResourceTextEncodingName_WebResourceData_WebResourceMIMEType_WebResourceFrameName^WebResourceURLUUTF-8O�S
chr1 201283451 201332993 NM_000299 0 + 201283702 201328836 0 15 453,104,395,145,208,178,63,115,156,177,154,187,85,107,2920, 0,10490,29714,33101,34120,35166,36364,36815,38526,39561,40976,41489,42302,45310,46622,
chr1 67092165 67134970 NM_001276351 0 - 67093004 67127240 0 8 1439,187,70,113,158,92,86,41, 0,3069,4086,23186,33586,35000,38976,42764,
chr1 201283505 201332989 NM_001005337 0 + 201283702 201328836 0 14 399,104,395,145,208,178,115,156,177,154,187,85,107,2916, 0,10436,29660,33047,34066,35112,36761,38472,39507,40922,41435,42248,45256,46568,
chr1 67092165 67134970 NM_001276352 0 - 67093579 67127240 0 9 1439,70,145,68,113,158,92,86,41, 0,4086,11072,19411,23186,33586,35000,38976,42764,
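The first bytes of that output, `bplist00` followed by `WebMainResource`, are the signature of a binary property list, which is the format Safari uses when saving a page as a `.webarchive`; that would be consistent with the "stream did not contain valid UTF-8" message. A quick check is sketched below, assuming `iconv` is installed and that a plain-text BED file is expected. `check_plain_text` is a hypothetical name, not a modkit command.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of modkit): is this file plain UTF-8 text?
check_plain_text() {
    local file="$1"
    # Binary property lists (e.g. Safari .webarchive files) start with "bplist00".
    # tr strips NUL bytes so bash does not warn about them in the substitution.
    if [ "$(head -c 8 "$file" | tr -d '\000')" = "bplist00" ]; then
        echo "$file: binary plist (possibly a saved web page), not a BED file" >&2
        return 1
    fi
    # iconv exits with a non-zero status when the input is not valid UTF-8.
    if ! iconv -f UTF-8 -t UTF-8 "$file" > /dev/null 2>&1; then
        echo "$file: not valid UTF-8" >&2
        return 1
    fi
    echo "$file: looks like UTF-8 text"
}
```

Running `check_plain_text refseq.bed` before `modkit dmr multi` would flag a saved web page or a file in another encoding without waiting for the parser to fail.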
Hello @Rpowellnz,
You certainly need to remove any of those HTML tags at the start. The BED file should be a plain text file with 3 or 4 tab-separated fields: chrom, start, end, <name> (<name> is optional). You should also remove those blank lines.
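One way to do that conversion for a RefSeq download is sketched below. It assumes the file is tab-separated BED12 (as the output above suggests), keeps only chrom, start, end and name, drops blank, `track`/`browser` and `#` lines, and can optionally keep only `NM_` accessions, which RefSeq uses for curated protein-coding mRNAs (`NR_` marks non-coding RNAs). The helper name `to_regions_bed` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of modkit): reduce a tab-separated BED12 file
# (e.g. a RefSeq download) to the 4 columns chrom, start, end, name.
# Usage: to_regions_bed <in.bed> [coding_only]   (coding_only=1 keeps NM_ only)
to_regions_bed() {
    local in="$1" coding_only="${2:-0}"
    awk -F'\t' -v OFS='\t' -v coding_only="$coding_only" '
        /^[ \t\r]*$/ { next }                          # blank lines
        /^(track|browser)([ \t]|$)/ { next }           # UCSC header lines
        substr($1, 1, 1) == "#" { next }               # comment lines
        coding_only == 1 && $4 !~ /^NM_/ { next }      # keep NM_ (protein-coding mRNA) only
        { print $1, $2, $3, $4 }
    ' "$in"
}
```

For example, `to_regions_bed refseq1.bed 1 > refseq1.regions.bed` would produce a 4-column file to pass to `-r`. Each record spans a whole transcript, so overlapping isoforms of the same gene (such as NM_000299 and NM_001005337 above) remain separate regions.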
Hi @ArtRand
I removed the HTML tags, so now $ head -n 5 refseq1.bed produces the output below.
chr1 201283451 201332993 NM_000299 0 + 201283702 201328836 0 15 453,104,395,145,208,178,63,115,156,177,154,187,85,107,2920, 0,10490,29714,33101,34120,35166,36364,36815,38526,39561,40976,41489,42302,45310,46622,
chr1 67092165 67134970 NM_001276351 0 - 67093004 67127240 0 8 1439,187,70,113,158,92,86,41, 0,3069,4086,23186,33586,35000,38976,42764,
chr1 201283505 201332989 NM_001005337 0 + 201283702 201328836 0 14 399,104,395,145,208,178,115,156,177,154,187,85,107,2916, 0,10436,29660,33047,34066,35112,36761,38472,39507,40922,41435,42248,45256,46568,
chr1 67092165 67134970 NM_001276352 0 - 67093579 67127240 0 9 1439,70,145,68,113,158,92,86,41, 0,4086,11072,19411,23186,33586,35000,38976,42764,
chr1 67092165 67134970 NR_075077 0 - 67134970 67134970 0 10 1439,70,145,68,143,113,158,92,86,41, 0,4086,11072,19411,21448,23186,33586,35000,38976,42764,
Running modkit dmr as below still produces the error:
./modkit dmr multi \
-s barcode17.regions.bed.gz Tri102_1 \
-s barcode19.regions.bed.gz Tri102_2 \
-s barcode21.regions.bed.gz Tri103_1 \
-s barcode23.regions.bed.gz Tri103_2 \
-o "$OUT_DIR" \
-r refseq1.bed \
--ref "$REF" \
-m C \
--log-filepath dmr_multi.log
Error! Parsing Error: Error { input: "= {", code: Digit }
@Rpowellnz The latest version reports which file is failing to parse. Could you confirm that the problem is with the argument to -r (the regions file)?
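On an older version, a small loop over every input can narrow down which file contains a line whose start or end column is not a plain integer. The `Digit` code in the error suggests the parser expected a number at that point, although that interpretation is an assumption. The helper `first_bad_coordinate_line` is hypothetical; it reads gzip-compressed inputs (bgzip output is gzip-compatible) with `gzip -dc`, and it will also report header lines such as `track ...`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of modkit): print the first non-blank line whose
# 2nd or 3rd tab-separated field is not a non-negative integer; return 1 if found.
first_bad_coordinate_line() {
    local file="$1"
    # bgzip output is gzip-compatible, so gzip -dc can read *.bed.gz files.
    case "$file" in
        *.gz) gzip -dc "$file" ;;
        *) cat "$file" ;;
    esac | awk -F'\t' '
        /^[ \t\r]*$/ { next }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: %s\n", NR, $0; bad = 1; exit
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Example: check each input from the command above (file names from the thread).
# for f in barcode17.regions.bed.gz barcode19.regions.bed.gz \
#          barcode21.regions.bed.gz barcode23.regions.bed.gz refseq1.bed; do
#     first_bad_coordinate_line "$f" || echo "  ^ in $f"
# done
```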