modkit dmr - failed to read tabix index

Open lilypeck opened this issue 1 year ago • 6 comments

Hello

modkit 0.2.8

I am getting a very basic error:

> Error! failed to read tabix index "barcode05_E_CHH.bedmethyl.gz.tbi"
>  caused by invalid reference sequence names
>  caused by expected EOF

My script is:

/u/home/l/ldpeck/project-vlsork/longreads/dist/modkit dmr pair \
  -a ${dmrD_1}_${context}.bedmethyl.gz \
  -a ${dmrD_2}_${context}.bedmethyl.gz \
  -b ${dmrC_1}_${context}.bedmethyl.gz \
  -b ${dmrC_2}_${context}.bedmethyl.gz \
  -o dmr/dmp_${run}_${context}.tab \
  --ref /u/home/l/ldpeck/genome_resources/GCF_001633185.2_ValleyOak3.2_genomic.fna \
  --base C \
  -t 24 \
  -f \
  --log-filepath dmr/dmp_${run}.log

However, I don't think the error is related to my .tbi files: I re-ran a script that previously completed successfully with modkit 0.2.7, and it now fails with this error under modkit 0.2.8.

Is there something you can see that might be causing this?

Thank you in advance!

Lily

lilypeck avatar May 02 '24 10:05 lilypeck
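
For readers who hit the same message: the error text points at the reference sequence names stored in the index header. Below is a minimal Python sketch, not part of modkit or noodles, that prints those names so they can be compared with the contig names in the reference FASTA. It assumes the standard tabix layout (BGZF-compressed, magic `TBI\1`, eight little-endian 32-bit integers, then `l_nm` bytes of NUL-terminated names); the function name `read_tbi_names` is hypothetical.

```python
import gzip
import struct


def read_tbi_names(path):
    """Return (n_ref, names) read from the header of a tabix index.

    Assumes the standard tabix layout; raises ValueError if the
    header is truncated or the magic bytes are wrong.
    """
    # BGZF is a series of concatenated gzip members, which gzip.open reads.
    with gzip.open(path, "rb") as fh:
        header = fh.read(36)  # 4 magic bytes + 8 int32 fields
        if len(header) < 36 or header[:4] != b"TBI\x01":
            raise ValueError("not a tabix index (bad magic or truncated header)")
        # n_ref, format, col_seq, col_beg, col_end, meta, skip, l_nm
        fields = struct.unpack("<8i", header[4:])
        n_ref, l_nm = fields[0], fields[7]
        raw = fh.read(l_nm)
    if len(raw) != l_nm:
        raise ValueError("names block is shorter than l_nm")
    parts = raw.split(b"\x00")
    # A well-formed block ends with NUL, which leaves one empty trailing piece.
    if parts and parts[-1] == b"":
        parts.pop()
    names = [p.decode("ascii", "replace") for p in parts]
    return n_ref, names
```

Calling `read_tbi_names("barcode05_E_CHH.bedmethyl.gz.tbi")` returns the declared sequence count and the decoded names; a mismatch between `n_ref` and `len(names)` would be worth mentioning when reporting the problem.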

Hello @lilypeck,

There shouldn't be any changes from modkit v0.2.7 to v0.2.8 in how the tabix index is handled. However, I did update the dependencies that modkit uses, so it's possible that it picked up a bug. Could you:

  1. Check if v0.2.7 works on the same input.
  2. Attach the tabix index that is failing to this thread so I can investigate what the problem is.

Thanks.

ArtRand avatar May 02 '24 13:05 ArtRand

Hello @ArtRand, thank you for your response! I have just checked with v0.2.7 and I don't get the tabix index error:

> reading reference FASTA at "/u/home/l/ldpeck/genome_resources/GCF_001633185.2_ValleyOak3.2_genomic.fna"
> running single-site analysis
> using default prior, Beta(α: 0.55, β: 0.55)
> estimating max coverages from data
> sampled 4139233 a records and 4027045 b records, calculating max coverages for 95th percentile
> calculated max coverage for a: 24 and b: 30
> running with replicates and matched samples

I have attached two .tbi files which failed. I have also checked the bedmethyl.gz files and they are complete (with the same tail output as the uncompressed versions).

Thanks

Lily

barcode21_U_CG.bedmethyl.gz.tbi.txt
barcode21_U_CHG.bedmethyl.gz.tbi.txt

lilypeck avatar May 02 '24 13:05 lilypeck
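
A complementary check to comparing tail output: a bgzipped file written to completion ends with a fixed 28-byte empty BGZF block (the EOF marker defined in the SAM/BGZF specification). The sketch below is a hedged illustration and not something modkit runs; it only tests for that trailing marker and says nothing about the blocks before it. The function name `has_bgzf_eof` is hypothetical.

```python
# The 28-byte empty BGZF block that marks end-of-file.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 "
    "02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF EOF marker block."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)  # jump to the end to learn the file size
        if fh.tell() < len(BGZF_EOF):
            return False
        fh.seek(-len(BGZF_EOF), 2)  # position at the last 28 bytes
        return fh.read() == BGZF_EOF
```

A result of `False` for a `.bedmethyl.gz` file would suggest it was truncated or compressed with plain gzip rather than bgzip, in which case the index should not be trusted either.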

Hello @lilypeck,

I was able to reproduce the error using noodles version 0.69.0 (the version in modkit 0.2.8); the error does not occur with version 0.50.0 (the version in modkit 0.2.7). What is strange, however, is that the tabix indices I have in tests, and some others I've used, are parsed without complaint. Could you tell me what version of tabix you have? This is what I have tested:

tabix --version
tabix (htslib) 1.18
Copyright (C) 2023 Genome Research Ltd.

If you give me a few minutes I can get you a build with the older version of the library to unblock your work, but I'd also like to get to the bottom of the problem. So to summarize, please:

  1. Tell me the version of tabix you have and show me the script you're using.
  2. (If it's not too large) send me one of the bgzipped bedmethyl files.
  3. If this ends up being a noodles bug, I'd like to open an issue with the noodles developers. Could you give me permission to use your file as an example to exercise the bug?

ArtRand avatar May 02 '24 15:05 ArtRand

Hello @ArtRand, thank you very much! My tabix version is:

tabix (htslib) 1.19.1
Copyright (C) 2024 Genome Research Ltd.

The complete .bedmethyl file is too big to upload, so I have uploaded the first 1 million lines. Or, if you have an email address, I could send you a copy. And yes, I am very happy for you to use these files to exercise the bug.

Thank you very much for your help.

Lily

barcode21_U_CHH.bedmethyl.head.gz

lilypeck avatar May 02 '24 17:05 lilypeck

Hello @lilypeck,

Alright, I've made a branch (build attached) where I've changed the version back. Please let me know if this build works. I'm going to investigate why the later versions don't work with tabix 1.19.1. Thanks for permission to use your files as well.

modkit_dev9c754d4c_centos7_x86_64.tar.gz

ArtRand avatar May 02 '24 21:05 ArtRand

Hi @ArtRand, thank you so much, it is working now!

Lily

lilypeck avatar May 03 '24 07:05 lilypeck

@lilypeck Great, this fix is also in the 0.4.1 release.

ArtRand avatar Sep 30 '24 14:09 ArtRand