methylpy icon indicating copy to clipboard operation
methylpy copied to clipboard

binom-test failed

Open Davieni4yashiro opened this issue 1 year ago • 14 comments

Hi yupeng, thanks for this powerful package. I ran methylpy 1.4.6 on a plant genome. Everything went smoothly except that binom-test failed.The error was like this:

Perform binomial test Fri Sep 9 03:23:16 2022

Traceback (most recent call last): File "/home/many/software/methylpy-1.2.9/bin/methylpy", line 5, in parse_args() File "/home/many/anaconda3/lib/python3.6/site-packages/methylpy/parser.py", line 185, in parse_args keep_temp_files=args.keep_temp_files) File "/home/many/anaconda3/lib/python3.6/site-packages/methylpy/call_mc_pe.py", line 342, in run_methylation_pipeline_pe keep_temp_files=keep_temp_files) File "/home/many/anaconda3/lib/python3.6/site-packages/methylpy/call_mc_pe.py", line 1373, in call_methylated_sites_pe keep_temp_files = keep_temp_files) File "/home/many/anaconda3/lib/python3.6/site-packages/methylpy/call_mc_se.py", line 1640, in call_methylated_sites remove_chr_prefix=remove_chr_prefix) File "/home/many/anaconda3/lib/python3.6/site-packages/methylpy/call_mc_se.py", line 2083, in perform_binomial_test chrom_pointer) File "/home/many/anaconda3/lib/python3.6/site-packages/methylpy/call_mc_se.py", line 2205, in calculate_non_conversion_rate mc += int(fields[4]) ValueError: invalid literal for int() with base 10: ''

However, another sample was done successfully, and the two samples shared the same pipeline and reference genome. I checked their allc files, and there is no difference in fields[4]. They look like this: image test.zip

So can you figure out what is wrong with my sample? Appreciate for your help.

Davieni4yashiro avatar Sep 09 '22 02:09 Davieni4yashiro

Hey, what command did you run for getting the error? Also, doing binomial test requires unmethylated control such as unmethylated lambda sequence. Please make sure that you have such control.

yupenghe avatar Sep 09 '22 05:09 yupenghe

Got it. Do you have chrC in the reference genome? Are reads mapped to this chromosome?

yupenghe avatar Sep 09 '22 05:09 yupenghe

binomial test is not required for DMRs and most analyses. So if you don't need it, you can do --binom-test False to get around this.

yupenghe avatar Sep 09 '22 05:09 yupenghe

yes,the chloroplast genome is included in my reference genome. And the idxstats of bam file looks fine. 图片

Davieni4yashiro avatar Sep 09 '22 05:09 Davieni4yashiro

Hmm weird. Can you grep the chrC lines in allc files and share it with me for the problematic sample?

yupenghe avatar Sep 09 '22 05:09 yupenghe

hi yupeng, I found that the last line of the problemmatic allc file is shifted, leaving a blank in fields[4] 图片 And when I delete the last line and run methylpy test-allc, binom-test work. So how can I prevent this from happening? 3 out of 4 samples had such problem. I attached the screen shots of their allc files. Thank you so much.

Davieni4yashiro avatar Sep 09 '22 07:09 Davieni4yashiro

pics.zip

Davieni4yashiro avatar Sep 09 '22 07:09 Davieni4yashiro

Is it always the last line that shows this issue?

yupenghe avatar Sep 09 '22 07:09 yupenghe

yes it is

Davieni4yashiro avatar Sep 09 '22 07:09 Davieni4yashiro

Can you send me the last line in a file? If possible, can you also send me the reads (in bam) that overlap the last CpG of chrC? I believe you can do something like

samtools view -b BAM_FILE chrC:156341-156345 > TEST.bam

yupenghe avatar Sep 09 '22 07:09 yupenghe

Hi, the test files are enclosed in the zip file. And for problematic samples, 2 BAM_FILEs were generated by methylpy paired-end-pipeline: no_clonal.bam and no_clonal.bam.read2flipped.bam. I did what you told me for both of the bamfiles. And I am so sorry that I had reckressly discarded the last line of my incorrect allc file. So I only put the right one in the file.

Thanks a lot!

test.zip

Davieni4yashiro avatar Sep 09 '22 08:09 Davieni4yashiro

One more thing needed to test this. Do you mind to send me the chrC sequence you are using?

In case this is helpful, here is a command to get sequence of just one chromosome.

samtools faidx GENOME.fa chrC > chrC.fa

yupenghe avatar Sep 10 '22 03:09 yupenghe

chrC.fa.zip Thanks!

Davieni4yashiro avatar Sep 10 '22 03:09 Davieni4yashiro

That is weird. I was not able to reproduce the error. If you do --unmethylated-control C: instead of chrC, will it help?

methylpy paired-end-pipeline
--read1-files cleaned_DH-1_R1.fastq.gz
--read2-files cleaned_DH-1_R2.fastq.gz
--sample cleaned_DH-1
--forward-ref AcDH_f
--reverse-ref AcDH_r
--ref-fasta clean.AcDH.fasta
--num-procs 52
--remove-clonal True
--path-to-output DH-1_output/
--sort-mem 50G
--unmethylated-control C:
--binom-test True
--path-to-picard /home/many/software

yupenghe avatar Sep 10 '22 04:09 yupenghe

It does help! Thanks so much!

Davieni4yashiro avatar Oct 29 '22 12:10 Davieni4yashiro