rvtests

Empty individual column - very strange!!

Open jielab opened this issue 7 years ago • 3 comments

Hi, Xiaowei:

I got the following error for an imputed chunk. I was able to use "bcftools query -l" to extract the sample IDs without any problem. So what exactly is RVTESTS complaining about?

BTW, I found that the new version of RVTESTS increased the run time from 25247 seconds to 33770 seconds for one of my chunks, after you fixed the AF issue. Were any other features added that make the software run slower?

Thank you & best regards, Jie

[Screenshot: capture]

jielab, Apr 21 '17 09:04

Hi Jie,

This message means that some lines in your VCF file have an inconsistent number of columns. For example (a tab "\t" is denoted by "_"):

    P1_P2_P3
    0/0_0/1_1/1
    0/0__1/1      <- this line has an empty column, as there are two consecutive tabs in the middle
    0/0_0/1_1/1

Maybe you can validate the input VCF file using this tool: https://github.com/zhanxw/checkVCF
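For a quick look before running a full validator, the check described above can be sketched in a few lines of Python. This is only a minimal illustration, not how RVTESTS or checkVCF work internally: the function names `open_vcf` and `check_columns` are made up here, and the sketch assumes the file is tab-delimited, with the `#CHROM` header line defining the expected number of columns.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF file as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def check_columns(lines):
    """Report lines whose column count differs from the header or that contain empty columns.

    Returns (expected, problems), where problems is a list of
    (line_number, number_of_columns, 1-based positions of empty columns).
    """
    expected = None
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip blank lines and "##" meta-information lines.
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            expected = len(fields)  # the header defines the expected width
            continue
        if expected is None:
            expected = len(fields)  # assumption: no header seen, use the first record
        # Two consecutive tabs produce an empty string after splitting.
        empty = [i + 1 for i, field in enumerate(fields) if field == ""]
        if len(fields) != expected or empty:
            problems.append((lineno, len(fields), empty))
    return expected, problems
```

It could be used as `with open_vcf("chunk.vcf.gz") as fh: expected, problems = check_columns(fh)`, where `chunk.vcf.gz` is a placeholder file name. Note that two consecutive tabs do not change the number of columns after splitting; they show up as an empty field, so the sketch reports both cases.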

To find out why the running time increased by 33% on your side, I checked the source code. As the changes in this version are very small, I guess the longer running time may be due to other factors, e.g., could your server have been busier during the second benchmark?

zhanxw, Apr 21 '17 14:04

Thanks, Xiaowei.

I repeated the analysis and it took 33770 seconds again. It seems that the program takes the same amount of time no matter when I run it and whether our cluster is busy or not. It would be good if you could run a test dataset with the previous and current versions of RVTESTS to find out the difference in running time.
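One way such a comparison could be set up is a small timing harness like the sketch below. It is only an illustration: `time_command` and `relative_increase` are hypothetical helper names, and the command lines for the two RVTESTS builds are deliberately left out, since they depend on the local installation and analysis options.

```python
import subprocess
import time


def time_command(cmd, repeats=3):
    """Run cmd (a list of arguments) `repeats` times; return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises an error if the command fails, so failed runs are not timed silently.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def relative_increase(old_seconds, new_seconds):
    """Percentage increase in run time from the old version to the new one."""
    return (new_seconds - old_seconds) / old_seconds * 100.0
```

Running the same input through both builds several times and comparing the fastest run of each helps separate a real slowdown from cluster load. For the two run times reported in this thread, `relative_increase(25247, 33770)` gives roughly 33.8%.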

I have now checked the rows of the VCF file that is giving the above error. Indeed, some rows don't have the same number of samples. Maybe the file is corrupted.

Best regards, jie

jielab, Apr 21 '17 20:04

Thanks for getting back to me.

I will check the running time for rvtests.

zhanxw, Apr 26 '17 22:04