VarSim germline understanding output...

Open iranmdl opened this issue 6 years ago • 0 comments

Hi there! I'm trying to run a somatic simulation. First, I simulated germline data as suggested:

python varsim.py --vc_in_vcf All.vcf.gz --reference hs37d5.fa --sv_insert_seq insert_seq.txt --sv_dgv GRCh37_hg19_supportingvariants_2013-07-23.txt --id simu --read_length 100 --vc_num_snp 20000 --vc_num_ins 5000 --vc_num_del 5000 --vc_num_mnp 100 --vc_num_complex 50 --sv_num_ins 100 --sv_num_del 100 --sv_num_dup 100 --sv_num_inv 100 --sv_percent_novel 0 --mean_fragment_size 350 --sd_fragment_size 50 --vc_min_length_lim 0 --vc_max_length_lim 49 --sv_min_length_lim 50 --sv_max_length_lim 1000000 --nlanes 3 --total_coverage 30 --simulator art --simulator_executable /shared/tools/src/art_bin_MountRainier/art_illumina --out_dir germline_out_25082017 --log_dir germline_log_25082017

I was expecting the output to contain a ground truth VCF file with 20000 SNPs, 5000 insertions, 5000 deletions, 100 MNPs, 50 complex variants, 100 large insertions, 100 large deletions, 100 duplications and 100 inversions. I assume that simu.truth.vcf is the ground truth output VCF file, right? When I look into it, I see only 384 variants, all of them structural variants. Where does this number come from? Where are the small variants?
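
For anyone who wants to reproduce the check, here is a minimal sketch of tallying records per variant class in a VCF such as simu.truth.vcf. The classification rules are an assumption made for illustration (an SVTYPE entry in INFO or a symbolic ALT counts as structural; everything else is decided by comparing REF and ALT lengths) and are not VarSim's own definitions. The helper names `classify` and `count_variants` are hypothetical.

```python
import gzip
from collections import Counter


def classify(ref, alt, info):
    """Rough variant class for one ALT allele (assumed heuristic, not VarSim's logic)."""
    # Structural variants: SVTYPE key in INFO, or a symbolic ALT such as <DEL>.
    for field in info.split(";"):
        if field.startswith("SVTYPE="):
            return "SV:" + field[len("SVTYPE="):]
    if alt.startswith("<") and alt.endswith(">"):
        return "SV:" + alt[1:-1]
    # Small variants: decide from the REF and ALT lengths.
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNP"
    if len(ref) < len(alt) and alt.startswith(ref):
        return "INS"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "DEL"
    return "COMPLEX"


def count_variants(path):
    """Count ALT alleles per class in a plain or gzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            ref, alts = cols[3], cols[4]
            info = cols[7] if len(cols) > 7 else ""
            for alt in alts.split(","):
                counts[classify(ref, alt, info)] += 1
    return counts
```

Calling `count_variants("simu.truth.vcf")` returns a `Counter` keyed by class, which can be compared directly against the numbers requested on the command line.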

What is the random.vc.vcf file, and why is it empty?

What is the random.sv.vcf file, and why does it contain 400 variants?

Is there a manual that explains each output file?

I'll come back with somatic questions later, but first I need to understand the germline output. Any help would be appreciated. Thank you in advance :)

iranmdl, Sep 20 '17 16:09