
VarSim builds successfully but Python packages aren't loadable

Open · gtollefson opened this issue 2 years ago · 12 comments

Hi,

I've downloaded and built VarSim following the instructions on the documentation page. However, when I run the command below with my own reference genome (hg38, not the FASTA included with VarSim), I get an error saying that certain Python modules are not installed. Since the build step reported no errors, I'm concerned that the Python dependencies failed to install without the build reporting it as a failure. Can you help me troubleshoot this issue?

The error message I see in the somatic output log:

Traceback (most recent call last):
  File "/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/varsim_somatic.py", line 13, in <module>
    from varsim import monitor_processes, check_executable, run_vcfstats, run_randvcf, RandVCFOptions
  File "/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/varsim.py", line 17, in <module>
    from liftover_restricted_vcf_map import lift_vcfs, lift_maps
  File "/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/liftover_restricted_vcf_map.py", line 6, in <module>
    import vcf
ImportError: No module named vcf
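One way to narrow down an ImportError like this is to ask the interpreter that actually runs the script whether it can import the module at all. A minimal sketch, assuming the missing `vcf` module is the one provided by the PyVCF package (the traceback only shows `import vcf`); `check_modules` is a hypothetical helper, not part of VarSim. It uses single-argument `print` calls so it also runs under Python 2.7:

```python
import importlib
import sys


def check_modules(names):
    """Return {module_name: bool}: can the *current* interpreter import it?"""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:  # ModuleNotFoundError is a subclass in Python 3
            status[name] = False
    return status


if __name__ == "__main__":
    # The interpreter shown here should be the same one that runs varsim_somatic.py.
    print("interpreter: " + sys.executable)
    for name, ok in sorted(check_modules(["vcf"]).items()):
        print("%s: %s" % (name, "OK" if ok else "MISSING"))
```

Running it once with the interpreter that VarSim's scripts use and once with the `python` on your shell's PATH can show whether the two disagree.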

My command:

varsim_somatic.py --reference hg38.fa --id cosmic --som_num_snp 10 \
        --som_num_ins 5 --som_num_del 5 \
        --som_num_mnp 0 \
        --som_num_complex 1 \
        --cosmic_vcf CosmicCodingMuts.vcf \
        --normal_vcf out/simu.truth.vcf \
        --nlanes 5 --total_coverage 30 \
        --simulator art \
        --simulator_executable /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/simulated_sequence_data/ART/art_src_MountRainier_Linux/art_illumina \
        --out_dir /som_out --log_dir som_log --work_dir som_work &> somatic.log

This is based on the somatic simulation in the docs:

varsim_somatic.py --reference hs37d5.fa --id cosmic --som_num_snp 10000 \
        --som_num_ins 2000 --som_num_del 2000 \
        --som_num_mnp 200 \
        --som_num_complex 200 \
        --cosmic_vcf cosmic.vcf.gz \
        --normal_vcf out/simu.truth.vcf \
        --nlanes 5 --total_coverage 1 \
        --simulator art \
        --simulator_executable ART/art_bin_VanillaIceCream/art_illumina \
        --out_dir som_out --log_dir som_log --work_dir som_work &> somatic.log

Thank you very much. George Tollefson

gtollefson, Sep 02 '21

Hi George,

Are you using the master branch? And when you built VarSim, did you have your own Python virtual environment or conda environment active? Such environments must be cleared (by logging in again) before installation.

Best, Yunfei
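To check whether a Python virtual environment or conda environment is still active in the current shell before running build.sh, a sketch like the following could be used. It relies on the `VIRTUAL_ENV` and `CONDA_PREFIX` variables that the standard activation scripts export, and on `sys.prefix` differing from the base prefix inside a virtualenv/venv; `active_environments` is a hypothetical helper name. Note that an activated conda `base` environment also sets `CONDA_PREFIX`.

```python
import os
import sys


def active_environments(environ=None, prefix=None, base_prefix=None):
    """Return a list of strings describing any active virtualenv/venv or conda env."""
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    if base_prefix is None:
        # Legacy virtualenv sets sys.real_prefix; venv (Python 3.3+) sets sys.base_prefix.
        base_prefix = getattr(sys, "real_prefix", getattr(sys, "base_prefix", prefix))
    found = []
    if environ.get("VIRTUAL_ENV"):
        found.append("virtualenv: " + environ["VIRTUAL_ENV"])
    elif prefix != base_prefix:
        found.append("virtualenv: " + prefix)
    if environ.get("CONDA_PREFIX"):
        found.append("conda: " + environ["CONDA_PREFIX"])
    return found


if __name__ == "__main__":
    envs = active_environments()
    print("\n".join(envs) if envs else "no virtualenv or conda environment detected")
```

An empty result is a reasonable (though not exhaustive) sign that the shell is clean enough to run build.sh.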


-- Best, Yunfei Bioinformatics Research and Early Development (bRED) Roche Sequencing Solutions, based in Massachusetts, USA

yunfeiguo, Sep 03 '21

Hi @yunfeiguo,

Thank you very much for the fast response. I built VarSim using the following commands. Does a newly created virtual environment also need to be cleared by logging in again before installation?

# create and activate dedicated virtual environment
virtualenv my_varsim
source my_varsim/bin/activate

# download and build varsim
git clone https://github.com/bioinform/varsim.git
cd varsim
./build.sh

gtollefson, Sep 03 '21

The VarSim build does not support virtual environments. Most Python-based scripts will know which Python interpreter to use without any additional configuration.
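If you want to see which interpreter a directly executed script such as varsim_somatic.py will use, one option is to read its `#!` line and, for `#!/usr/bin/env ...` lines, resolve the program on PATH. This sketch assumes the VarSim scripts begin with a shebang (suggested by the fact that they are run directly, but not confirmed in this thread) and requires Python 3.3+ for `shutil.which`; both helper names are hypothetical.

```python
import shutil


def shebang_command(script_path):
    """Return the words of a script's '#!' line, or None if it has none."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", "replace").strip()
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].split()


def resolve_interpreter(command):
    """Return the interpreter path a shebang command would run, or None."""
    if not command:
        return None
    # '#!/usr/bin/env python' means: the first 'python' found on PATH.
    if command[0].endswith("/env") and len(command) > 1:
        return shutil.which(command[1])
    return command[0]
```

For example, `print(resolve_interpreter(shebang_command("varsim_somatic.py")))`. If the printed path points into a virtualenv or a module-loaded Python rather than the interpreter the build installed packages into, that would be consistent with the ImportError above.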


yunfeiguo, Sep 04 '21

@yunfeiguo OK, I see! I've rebuilt VarSim, this time in a fresh session without activating a virtual environment. Now, when I run the quickstart script, it fails to find samtools even though I've already loaded samtools/1.9 as a module on our Unix computing cluster. Running the quickstart script without loading the samtools module returned the same error, pasted below.

This could be the cause:

/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/../../opt/samtools-1.9_install/samtools faidx hs37d5.fa
./quickstart.sh: line 24: /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/../../opt/samtools-1.9_install/samtools: No such file or directory

Full error report:

[gtollefs@node1315 quickstart_test]$ ./quickstart.sh
+++ dirname ./quickstart.sh
++ cd .
++ pwd
+ DIR=/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test
+ OPT_DIR=/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/../../opt
+ WD=
+ : /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test
+ echo running in /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test
running in /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test
+ pushd /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test
/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test
+ b37_source=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
+ [[ ! -f hs37d5.fa ]]
+ [[ ! -f insert_seq.txt ]]
+ [[ ! -f GRCh37_hg19_supportingvariants_2013-07-23.txt ]]
+ /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/../../opt/samtools-1.9_install/samtools faidx hs37d5.fa
./quickstart.sh: line 24: /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/../../opt/samtools-1.9_install/samtools: No such file or directory
+ export PATH=/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/../../opt/jdk1.8.0_131/bin:/gpfs/runtime/opt/samtools/1.9/bin:/gpfs/runtime/opt/gsl/2.3/bin:/users/gtollefs/perl5/bin:/gpfs/runtime/opt/intel/2017.0/bin:/gpfs/runtime/opt/python/2.7.12/bin:/gpfs/runtime/opt/matlab/R2017b/bin:/gpfs/runtime/opt/java/8u111/bin:/gpfs/runtime/opt/anaconda/2020.02/condabin:/users/gtollefs/perl5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/lpp/mmfs/bin:/usr/lpp/mmfs/sbin:/opt/ibutils/bin:/gpfs/runtime/bin:/users/gtollefs/bin
+ PATH=/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/../../opt/jdk1.8.0_131/bin:/gpfs/runtime/opt/samtools/1.9/bin:/gpfs/runtime/opt/gsl/2.3/bin:/users/gtollefs/perl5/bin:/gpfs/runtime/opt/intel/2017.0/bin:/gpfs/runtime/opt/python/2.7.12/bin:/gpfs/runtime/opt/matlab/R2017b/bin:/gpfs/runtime/opt/java/8u111/bin:/gpfs/runtime/opt/anaconda/2020.02/condabin:/users/gtollefs/perl5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/lpp/mmfs/bin:/usr/lpp/mmfs/sbin:/opt/ibutils/bin:/gpfs/runtime/bin:/users/gtollefs/bin
+ /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/../../varsim.py --vc_in_vcf 21_5_10Mb.vcf.gz --sv_insert_seq insert_seq.txt --sv_dgv GRCh37_hg19_supportingvariants_2013-07-23.txt --reference hs37d5.fa --id simu --read_length 100 --vc_num_snp 300 --vc_num_ins 10 --vc_num_del 10 --vc_num_mnp 5 --vc_num_complex 5 --sv_num_ins 2000 --sv_num_del 2000 --sv_num_dup 200 --sv_num_inv 1000 --sv_percent_novel 0.01 --vc_percent_novel 0.01 --mean_fragment_size 350 --sd_fragment_size 50 --vc_min_length_lim 0 --vc_max_length_lim 49 --sv_min_length_lim 50 --sv_max_length_lim 1000000 --nlanes 3 --total_coverage 1 --java_max_mem 50g --simulator_executable /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/../../opt/ART/art_bin_VanillaIceCream/art_illumina --out_dir out --log_dir log --work_dir work --simulator art
Traceback (most recent call last):
  File "/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/../../varsim.py", line 721, in <module>
    java=args.java)
  File "/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/../../varsim.py", line 335, in varsim_main
    run_randvcf(os.path.realpath(sampling_vcf), rand_vcf_out_fd, rand_vcf_log_fd, seed, sex, randvcf_options, reference, sample_id, java)
  File "/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/../../varsim.py", line 243, in run_randvcf
    run_shell_command(rand_vcf_command, cmd_stdout=out_vcf_fd, cmd_stderr=log_file_fd)
  File "/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/utils.py", line 92, in run_shell_command
    raise Exception('{0} failed'.format(cmd))
Exception: /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/opt/jdk1.8.0_131/bin/java -Xmx50g -jar /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/VarSim.jar randvcf2vcf -seed 0 -t MALE -num_snp 300 -num_ins 10 -num_del 10 -num_mnp 5 -num_complex 5 -num_dup 0 -num_inv 0 -novel 0.01 -min_len 0 -max_len 49 -prop_het 0.6 -ref /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/hs37d5.fa -id 'simu' -vcf /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/tests/quickstart_test/21_5_10Mb.vcf.gz failed

gtollefson, Sep 04 '21

Hi George, quickstart.sh only uses the samtools in VarSim's opt/ folder. Please check your samtools installation at opt/samtools-1.9_install/samtools inside the VarSim root directory. If it is not installed, remove the opt/samtools-1.9_install and opt/miniconda2 folders, then re-run build.sh.
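A small check along these lines can confirm whether a samtools binary exists and is executable under the VarSim root. The candidate paths are assumptions: the first is the path quickstart.sh used in the log above, the second is an alternative `bin/` location a build might use; `find_executables` is a hypothetical helper.

```python
import os

# Hypothetical candidate locations, relative to the VarSim root directory.
DEFAULT_CANDIDATES = (
    os.path.join("opt", "samtools-1.9_install", "samtools"),
    os.path.join("opt", "samtools-1.9_install", "bin", "samtools"),
)


def find_executables(root, candidates=DEFAULT_CANDIDATES):
    """Map each candidate path under `root` to True if it is an executable file."""
    result = {}
    for rel in candidates:
        path = os.path.join(root, rel)
        # os.path.isfile follows symlinks, so a dangling link counts as missing.
        result[rel] = os.path.isfile(path) and os.access(path, os.X_OK)
    return result
```

For example, `find_executables("/path/to/varsim")` (placeholder path) returns one True/False entry per candidate, which shows at a glance whether the expected location or only an alternative one is populated.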

yunfeiguo, Sep 05 '21

Hi @yunfeiguo,

I think I see the issue. Samtools is installed, but the samtools path in the quickstart.sh script seems to be missing a directory: build.sh installed samtools in /varsim/opt/samtools-1.9_install/bin/, but quickstart.sh looks for it in /varsim/opt/samtools-1.9_install/. When I look inside the opt/samtools-1.9_install/bin/ directory, I see these contents:

[gtollefs@login005 samtools-1.9_install]$ pwd
/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/opt/samtools-1.9_install
[gtollefs@login005 samtools-1.9_install]$ ls
bin
[gtollefs@login005 samtools-1.9_install]$ cd bin/
[gtollefs@login005 bin]$ ls -ltr -h
total 3.0K
lrwxrwxrwx 1 gtollefs auzun 100 Sep 2 15:08 samtools -> /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/opt/miniconda2/bin/samtools
lrwxrwxrwx 1 gtollefs auzun 97 Sep 2 15:08 bgzip -> /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/opt/miniconda2/bin/bgzip
lrwxrwxrwx 1 gtollefs auzun 97 Sep 2 15:08 tabix -> /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/tpp1_splicing_project/varsim/varsim/opt/miniconda2/bin/tabix
[gtollefs@login005 bin]$
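Since the entries in bin/ are symlinks into opt/miniconda2/bin, it may also be worth confirming that each link target really exists, because a dangling link fails with the same "No such file or directory" message as a missing file. A minimal sketch; `check_symlinks` is a hypothetical helper:

```python
import os


def check_symlinks(directory):
    """For each symlink directly inside `directory`, return (target, target_exists)."""
    report = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path):
            target = os.readlink(path)
            # os.path.exists follows the link, so it is False for a dangling link.
            report[name] = (target, os.path.exists(path))
    return report
```

Run from the VarSim root, `check_symlinks("opt/samtools-1.9_install/bin")` would list samtools, bgzip and tabix with a True/False flag for each target.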

gtollefson, Sep 07 '21

You are right. Thanks for pointing it out. I fixed this issue. Please pull from master again.

yunfeiguo, Sep 07 '21

Hi @yunfeiguo,

Thank you very much for your continued support. I pulled from master again and rebuilt successfully. This time the quickstart test gets further but still does not complete: it fails when trying to build the .fai index, right after saving the GRCh37_hg19_supportingvariants_2013-07-23.txt file.

...

467050K .......... .......... .......... .......... .......... 99% 1.19M 0s
467100K .......... .......... .......... .......... .......... 99% 3.53M 0s
467150K .......... .......... .......... .......... .......... 99% 1.49M 0s
467200K .......... .......... .......... .......... .......... 99% 1.20M 0s
467250K .......... .......... .......... .......... .......... 99% 2.94M 0s
467300K .......... .......... .......... .......... .......... 99% 1.42M 0s
467350K .......... .......... .......... .......... .......... 99% 1.22M 0s
467400K .......... .......... .......... .......... .......... 99% 1.41M 0s
467450K .......... .......... .......... .......... .......... 99% 1.21M 0s
467500K .. 100% 2.89M=6m20s

2021-09-16 11:54:47 (1.20 MB/s) - ‘GRCh37_hg19_supportingvariants_2013-07-23.txt’ saved [478722547/478722547]

+ /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests/quickstart_test/../../opt/samtools-1.9_install/bin/samtools faidx hs37d5.fa
[faidx] Could not build fai index hs37d5.fa.fai
+ export PATH=/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests/quickstart_test/../../opt/jdk1.8.0_131/bin:/gpfs/runtime/opt/anaconda/2020.02/condabin:/users/gtollefs/perl5/bin:/gpfs/runtime/opt/intel/2017.0/bin:/gpfs/runtime/opt/python/2.7.12/bin:/gpfs/runtime/opt/matlab/R2017b/bin:/gpfs/runtime/opt/java/8u111/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/lpp/mmfs/bin:/usr/lpp/mmfs/sbin:/opt/ibutils/bin:/gpfs/runtime/bin:/users/gtollefs/bin
+ PATH=/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests/quickstart_test/../../opt/jdk1.8.0_131/bin:/gpfs/runtime/opt/anaconda/2020.02/condabin:/users/gtollefs/perl5/bin:/gpfs/runtime/opt/intel/2017.0/bin:/gpfs/runtime/opt/python/2.7.12/bin:/gpfs/runtime/opt/matlab/R2017b/bin:/gpfs/runtime/opt/java/8u111/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/lpp/mmfs/bin:/usr/lpp/mmfs/sbin:/opt/ibutils/bin:/gpfs/runtime/bin:/users/gtollefs/bin
+ /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests/quickstart_test/../../varsim.py --vc_in_vcf 21_5_10Mb.vcf.gz --sv_insert_seq insert_seq.txt --sv_dgv GRCh37_hg19_supportingvariants_2013-07-23.txt --reference hs37d5.fa --id simu --read_length 100 --vc_num_snp 300 --vc_num_ins 10 --vc_num_del 10 --vc_num_mnp 5 --vc_num_complex 5 --sv_num_ins 2000 --sv_num_del 2000 --sv_num_dup 200 --sv_num_inv 1000 --sv_percent_novel 0.01 --vc_percent_novel 0.01 --mean_fragment_size 350 --sd_fragment_size 50 --vc_min_length_lim 0 --vc_max_length_lim 49 --sv_min_length_lim 50 --sv_max_length_lim 1000000 --nlanes 3 --total_coverage 1 --java_max_mem 50g --simulator_executable /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests/quickstart_test/../../opt/ART/art_bin_VanillaIceCream/art_illumina --out_dir out --log_dir log --work_dir work --simulator art
Traceback (most recent call last):
  File "/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests/quickstart_test/../../varsim.py", line 721, in <module>
    java=args.java)
  File "/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests/quickstart_test/../../varsim.py", line 335, in varsim_main
    run_randvcf(os.path.realpath(sampling_vcf), rand_vcf_out_fd, rand_vcf_log_fd, seed, sex, randvcf_options, reference, sample_id, java)
  File "/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests/quickstart_test/../../varsim.py", line 243, in run_randvcf
    run_shell_command(rand_vcf_command, cmd_stdout=out_vcf_fd, cmd_stderr=log_file_fd)
  File "/gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/utils.py", line 92, in run_shell_command
    raise Exception('{0} failed'.format(cmd))
Exception: /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/opt/jdk1.8.0_131/bin/java -Xmx50g -jar /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/VarSim.jar randvcf2vcf -seed 0 -t MALE -num_snp 300 -num_ins 10 -num_del 10 -num_mnp 5 -num_complex 5 -num_dup 0 -num_inv 0 -novel 0.01 -min_len 0 -max_len 49 -prop_het 0.6 -ref /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests/quickstart_test/hs37d5.fa -id 'simu' -vcf /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests/quickstart_test/21_5_10Mb.vcf.gz failed

gtollefson, Sep 16 '21

Is your hs37d5.fa intact? What does it show if you run /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests/quickstart_test/../../opt/samtools-1.9_install/bin/samtools faidx hs37d5.fa?
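A "[faidx] Could not build fai index" error can be caused by an empty, truncated, or plain-gzip-compressed reference. A quick sanity check before re-running faidx might look like this sketch; it only inspects the file size and the first two bytes, so a file that passes can still be truncated. `describe_fasta` is a hypothetical helper.

```python
import os


def describe_fasta(path):
    """Return a short diagnosis of why `samtools faidx` might fail on `path`."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as handle:
        head = handle.read(2)
    if head == b"\x1f\x8b":
        # gzip magic bytes (BGZF files start with them too). faidx accepts
        # uncompressed or BGZF-compressed FASTA, but not plain gzip.
        return "gzip-compressed"
    if not head.startswith(b">"):
        return "no FASTA header"
    return "looks like plain FASTA"
```

For example, `describe_fasta("hs37d5.fa")` run in the quickstart_test directory.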

I also saw that VarSim.jar randvcf2vcf failed. Could you show what's in log/RandVCF2VCF.err?
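Because RandVCF2VCF.err can be dominated by repeated INFO lines, it may help to pull out only warnings, errors and Java exceptions. A sketch assuming one log entry per line in the file, in the `<date> <time> LEVEL [thread] (Source.java:line): message` layout that VarSim's log uses in this thread; `important_lines` is a hypothetical helper.

```python
import re


def important_lines(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return log lines whose level is in `levels`, plus any line mentioning an Exception."""
    level_re = re.compile(r"\b(?:" + "|".join(re.escape(lvl) for lvl in levels) + r")\b")
    picked = []
    for line in lines:
        if level_re.search(line) or "Exception" in line:
            picked.append(line.rstrip("\n"))
    return picked
```

Usage: `with open("log/RandVCF2VCF.err") as handle: print("\n".join(important_lines(handle)))`.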

Best, Yunfei


yunfeiguo, Sep 17 '21

It looks like the hs37d5.fa file is empty. Perhaps there was an error while retrieving or unpacking it. Our system does not have bgzip installed; we use gunzip instead. If the script decompresses with bgzip by default, that step may be failing.
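One possible workaround when bgzip is unavailable is to decompress the reference with a tool that is available, such as Python's standard gzip module, which reads multi-member gzip files and therefore BGZF as well. A sketch under these assumptions: hs37d5.fa.gz is present locally, and `gunzip_file` is a hypothetical helper. Note also that the quickstart trace shows a `[[ ! -f hs37d5.fa ]]` test, which suggests an existing (even empty) hs37d5.fa may be left in place on later runs, so removing the empty file first is probably needed.

```python
import gzip
import shutil


def gunzip_file(src, dest, chunk_size=1 << 20):
    """Stream-decompress a gzip or BGZF file to `dest`; return bytes written."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        # Copy in chunks so a multi-gigabyte reference never sits in memory.
        shutil.copyfileobj(fin, fout, chunk_size)
        return fout.tell()
```

For example, `gunzip_file("hs37d5.fa.gz", "hs37d5.fa")` followed by re-running `samtools faidx hs37d5.fa`.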

The output in log/RandVCF2VCF.err is:

[gtollefs@login006 log]$ more RandVCF2VCF.err INFO 2021-09-16 14:59:05 CreateSequenceDictionary Output dictionary will be written in /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests /quickstart_test/hs37d5.dict 16 Sep 2021 14:59:18,734 INFO [main] (VCFparser.java:55): Reading /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests/quickstart_test/21_5_10Mb.vcf.gz 16 Sep 2021 14:59:18,737 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,737 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,738 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,738 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,738 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,738 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,738 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,738 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,738 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,738 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,738 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,738 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,738 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,738 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,739 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,739 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,739 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 
16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,740 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,741 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,741 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,741 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,741 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,741 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,741 INFO [main] (VCFparser.java:209): Reading header line. 16 Sep 2021 14:59:18,741 INFO [main] (VCFparser.java:209): Reading header line. 
16 Sep 2021 14:59:18,741 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:18,741 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:18,742 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:18,742 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:18,742 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:18,742 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:18,742 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:18,742 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:18,742 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:18,742 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:18,742 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:18,742 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:18,742 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:18,742 WARN [main] (VCFparser.java:219): Warning!!! ID (simu) does not exist...
16 Sep 2021 14:59:18,859 WARN [main] (VCFparser.java:350): ALT column is malformated: Found illegal character: 46
16 Sep 2021 14:59:18,859 WARN [main] (VCFparser.java:739): Returned null variant for line 21 9437982 rs758390996 T . . . RS=758390996;RSPOS=9437983;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000000005000002000200;WGT=1;VC=DIV;ASP
16 Sep 2021 14:59:19,294 WARN [main] (VCFparser.java:350): ALT column is malformated: Found illegal character: 46
16 Sep 2021 14:59:19,294 WARN [main] (VCFparser.java:739): Returned null variant for line 21 9704331 rs770995567 C . . . RS=770995567;RSPOS=9704332;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000000005000002000200;WGT=1;VC=DIV;ASP
16 Sep 2021 14:59:19,726 INFO [main] (RandVCF2VCF.java:147): total_num_SNP: 154245
16 Sep 2021 14:59:19,726 INFO [main] (RandVCF2VCF.java:148): total_num_INS: 4011
16 Sep 2021 14:59:19,726 INFO [main] (RandVCF2VCF.java:149): total_num_DEL: 9828
16 Sep 2021 14:59:19,726 INFO [main] (RandVCF2VCF.java:150): total_num_MNP: 95
16 Sep 2021 14:59:19,726 INFO [main] (RandVCF2VCF.java:151): total_num_DUP: null
16 Sep 2021 14:59:19,726 INFO [main] (RandVCF2VCF.java:152): total_num_INV: 0
16 Sep 2021 14:59:19,726 INFO [main] (RandVCF2VCF.java:153): total_num_COMPLEX: 38
16 Sep 2021 14:59:19,726 INFO [main] (RandVCF2VCF.java:154): total_num_skipped: 91629
16 Sep 2021 14:59:19,726 INFO [main] (RandVCF2VCF.java:155): total_num: 162651
16 Sep 2021 14:59:19,727 INFO [main] (RandVCF2VCF.java:164): Writing sampled variant file
16 Sep 2021 14:59:19,728 INFO [main] (VCFparser.java:55): Reading /gpfs/data/dgamsiz/Uzun_Lab/gtollefs/indel_detection_project/varsim/varsim/tests/quickstart_test/21_5_10Mb.vcf.gz
16 Sep 2021 14:59:19,728 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,728 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,728 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,728 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,728 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,728 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,728 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,728 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,729 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,730 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,731 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,731 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,731 INFO [main] (VCFparser.java:209): Reading header line.
16 Sep 2021 14:59:19,731 WARN [main] (VCFparser.java:219): Warning!!! ID (simu) does not exist...
16 Sep 2021 14:59:19,819 WARN [main] (VCFparser.java:350): ALT column is malformated: Found illegal character: 46
16 Sep 2021 14:59:19,819 WARN [main] (VCFparser.java:739): Returned null variant for line 21 9437982 rs758390996 T . . . RS=758390996;RSPOS=9437983;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000000005000002000200;WGT=1;VC=DIV;ASP
16 Sep 2021 14:59:19,918 WARN [main] (Variant.java:323): Same ref at alt at 36744531 to 36744532
16 Sep 2021 14:59:19,918 WARN [main] (Variant.java:312): N found at 8847573 to 8847574
16 Sep 2021 14:59:20,196 WARN [main] (Variant.java:312): N found at 3568220 to 3568221
16 Sep 2021 14:59:20,372 WARN [main] (VCFparser.java:350): ALT column is malformated: Found illegal character: 46
16 Sep 2021 14:59:20,372 WARN [main] (VCFparser.java:739): Returned null variant for line 21 9704331 rs770995567 C . . . RS=770995567;RSPOS=9704332;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000000005000002000200;WGT=1;VC=DIV;ASP
16 Sep 2021 14:59:20,613 WARN [main] (Variant.java:323): Same ref at alt at 28433115 to 28433116
16 Sep 2021 14:59:20,613 WARN [main] (Variant.java:323): Same ref at alt at 47717396 to 47717397
16 Sep 2021 14:59:20,613 WARN [main] (Variant.java:312): N found at 4492798 to 4492799
16 Sep 2021 14:59:20,613 WARN [main] (Variant.java:323): Same ref at alt at 32656322 to 32656323
16 Sep 2021 14:59:20,613 WARN [main] (Variant.java:323): Same ref at alt at 10414698 to 10414699
16 Sep 2021 14:59:20,613 WARN [main] (Variant.java:323): Same ref at alt at 31975147 to 31975148
16 Sep 2021 14:59:20,613 WARN [main] (Variant.java:323): Same ref at alt at 45254136 to 45254137
16 Sep 2021 14:59:20,613 WARN [main] (Variant.java:323): Same ref at alt at 24919733 to 24919734
16 Sep 2021 14:59:20,613 WARN [main] (Variant.java:323): Same ref at alt at 16239803 to 16239804

gtollefson, Sep 17 '21 19:09

Actually, gunzip is used for decompression, but wget is used to download the genome. If you don't have wget, you can download and decompress the genome manually from here: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz

Best, Yunfei
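The manual route described above could look roughly like the following minimal Bash sketch. It assumes `curl` is available as a fallback when `wget` is not installed; the function names `fetch`, `is_valid_fasta`, and `prepare_reference` are hypothetical and not part of VarSim.

```shell
#!/usr/bin/env bash
# Sketch: fetch the hs37d5 reference by hand when the quickstart download fails.
# Assumption: curl is available as a fallback when wget is not installed.

REF_URL="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz"

# Download URL $1 to file $2, preferring wget and falling back to curl.
fetch() {
  if command -v wget >/dev/null 2>&1; then
    wget -O "$2" "$1"
  else
    curl -fL -o "$2" "$1"
  fi
}

# Succeed only if $1 exists, is non-empty, and starts with a FASTA header ('>').
is_valid_fasta() {
  [[ -s "$1" ]] && [[ "$(head -c 1 "$1")" == ">" ]]
}

# Download and decompress the reference unless a usable copy is already present.
prepare_reference() {
  local out="${1:-hs37d5.fa}"
  if is_valid_fasta "$out"; then
    echo "$out already looks like a valid FASTA; skipping download"
    return 0
  fi
  fetch "$REF_URL" "$out.gz" || return 1
  # gunzip also reads bgzip-compressed files; -f overwrites an existing empty $out.
  gunzip -f "$out.gz" || return 1
  if ! is_valid_fasta "$out"; then
    echo "$out is still empty or not FASTA" >&2
    return 1
  fi
}
```

To use it, save the sketch as, for example, `get_hs37d5.sh` in the quickstart directory and run `source get_hs37d5.sh && prepare_reference hs37d5.fa`. The first-byte check is there to catch the empty `hs37d5.fa` reported earlier in this thread before VarSim tries to build a sequence dictionary from it.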

yunfeiguo, Sep 17 '21 19:09

Additionally, you can use Docker to run VarSim: https://hub.docker.com/repository/docker/rssbred/varsim
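One way to use that image is a small shell wrapper like the sketch below. It assumes the image can be pulled as `rssbred/varsim` (inferred from the Docker Hub link) with its default tag, and that the VarSim scripts are on the container's `PATH`; neither is confirmed in this thread, and `varsim_docker` is a hypothetical name.

```shell
# Hypothetical wrapper around the VarSim Docker image.
# Assumptions: the image name below, and that VarSim's scripts are on the
# container's PATH; override VARSIM_IMAGE if your image or tag differs.
VARSIM_IMAGE="${VARSIM_IMAGE:-rssbred/varsim}"

varsim_docker() {
  # --rm     remove the container when the command exits
  # -v / -w  mount the current directory at /data and start there, so relative
  #          paths such as hg38.fa or som_out resolve the same inside and outside
  docker run --rm \
    -v "$PWD":/data \
    -w /data \
    "$VARSIM_IMAGE" "$@"
}
```

For example, after `docker pull rssbred/varsim`, you would run `varsim_docker varsim_somatic.py --reference hg38.fa ...` from the directory that holds your inputs. Only the current directory is mounted, so input and output paths (including `--out_dir`) should be relative to it; an absolute path such as `/som_out` would point inside the container rather than at your files.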

yunfeiguo, Sep 20 '21 17:09