CreateGenomeSizeFile_5.2.9.122.tar.gz

Open saty89 opened this issue 6 years ago • 22 comments

Hi, I am trying to create the GenomeSize.xml file, but the archive for this version is empty. When I tried CreateGenomeSizeFile_5.2.7.47.tar.gz instead, it unpacks VennVcf_5.2.7.47.

Any other options I can use until this is fixed?

Thanks, Satwica

saty89, Nov 13 '18

I created GenomeSize.xml with CreateGenomeSizeFile_5.2.7.47 and it works well. The command line is:

dotnet CreateGenomeSizeFile.dll -g /storage/sta/Reference_Genome/hg19/ucsc.hg19.fasta -s "Human (UCSC rn1)" -o /storage/sta/Reference_Genome/hg19/

HeXY0515, Nov 19 '18

Hello @HeXY0515, could you share your copy of CreateGenomeSizeFile.dll?

przemekl, Dec 12 '18

CreateGenomeSizeFile_5.2.9.122.tar.gz

CreateGenomeSizeFile_5.2.7.47.tar.gz

Sorry. These both work for me. The 5.2.9 tarball attached to the release looks like it has issues. I'll try to fix it.

tamsen, Dec 17 '18

OK, I think it's all working now. If you find another broken tar, or it's still not working for you, please let me know. Thanks, everyone, for spotting this.

tamsen, Dec 17 '18

Thank you @tamsen !!!!

astewart-twist, Dec 19 '18

CreateGenomeSizeFile cannot create an XML file, haha.

hmyh1202, Apr 12 '19


How do I use it? I have tried many parameters and none of them work. Shame.

hmyh1202, Apr 12 '19

Hi Tamsen! Can you please suggest a fix?

I can't get CreateGenomeSizeFile to work. I'm using the hg19 FASTA file downloaded from UCSC. When I enter:

CreateGenomeSizeFile -g Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA -s "Homo sapiens (UCSC hg19)" -o Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA

I get: "Please specify the full genome name ("Genus Species (Source Build)" - e.g. "Rattus norvegicus (UCSC rn4)"; include the strain name if available, e.g. "Bacillus cereus ATCC 10987 (NCBI 2004-02-13)"). Some problems were encountered when parsing the command line options:"

What am I doing wrong?

sprakashUTH, Apr 21 '20

Sorry, it's been a while since I looked at this code!

For hmyh1202: do you have write permissions?

For spraka: what version are you using? There are corrected DLLs in this thread.

You could also try with or without quotes, with single quotes, or renaming Homo sapiens to Homo_sapiens. Maybe something is getting mangled in the input string on the command line.
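To see why the quoting style matters, here is a small sketch using Python's standard `shlex` module, which approximates how a POSIX shell splits a command line into arguments. It shows that double and single quotes both keep the species string together as one argument, while leaving it unquoted splits it into several. This only models the shell; it says nothing about how CreateGenomeSizeFile itself parses `-s`.

```python
import shlex

# Three ways a user might type the -s value on the command line.
double_quoted = '-s "Homo sapiens (UCSC hg19)"'
single_quoted = "-s 'Homo sapiens (UCSC hg19)'"
unquoted = "-s Homo sapiens (UCSC hg19)"


def tokenize(command_line: str) -> list[str]:
    """Split a command line into arguments roughly the way a POSIX shell would.

    Note: shlex does not treat "(" specially by default, so the unquoted form is
    simply split on spaces here; a real bash shell would reject it as a syntax error.
    """
    return shlex.split(command_line)


for line in (double_quoted, single_quoted, unquoted):
    print(f"{line!r:40} -> {tokenize(line)}")
```

Both quoted forms yield `['-s', 'Homo sapiens (UCSC hg19)']`; the unquoted form yields five separate words, so the program would only see `Homo` as the value of `-s`.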

If nothing simple works, I can try to reproduce your issue and take a look in the debugger.

tamsen, Apr 22 '20

I'm using 5.2.9.122. I tried without quotes, with quotes, and renaming Homo sapiens to Homo_sapiens. I assume the hg19 part is correct. Nothing works.

sprakashUTH, Apr 22 '20

I'm sorry. That must be very frustrating. I don't have my compiler with me right now, but maybe you found a real bug? I can try to make some time for it in the next few days.

Do other Pisces commands normally work for you? Windows sometimes has trouble distinguishing "-" (hyphen) from "–" (en dash). Also, historically we have only been testing on Windows and Linux; Macs are not supported yet. What OS are you using?
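A quick way to rule out the hyphen vs. en dash problem is to scan the arguments for look-alike dash characters before running the tool. The sketch below is a hypothetical helper (`find_lookalike_dashes` and the script name `check_dashes.py` are illustrative, not part of Pisces); it flags en dashes (U+2013), em dashes (U+2014), and the Unicode minus sign (U+2212), and shows what the argument would look like with an ASCII hyphen.

```python
import sys

# Characters that look like "-" but are not the ASCII hyphen-minus (U+002D).
LOOKALIKE_DASHES = {
    "\u2013": "EN DASH",
    "\u2014": "EM DASH",
    "\u2212": "MINUS SIGN",
}


def find_lookalike_dashes(args: list[str]) -> list[tuple[int, str, str]]:
    """Return (argument index, character name, argument with that character replaced by "-")
    for every look-alike dash found."""
    problems = []
    for index, arg in enumerate(args):
        for char, name in LOOKALIKE_DASHES.items():
            if char in arg:
                problems.append((index, name, arg.replace(char, "-")))
    return problems


if __name__ == "__main__":
    # Check the arguments given to this script, e.g.:
    #   python check_dashes.py –g /path/to/genome -s "Homo sapiens (UCSC hg19)"
    for index, name, fixed in find_lookalike_dashes(sys.argv[1:]):
        print(f"argument {index} contains {name}; did you mean {fixed!r}?")
```

Pasting the same arguments you pass to CreateGenomeSizeFile into this script shows whether any of them picked up a typographic dash from an editor or web page.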

Some other ideas while you wait:

You could try simplifying (UCSC hg19) to (hg19)

You can try this: https://github.com/Illumina/Pisces/files/2687406/CreateGenomeSizeFile_5.2.7.47.tar.gz or other prior versions. People have definitely been using it without issue in the past (see this thread for other users' command lines).

best Tamsen

tamsen, Apr 22 '20

Linux version:

login2.ls5(1020)$ cat /etc/*-release
NAME="SLES"
VERSION="12-SP3"
VERSION_ID="12.3"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp3"
SUSE Linux Enterprise Server 12 (x86_64)
VERSION = 12
PATCHLEVEL = 3

I tried hg19 alone. I tried the version that you mentioned. No luck so far.

sprakashUTH, Apr 24 '20

Hi there,

I am sorry again for being late in getting back to you... it's been busy.

I reproduced your command and (of course) it all worked for me. I used a fresh 5.2.9 binary that I pulled down from the GitHub releases page for your version number, and I spoofed the genome data by copying down some bacterial data so it ran quickly (you can get the same data I used from https://github.com/Illumina/Pisces/tree/master/src/test/SharedData/Genomes/Bacillus_cereus/Sequence/WholeGenomeFasta and put the .fa, .fai, and .dict files in your Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA folder for a quick test).
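Before running that quick test, a small pre-flight check can confirm the folder holds what tamsen describes: a FASTA (`.fa`) plus its `.fai` index and `.dict` file. This is a hypothetical helper; the file requirements it checks are inferred from this thread (tamsen's test data and a `genome.dict genome.fa genome.fa.fai` listing), not from Pisces documentation.

```python
from pathlib import Path


def check_genome_folder(folder: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the folder looks complete.

    Assumption: each FASTA (*.fa or *.fasta) should sit next to a <name>.fai index and a
    <stem>.dict sequence dictionary, as in the test data used in this thread.
    """
    root = Path(folder)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    fastas = sorted(list(root.glob("*.fa")) + list(root.glob("*.fasta")))
    if not fastas:
        return [f"no .fa or .fasta file found in {root}"]

    problems = []
    for fasta in fastas:
        fai = fasta.with_name(fasta.name + ".fai")  # genome.fa -> genome.fa.fai
        dict_file = fasta.with_suffix(".dict")      # genome.fa -> genome.dict
        if not fai.exists():
            problems.append(f"missing index {fai.name} for {fasta.name}")
        if not dict_file.exists():
            problems.append(f"missing dictionary {dict_file.name} for {fasta.name}")
    return problems
```

For example, `print(check_genome_folder("Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA") or "looks complete")` prints either the list of missing files or a confirmation.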

tamsen@tamsen-Inspiron-3847:~/PiscesBinaries/CreateGenomeSizeFile_5.2.9.122$ dotnet CreateGenomeSizeFile.dll -g ~/Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA -s "Homo sapiens (UCSC hg19)" -o ~/Genomes/Output
'---------------------------------------------------------------------------
CreateGenomeSizeFile
Copyright (c) Illumina 2018
https://github.com/Illumina/Pisces
5.2.9.122
'---------------------------------------------------------------------------

5/22/20 10:08 AM 1 ************* Starting **************
5/22/20 10:08 AM 1 Version: 5.2.9.122.
5/22/20 10:08 AM 1 Command-line arguments: .
5/22/20 10:08 AM 1 "-g /home/tamsen/Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA -s Homo sapiens (UCSC hg19) -o /home/tamsen/Genomes/Output".
Preparing GenomeSize.xml for folder /home/tamsen/Genomes/Output...
GenomeSize.xml prepared at /home/tamsen/Genomes/Output/GenomeSize.xml
5/22/20 10:08 AM 1 ******************** Ending *********************

My system is:

NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

  1. Can you try exactly with my test data and see if that works? Just to narrow things down.
  2. There is also a log folder, CreateGenomeSizeFileLogs, that should be written to your output folder. Do the logs give any clues?
  3. Have you tried the latest 5.2.10 DLL?
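For point 2, here is a short sketch that prints the most recently modified file in the `CreateGenomeSizeFileLogs` folder. The folder name comes from tamsen's message; the log file names and extensions are not stated in the thread, so this hypothetical helper simply picks the newest regular file it finds (the script name `show_log.py` is illustrative).

```python
import sys
from pathlib import Path
from typing import Optional


def newest_log(output_folder: str, log_dir_name: str = "CreateGenomeSizeFileLogs") -> Optional[Path]:
    """Return the most recently modified file in <output_folder>/<log_dir_name>, or None."""
    log_dir = Path(output_folder) / log_dir_name
    if not log_dir.is_dir():
        return None
    files = [p for p in log_dir.iterdir() if p.is_file()]
    # max() with default=None handles an empty log folder.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    # Usage: python show_log.py <the folder you passed to -o>
    latest = newest_log(sys.argv[1] if len(sys.argv) > 1 else ".")
    if latest is None:
        print("No log folder or no log files found.")
    else:
        print(f"--- {latest} ---")
        print(latest.read_text(errors="replace"))
```

If the log folder does not exist at all, that itself is a clue: it may mean the program never got past argument parsing, or could not write to the output folder.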

best Tamsen

tamsen, May 22 '20

Hi Tamsen,

I did exactly as you suggested and the output is the same:

login2.ls5(1083)$ CreateGenomeSizeFile -g Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA -s "Homo_sapiens (UCSC hg19)" -o Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA

Please specify the full genome name ("Genus Species (Source Build)" - e.g. "Rattus norvegicus (UCSC rn4)"; include the strain name if available, e.g. "Bacillus cereus ATCC 10987 (NCBI 2004-02-13)").

Some problems were encountered when parsing the command line options:

For a complete list of command line options, type "dotnet CreateGenomeSizeFile.dll -h"

login2.ls5(1084)$ cd Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA

login2.ls5(1085)$ ls

genome.dict genome.fa genome.fa.fai

It may be that the dll file is incompatible with our environment somehow. Should we try 5.2.10?


sprakashUTH, May 30 '20

Hi,

Do you normally omit "dotnet"? Do other Pisces programs work for you? Normally "dotnet" is the first part of the command.

Yes, go ahead and try the other versions. This is very strange.

tamsen, May 30 '20

We installed Pisces in a container. That may be why we don’t call ‘dotnet’, but I am insufficiently familiar with the technical aspects. I have cc'd our excellent research associate and system guru, Joe Allen, who may be able to explain.


sprakashUTH, May 30 '20

Hello,

To answer the ‘dotnet’ question, yes ‘dotnet’ is hidden in there. We support Pisces as a container. Outside of the container, ‘CreateGenomeSizeFile’ is an alias that evaluates to:

"singularity exec ${TACC_PISCES_DIR}/pisces_5.2.7.47.sif dotnet /app/CreateGenomeSizeFile_5.2.7.47/CreateGenomeSizeFile.dll $@"

For example, this is what it looks like when we load a pisces module on our cluster:

$ module load pisces/5.2.7.47
$ CreateGenomeSizeFile --help

CreateGenomeSizeFile Copyright (c) Illumina 2018 https://github.com/Illumina/Pisces 5.2.7.47

USAGE: dotnet CreateGenomeSizeFile.dll -s -g -out
CreateGenomeSizeFile: create a genome size xml file from a fasta file.

REQUIRED:
-g <FOLDER>   FOLDER Genome folder. Example folder structure: \Genomes\Homo_sapiens\UCSC\hg19\Sequence\WholeGenomeFASTA
-s <STRING>   STRING Species and build, in quotes. Example format: Genus Species (Source Build). - e.g. "Rattus norvegicus (UCSC rn4)"

COMMON:
-o, --out, --outfolder <FOLDER>   FOLDER output directory
--help, -h   displays the help menu
--version, -v   displays the version

5.2.7.47
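The help text describes the tool as creating a genome size XML file from a FASTA file. As a rough, independent cross-check of what "genome size" amounts to for a given reference, the sketch below sums contig lengths from the FASTA's `.fai` index (the samtools faidx format: tab-separated name, length, offset, line bases, line width). This is not how CreateGenomeSizeFile is implemented and it does not produce GenomeSize.xml; it only gives numbers to compare against.

```python
from pathlib import Path


def contig_lengths_from_fai(fai_path: str) -> dict[str, int]:
    """Read a samtools-style .fai index and return {contig name: length in bases}."""
    lengths: dict[str, int] = {}
    for line in Path(fai_path).read_text().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH (tab-separated).
        fields = line.split("\t")
        lengths[fields[0]] = int(fields[1])
    return lengths


def total_genome_size(fai_path: str) -> int:
    """Total number of bases across all contigs listed in the index."""
    return sum(contig_lengths_from_fai(fai_path).values())
```

For example, `total_genome_size("Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA/genome.fa.fai")` returns the summed length of every contig in that index.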

Thanks,

Joe
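One possibility worth ruling out, offered as an assumption since the thread never confirms the cause, is how the wrapper Joe describes forwards its arguments. If it is a shell function or script (in a true alias, `$@` would not refer to the caller's arguments at all), an unquoted `$@` re-splits every argument on whitespace, so `-s "Homo sapiens (UCSC hg19)"` reaches the program as several words, whereas `"$@"` passes each argument through intact. The sketch below models both behaviours in Python; it does not call singularity or dotnet, and `forward_unquoted` and `forward_quoted` are illustrative names.

```python
import shlex

# What the user typed, tokenized the way the calling shell would before invoking the wrapper.
user_args = shlex.split('-g WholeGenomeFASTA -s "Homo sapiens (UCSC hg19)" -o out')


def forward_unquoted(args: list[str]) -> list[str]:
    """Model of `cmd $@`: each argument is re-split on whitespace (default IFS; globbing not modelled)."""
    return [word for arg in args for word in arg.split()]


def forward_quoted(args: list[str]) -> list[str]:
    """Model of `cmd "$@"`: each argument is passed through unchanged."""
    return list(args)


print("unquoted $@ :", forward_unquoted(user_args))
print('quoted "$@" :', forward_quoted(user_args))
```

If the unquoted model matches what the cluster's wrapper does, the program would see only `Homo` as the value of `-s`, which would fit the "Please specify the full genome name" error; ending the wrapper with `CreateGenomeSizeFile.dll "$@"` would be the change to try.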


sprakashUTH, Jun 01 '20

Hi,

Hm, so how about we take your unusual configuration out of the equation for a moment? Can you please install dotnet and CreateGenomeSizeFile 5.2.9.122 natively on a Linux box or PC, run my small-genome test, and see if you still have the issue? Also, are you sure Joe has 5.2.9 set up? His email quoted 5.2.7...? The version you are using should be obvious from the text in your log.

(Note: there was no response from the user after this email, so presumably it was a configuration issue.)

tamsen, Jun 01 '20

Hi tamsen,

I am trying to use Pisces SNV calling on my customized construct, so the reference sequence will be a vector sequence plus my gene of interest. I wonder if I can use CreateGenomeSizeFile in this scenario. Does it matter what I write in -s? I tried a random string and it didn't work... Thank you!

ltongyu, Jun 16 '21

Hi! It doesn't matter. Just make the format match the recommendation in the help; e.g. "imaginary species (build2014)" would work.
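To catch format problems before running the tool, a rough check of the "Genus Species (Source Build)" shape can help. The regular expression below is an assumption based on the help text and the examples in this thread (one or more words, then a parenthesized source/build at the end); it is not the validation rule CreateGenomeSizeFile actually uses, so a string that passes here could still be rejected by the tool.

```python
import re

# Hypothetical approximation of "Genus Species (Source Build)":
# one or more space-separated words, a single space, then one "(...)" group at the very end.
SPECIES_PATTERN = re.compile(r"^\S+(?: \S+)* \([^()]+\)$")


def looks_like_species_build(value: str) -> bool:
    """Return True if value roughly matches 'Genus Species (Source Build)'."""
    return SPECIES_PATTERN.match(value.strip()) is not None


for candidate in ("imaginary species (build2014)", "my construct", "Rattus norvegicus (UCSC rn4)"):
    print(f"{candidate!r}: {looks_like_species_build(candidate)}")
```

A plain label such as "my construct" fails this check because it has no parenthesized build, which is the kind of mistake the help text's example format is meant to prevent.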

tamsen, Jun 16 '21

Hi Tamsen,

Thank you for your quick response. It worked and went through! I have another silly question you may be able to advise me on. With the Pisces VCF output file, if I want to directly extract the VF at each position, do you know of any program to process this?

Thank you! Tongyu


-- Tongyu Liu, Jiandie Lin Laboratory. PhD candidate in Cell & Developmental Biology; Master's student in Bioinformatics. Life Sciences Institute, University of Michigan, Ann Arbor, MI

ltongyu, Jun 16 '21

I don't know of a program. I'd probably just script it.

GT:GQ:AD:DP:VF:NL:SB 0/1:100:6978,4274:11252:0.380:20:0.0000

So in your VCF, you see data like the above: the FORMAT keys, followed by the sample's values in the same order. In this case, "0.380" is your variant frequency. You can use whatever parser you want to access this. Here's some example Python code to parse it into a dictionary. Note that the values you get back will be strings.

```python
def GetDictFromSampleString(formatstring, samplestring):
    # Pair each FORMAT key (e.g. "VF") with the matching sample value; values stay strings.
    formatSplat = formatstring.split(":")
    sampleSplat = samplestring.split(":")
    result = dict(zip(formatSplat, sampleSplat))
    return result
```

Then you could do something like:

```python
myData = GetDictFromSampleString("GT:GQ:AD:DP:VF:NL:SB", "0/1:100:6978,4274:11252:0.380:20:0.0000")
result_you_are_looking_for = myData["VF"]  # "0.380" (a string)
```
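To pull VF for every position rather than from one sample string, the same idea can be applied line by line to the whole file. The sketch below assumes a standard single-sample VCF (tab-separated, FORMAT in column 9 and the sample in column 10, optionally gzip-compressed) and skips header lines; `extract_vf` is a hypothetical helper, not part of Pisces, and it converts VF to a float.

```python
import gzip
from typing import Iterator, Tuple


def extract_vf(vcf_path: str) -> Iterator[Tuple[str, int, str, str, float]]:
    """Yield (CHROM, POS, REF, ALT, VF) for each record whose FORMAT column includes VF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information lines and the column header line
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 10:
                continue  # no FORMAT/sample columns on this record
            chrom, pos, _id, ref, alt = fields[:5]
            # Same pairing as GetDictFromSampleString: FORMAT keys with sample values.
            sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
            vf = sample.get("VF")
            if vf is None or vf == ".":
                continue  # record has no usable VF value
            yield chrom, int(pos), ref, alt, float(vf)
```

Usage: `for chrom, pos, ref, alt, vf in extract_vf("sample.vcf"): print(chrom, pos, ref, alt, vf, sep="\t")` writes one tab-separated line per position, which is easy to load into a spreadsheet or pandas.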

tamsen, Jun 17 '21