Pisces
CreateGenomeSizeFile_5.2.9.122.tar.gz
Hi, I am trying to create the GenomeSize.xml file but this version is just empty. And when I tried with CreateGenomeSizeFile_5.2.7.47.tar.gz it creates VennVcf_5.2.7.47.
Any other options I can use until this is fixed?
Thanks, Satwica
I created the GenomeSize.xml with CreateGenomeSizeFile_5.2.7.47 and it works well. The command line is:
dotnet CreateGenomeSizeFile.dll –g /storage/sta/Reference_Genome/hg19/ucsc.hg19.fasta –s Human (UCSC rn1) –o /storage/sta/Reference_Genome/hg19/
Hello @HeXY0515 Could you share your version of the file CreateGenomeSizeFile.dll?
CreateGenomeSizeFile_5.2.9.122.tar.gz
CreateGenomeSizeFile_5.2.7.47.tar.gz
Sorry. These both work for me. The 5.2.9 associated with the release looks like it has issues. I'll try to fix it.
OK, I think it's all working now. If you find another broken tar, or it's still not working for you, please let me know. Thanks everyone for spotting this.
Thank you @tamsen !!!!
CreateGenomeSizeFile
Cannot create an XML file, haha
How is this used? I have tried many parameters and none of them work. Shame.
Hi Tamsen! Can you please suggest a fix?
I can't get CreateGenomeSizeFile to work. I'm using hg19 fasta file downloaded from UCSC. When I entered: CreateGenomeSizeFile -g Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA -s "Homo sapiens (UCSC hg19)" -o Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA I get: "Please specify the full genome name ("Genus Species (Source Build)" - e.g. "Rattus norvegicus (UCSC rn4)"; include the strain name if available, e.g. "Bacillus cereus ATCC 10987 (NCBI 2004-02-13)"). Some problems were encountered when parsing the command line options:"
What am I doing wrong?
Sorry, it's been a while since I looked at this code!
For hmyh1202: do you have write permissions?
For spraka: what version are you using? There are corrected dlls in this thread.
You could also try with or without quotes, try single quotes, or renaming Homo sapiens to Homo_sapiens. Maybe something is getting mangled with the input string in the command line.
If nothing simple works, I can try to reproduce your issue and take a look in the debugger.
5.2.9.122. I tried without quotes, with quotes, and renaming Homo sapiens to Homo_sapiens. I assume the hg19 part is correct. Nothing works.
I'm sorry. That must be very frustrating. I don't have my compiler with me right now, but maybe you found a real bug? I can try to make some time for it in the next few days.
Do other pisces commands normally work for you? Windows sometimes has a problem between "-" and "–" . Also, historically we have only been testing for windows and linux. Macs are not supported yet. What OS are you using?
Some other ideas while you wait:
You could try simplifying (UCSC hg19) to (hg19)
You can try this: https://github.com/Illumina/Pisces/files/2687406/CreateGenomeSizeFile_5.2.7.47.tar.gz or other prior versions, because people have definitely been using it in the past without issue (see this thread for other users' command lines).
best Tamsen
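On the "-" versus "–" point above: a minimal Python sketch, not part of Pisces, for spotting dash look-alikes in a command that was pasted from a web page or email. The function names, the set of characters checked, and the /path/to/... placeholders are assumptions made for illustration.

```python
import unicodedata

# Dash-like characters that are often substituted for the ASCII hyphen-minus
# when a command is copied from a web page, email, or word processor.
# This set is an illustrative assumption, not an exhaustive list.
DASH_LOOKALIKES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2212"}


def find_dash_lookalikes(command):
    """Return (index, character, Unicode name) for each suspicious dash."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(command)
        if ch in DASH_LOOKALIKES
    ]


def normalize_dashes(command):
    """Replace every dash look-alike with a plain ASCII hyphen-minus."""
    return "".join("-" if ch in DASH_LOOKALIKES else ch for ch in command)


if __name__ == "__main__":
    # Hypothetical pasted command; the paths are placeholders.
    pasted = "dotnet CreateGenomeSizeFile.dll \u2013g /path/to/genome \u2013o /path/to/out"
    for index, ch, name in find_dash_lookalikes(pasted):
        print(f"position {index}: {name}")
    print(normalize_dashes(pasted))
```

If the first function reports anything, retyping the options by hand is probably the simplest fix.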
Linux version: login2.ls5(1020)$ cat /etc/*-release NAME="SLES" VERSION="12-SP3" VERSION_ID="12.3" PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3" ID="sles" ANSI_COLOR="0;32" CPE_NAME="cpe:/o:suse:sles:12:sp3" SUSE Linux Enterprise Server 12 (x86_64) VERSION = 12 PATCHLEVEL = 3
I tried hg19 alone. I tried the version that you mentioned. No luck so far.
Hi there,
I am sorry again for being late in getting back to you... it's been busy.
I reproduced your command and (of course) it all worked for me. I used a fresh 5.2.9 binary I pulled down from the github releases page for your version number, and I spoofed the genome data by copying down some bacterial data so it ran quick (you can get the same data I used from https://github.com/Illumina/Pisces/tree/master/src/test/SharedData/Genomes/Bacillus_cereus/Sequence/WholeGenomeFasta , and put the .fa, .fai, and dict files in your Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA folder for a quick test)
tamsen@tamsen-Inspiron-3847:~/PiscesBinaries/CreateGenomeSizeFile_5.2.9.122$ dotnet CreateGenomeSizeFile.dll -g ~/Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA -s "Homo sapiens (UCSC hg19)" -o ~/Genomes/Output
'---------------------------------------------------------------------------
CreateGenomeSizeFile Copyright (c) Illumina 2018 https://github.com/Illumina/Pisces 5.2.9.122
'---------------------------------------------------------------------------
5/22/20 10:08 AM 1 ************* Starting **************
5/22/20 10:08 AM 1 Version: 5.2.9.122.
5/22/20 10:08 AM 1 Command-line arguments: .
5/22/20 10:08 AM 1 "-g /home/tamsen/Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA -s Homo sapiens (UCSC hg19) -o /home/tamsen/Genomes/Output".
Preparing GenomeSize.xml for folder /home/tamsen/Genomes/Output...
GenomeSize.xml prepared at /home/tamsen/Genomes/Output/GenomeSize.xml
5/22/20 10:08 AM 1 ******************** Ending *********************
My system is NAME="Ubuntu" VERSION="18.04.4 LTS (Bionic Beaver)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 18.04.4 LTS" VERSION_ID="18.04" VERSION_CODENAME=bionic UBUNTU_CODENAME=bionic
- Can you try exactly with my test data and see if that works? Just to narrow things down.
- There is also a log folder, CreateGenomeSizeFileLogs, that should be written to your output folder. Do the logs give any clues?
- Have you tried the latest 5.2.10 dll?
best Tamsen
Hi Tamsen,
I did exactly as you suggested and the output is the same:
login2.ls5(1083)$ CreateGenomeSizeFile -g Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA -s "Homo_sapiens (UCSC hg19)" -o Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA
Please specify the full genome name ("Genus Species (Source Build)" - e.g. "Rattus norvegicus (UCSC rn4)"; include the strain name if available, e.g. "Bacillus cereus ATCC 10987 (NCBI 2004-02-13)").
Some problems were encountered when parsing the command line options:
For a complete list of command line options, type "dotnet CreateGenomeSizeFile.dll -h"
login2.ls5(1084)$ cd Genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFASTA
login2.ls5(1085)$ ls
genome.dict genome.fa genome.fa.fai
It may be that the dll file is incompatible with our environment somehow. Should we try 5.2.10?
Hi,
Do you normally omit the "dotnet" ? Do other Pisces programs work for you? Normally calling "dotnet" is the first part of the command structure.
Yes, go ahead and try the other versions. This is very strange.
We installed pisces in a container. That may be why we don’t call ‘dotnet’, but I am insufficiently familiar with the technical aspects. I copied our excellent research associate and our system guru, Joe Allen, who may be able to explain.
Hello,
To answer the 'dotnet' question, yes 'dotnet' is hidden in there. We support Pisces as a container. Outside of the container, 'CreateGenomeSizeFile' is an alias that evaluates to:
"singularity exec ${TACC_PISCES_DIR}/pisces_5.2.7.47.sif dotnet /app/CreateGenomeSizeFile_5.2.7.47/CreateGenomeSizeFile.dll $@"
For example, this is what it looks like when we load a pisces module on our cluster:
$ module load pisces/5.2.7.47
$ CreateGenomeSizeFile --help
CreateGenomeSizeFile Copyright (c) Illumina 2018 https://github.com/Illumina/Pisces 5.2.7.47
USAGE: dotnet CreateGenomeSizeFile.dll -s
REQUIRED:
-g <FOLDER> FOLDER Genome folder. Example folder structure: \Genomes\Homo_sapiens\UCSC\hg19\Sequence\WholeGenomeFASTA
-s <STRING> STRING Species and build, in quotes. Example format: Genus Species (Source Build). - e.g. "Rattus norvegicus (UCSC rn4)"
COMMON:
-o, --out, --outfolder <FOLDER> FOLDER output directory
--help, -h displays the help menu
--version, -v displays the version
5.2.7.47
Thanks,
Joe
Hi,
Hm, so how about we take your unusual configuration out of the equation for a moment? Can you please install dotnet and CreateGenomeSizeFile 5.2.9.122 natively on a linux box or pc, run my little small-genome test, and see if you still have the issue? Also, are you sure Joe has 5.2.9 set up? His email quoted 5.2.7...? The version you are using should be obvious from the text in your log.
(note - no response from the user after this email, so presume it was a configuration issue)
Hi tamsen,
I am trying to use Pisces SNV calling on my customized construct. So basically the reference sequence will be a vector sequence + my gene of interest. I wonder if I can use CreateGenomeSize in this scenario. Does it matter what I write in -s ? I tried a random string and it doesn't work... Thank you!
Hi! It doesn't matter. Just have the format match the recommendation in the help, i.e., "imaginary species (build2014)" would work.
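As a rough illustration of the "Genus Species (Source Build)" shape, here is a hedged Python sketch that checks a candidate -s value before running the tool. The regular expression and the function name are assumptions based only on the help text quoted in this thread. This is not the validation CreateGenomeSizeFile itself performs, so a string that passes here is not guaranteed to be accepted.

```python
import re

# Assumed pattern for "Genus Species (Source Build)": some name text, a space,
# then a non-empty parenthesized part at the end. This is a guess based on the
# help text, not the check CreateGenomeSizeFile actually performs.
SPECIES_PATTERN = re.compile(r"^(?P<name>[^()]+?)\s+\((?P<build>[^()]+)\)$")


def split_species_string(value):
    """Return (name, build) if value looks like 'Name (Build)', else None."""
    match = SPECIES_PATTERN.match(value.strip())
    if match is None:
        return None
    return match.group("name"), match.group("build")


if __name__ == "__main__":
    for candidate in ("imaginary species (build2014)", "randomstring"):
        print(repr(candidate), "->", split_species_string(candidate))
```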
Hi Tamsen,
Thank you for your quick response. It worked and went through! I have another silly question that you may be able to advise me. With the Pisces vcf output file, if I want to directly extract the VF in each position, do you know any program to process this?
Thank you! Tongyu
-- Tongyu Liu Jiandie Lin Laboratory PhD candidate in Cell & Developmental Biology Master student in Bioinformatics Life Sciences Institute, University of Michigan, Ann Arbor, MI Email: @.***
I don't know of a program. I'd probably just script it.
GT:GQ:AD:DP:VF:NL:SB 0/1:100:6978,4274:11252:0.380:20:0.0000
So in your vcf, you see data like the above. In this case, the "0.380" is your variant frequency. You can use whatever parser you want to access this. Here's some example Python code to parse it into a dictionary. Note that the values you get back will be strings.
def GetDictFromSampleString(formatstring, samplestring):
    # Split the FORMAT column and the sample column on ":" and pair them up.
    formatSplat = formatstring.split(":")
    sampleSplat = samplestring.split(":")
    result = dict(zip(formatSplat, sampleSplat))
    return result

Then you could do something like:

myData = GetDictFromSampleString("GT:GQ:AD:DP:VF:NL:SB", "0/1:100:6978,4274:11252:0.380:20:0.0000")
result_you_are_looking_for = myData["VF"]
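To pull the VF at every position, the same idea can be applied line by line. The sketch below is one possible way to do it, assuming a standard tab-separated VCF layout (FORMAT in the ninth column, the first sample in the tenth) and a single VF value per record. The function names and the record in the example are made up for illustration. Only the FORMAT and sample strings come from this thread.

```python
def get_dict_from_sample_string(format_string, sample_string):
    """Pair each FORMAT key with the matching sample value (all strings)."""
    return dict(zip(format_string.split(":"), sample_string.split(":")))


def iter_variant_frequencies(vcf_lines, sample_index=0):
    """Yield (chrom, pos, ref, alt, vf) for each record that carries a VF.

    Assumes tab-separated VCF records with FORMAT in column 9 and samples
    from column 10 onward, and one VF value per record.
    """
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 10 + sample_index:
            continue  # no FORMAT/sample columns for the requested sample
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        sample = get_dict_from_sample_string(fields[8], fields[9 + sample_index])
        vf = sample.get("VF")
        if vf is None or vf == ".":
            continue  # VF missing for this record
        yield chrom, int(pos), ref, alt, float(vf)


if __name__ == "__main__":
    # Made-up record; only the FORMAT and sample strings are from the thread.
    example_lines = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample1",
        "chr1\t100\t.\tA\tG\t100\tPASS\tDP=11252\t"
        "GT:GQ:AD:DP:VF:NL:SB\t0/1:100:6978,4274:11252:0.380:20:0.0000",
    ]
    for record in iter_variant_frequencies(example_lines):
        print(record)
```

For a real file, the list can be swapped for an open file handle, since the function only needs an iterable of lines.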