SUPPA
Problems replicating TRA2 tutorial results
Dear developers, first of all, thanks for this amazing tool!
I'm writing this because I couldn't replicate the TRA2 results from the tutorial.
I cloned SUPPA (1c0ad91) and built its dependencies with conda (suppa_env.yaml). I didn't use SUPPA version 2.3 available in conda directly because I got this error:
ERROR:main:Unknown error: (<class 'UnboundLocalError'>, UnboundLocalError("local variable 'i' referenced before assignment"), <traceback object at 0x179516080>)
The same didn't happen with the latest (1c0ad91) git code.
The first problem I encountered was in the psiPerEvent calculation: the program runs, but I get a lot of errors like these:
ERROR:psiCalculator:transcript ENST00000514649 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000120949;A3:chr1:12195670-12198286:12195670-12198289:+.
ERROR:psiCalculator:transcript ENST00000529606 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000142621;A3:chr1:15695998-15700998:15695998-15701001:+.
ERROR:psiCalculator:transcript ENST00000544435 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000162521;A3:chr1:33116923-33117515:33116923-33117518:+.
ERROR:psiCalculator:transcript ENST00000544435 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000162521;A3:chr1:33138502-33145236:33138502-33145241:+.
ERROR:psiCalculator:transcript ENST00000484445 not found in the "expression file".
ERROR:psiCalculator:PSI not calculated for event ENSG00000187801;A3:chr1:40915847-40916328:40915847-40916337:+
...
Is this normal? I used the Ensembl references (FASTA and GTF) from the tutorial.
After this analysis I managed to generate the plot with the generate_boxplot_event.py script, but the graph does not look like the one in the tutorial. Did you do any QC filtering on the reads before the analysis?
[image: boxplot_TRA2] https://user-images.githubusercontent.com/53014804/164727590-747b4c1b-444c-4f91-bd1f-7c9d66b770b5.png
The big problem is that I couldn't complete the step "Differential splicing with local events": the program runs and prints "done" but does not generate the .dpsi table. It generates only two files: "TRA2_diffSplice.psivec" and "TRA2_diffSplice.dpsi.temp.0".
My system runs CentOS Linux release 7.9. I used 20 cores and 100 RAM.
Here are all the command lines I used:
# Download files
parallel-fastq-dump --sra-id SRR1513329 --threads 8 --outdir data/fastq/ --split-files --gzip
parallel-fastq-dump --sra-id SRR1513330 --threads 8 --outdir data/fastq/ --split-files --gzip
parallel-fastq-dump --sra-id SRR1513331 --threads 8 --outdir data/fastq/ --split-files --gzip
parallel-fastq-dump --sra-id SRR1513332 --threads 8 --outdir data/fastq/ --split-files --gzip
parallel-fastq-dump --sra-id SRR1513333 --threads 8 --outdir data/fastq/ --split-files --gzip
parallel-fastq-dump --sra-id SRR1513334 --threads 8 --outdir data/fastq/ --split-files --gzip
# salmon create index:
salmon index -p 20 -t data/ensemble/hg19_EnsenmblGenes_sequence_ensenmbl.fasta -i data/ensemble/index
# suppa extract events from Ensembl:
mkdir -p data/ensemble/events_splited && python3 /home/hugo.avila/hugo.avila/repo/SUPPA-2.3/suppa.py generateEvents -i data/ensemble/Homo_sapiens.GRCh37.75.formatted.gtf -o data/ensemble/events_splited/ensemble -e SE SS MX RI FL -f ioe
# suppa merge Ensembl events:
bash workflow/scripts/merge_events.sh data/ensemble/events_splited/*.ioe > data/ensemble/ensembl_hg19.events.ioe
# salmon sample quantification:
salmon quant -i data/ensemble/index -l ISF --gcBias -1 data/fastq/SRR1513334_1.fastq -2 data/fastq/SRR1513334_2.fastq -p 20 -o results/salmon/SRR1513334
salmon quant -i data/ensemble/index -l ISF --gcBias -1 data/fastq/SRR1513329_1.fastq -2 data/fastq/SRR1513329_2.fastq -p 20 -o results/salmon/SRR1513329
salmon quant -i data/ensemble/index -l ISF --gcBias -1 data/fastq/SRR1513330_1.fastq -2 data/fastq/SRR1513330_2.fastq -p 20 -o results/salmon/SRR1513330
salmon quant -i data/ensemble/index -l ISF --gcBias -1 data/fastq/SRR1513332_1.fastq -2 data/fastq/SRR1513332_2.fastq -p 20 -o results/salmon/SRR1513332
salmon quant -i data/ensemble/index -l ISF --gcBias -1 data/fastq/SRR1513331_1.fastq -2 data/fastq/SRR1513331_2.fastq -p 20 -o results/salmon/SRR1513331
salmon quant -i data/ensemble/index -l ISF --gcBias -1 data/fastq/SRR1513333_1.fastq -2 data/fastq/SRR1513333_2.fastq -p 20 -o results/salmon/SRR1513333
# salmon merge tables:
# NOTE: I did not always run the samples in sorted order (SRR1513329, SRR1513330, ...).
python3 workflow/scripts/multipleFieldSelection.py -i results/salmon/SRR1513330/quant.sf results/salmon/SRR1513332/quant.sf results/salmon/SRR1513331/quant.sf results/salmon/SRR1513333/quant.sf results/salmon/SRR1513334/quant.sf results/salmon/SRR1513329/quant.sf -k 1 -f 4 -o results/salmon/iso_tpm.txt
# salmon format id:
Rscript workflow/scripts/format_Ensembl_ids.R results/salmon/iso_tpm.txt
# suppa get all samples events:
python3 /home/hugo.avila/hugo.avila/repo/SUPPA-2.3/suppa.py psiPerEvent -i data/ensemble/ensembl_hg19.events.ioe -e results/salmon/iso_tpm_formatted.txt -o results/suppa/TRA2_events
# correct plot input:
# A simple one-liner that makes the .psi table match the one in the tutorial (adds the EventID header and sorts the columns).
workflow/scripts/sort_samples.sh results/suppa/TRA2_events.psi > results/suppa/TRA2_events_sorted.psi
# create box plot:
mkdir -p results/suppa/boxplot && workflow/scripts/generate_boxplot_event.py -i results/suppa/TRA2_events_sorted.psi -e 'ENSG00000149554;SE:chr11:125496728-125497502:125497725-125499127:+' -g 1-3,4-6 -c NC,KD -o results/suppa/boxplot
# split by condition:
workflow/scripts/split_file.R results/salmon/iso_tpm_formatted.txt SRR1513329,SRR1513330,SRR1513331 SRR1513332,SRR1513333,SRR1513334 results/suppa/split_conditions/TRA2_NC_iso.tpm results/suppa/split_conditions/TRA2_KD_iso.tpm -i
workflow/scripts/split_file.R results/suppa/TRA2_events.psi SRR1513329,SRR1513330,SRR1513331 SRR1513332,SRR1513333,SRR1513334 results/suppa/split_conditions/TRA2_NC_events.psi results/suppa/split_conditions/TRA2_KD_events.psi -e
# diff splicing analysis:
python3 /home/hugo.avila/hugo.avila/repo/SUPPA-2.3/suppa.py diffSplice -m empirical -gc -i data/ensemble/ensembl_hg19.events.ioe -p results/suppa/split_conditions/TRA2_KD_events.psi results/suppa/split_conditions/TRA2_NC_events.psi -e results/suppa/split_conditions/TRA2_KD_events.psi results/suppa/split_conditions/TRA2_NC_events.psi -o results/suppa/split_conditions/TRA2_diffSplice
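Because the quant.sf files were merged in a non-sorted order while the boxplot groups samples by position (-g 1-3,4-6), the column order of the table matters. Below is a minimal Python sketch of the kind of reordering that sort_samples.sh is described as doing. It assumes a tab-separated table whose header lists only the sample names (no label for the ID column) while every data row starts with the event ID. The function name reorder_samples is hypothetical and is not part of SUPPA or of the scripts above.

```python
def reorder_samples(lines, wanted_order):
    """Yield table lines with the sample columns rearranged to wanted_order.

    Assumed layout: the header holds N sample names, and each data row
    holds an ID followed by N values (one more field than the header).
    """
    lines = iter(lines)
    header = next(lines).rstrip("\n").split("\t")
    # Position of each requested sample in the original header.
    # header.index raises ValueError if a requested sample is absent.
    positions = [header.index(sample) for sample in wanted_order]
    yield "\t".join(wanted_order)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        row_id, values = fields[0], fields[1:]
        yield "\t".join([row_id] + [values[i] for i in positions])
```

Applied to the PSI table with wanted_order set to SRR1513329 through SRR1513334, columns 1-3 would then hold the first condition and columns 4-6 the second, provided the header names match exactly.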
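suppa_env.yaml.txt https://github.com/comprna/SUPPA/files/8541638/suppa_env.yaml.txt
salmon_env.yaml.txt https://github.com/comprna/SUPPA/files/8541639/salmon_env.yaml.txt
results_suppa.zip https://github.com/comprna/SUPPA/files/8541705/results_suppa.zip
results_salmon.zip https://github.com/comprna/SUPPA/files/8541710/results_samon.zip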
Hi Hugo,
Sorry for the delay in replying.
The error you got can be common if your expression file is truncated or if there are transcripts in the event file (.ioe) that do not have any expression.
You still got results, which means that it is not a format issue or a problem with the transcript IDs, I guess.
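One way to see how widespread the "not found in the expression file" messages are is to compare the transcript IDs in the two inputs directly. The Python sketch below assumes the .ioe file is tab-separated with a header line and a comma-separated transcript list in its fifth column (total_transcripts), and that the expression file has a header of sample names followed by rows starting with the transcript ID. The function names are hypothetical and are not part of SUPPA.

```python
def ioe_transcripts(ioe_lines):
    """Collect transcript IDs from the fifth column of .ioe lines (assumed layout)."""
    ids = set()
    lines = iter(ioe_lines)
    next(lines, None)  # skip the header line
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # ignore malformed or empty lines
        ids.update(t for t in fields[4].split(",") if t)
    return ids


def expression_transcripts(expr_lines):
    """Collect the first field (transcript ID) of every non-empty data row."""
    lines = iter(expr_lines)
    next(lines, None)  # skip the header line with the sample names
    return {line.split("\t", 1)[0].strip() for line in lines if line.strip()}


def missing_transcripts(ioe_lines, expr_lines):
    """Return the sorted transcript IDs used by events but absent from the expression table."""
    return sorted(ioe_transcripts(ioe_lines) - expression_transcripts(expr_lines))
```

With real files, each argument can be an open file handle, for example the merged .ioe file and iso_tpm_formatted.txt. A short list suggests a handful of transcripts missing from the FASTA used for the index, while a very long list would point to an ID format mismatch or a truncated table.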
We do not encounter the error with the diffSplice analysis. Could this be a Python version issue?
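Apart from the Python version, one detail in the diffSplice command posted above may be worth ruling out: the files passed to -e are the same .psi files passed to -p, although the split step also produced TRA2_NC_iso.tpm and TRA2_KD_iso.tpm. The -e option of diffSplice is meant for the expression (TPM) files. Whether this explains the missing .dpsi table is not confirmed in this thread. A minimal Python sketch of such a sanity check follows; check_diffsplice_inputs is a hypothetical helper, not part of SUPPA, and the ".psi" suffix test is only a heuristic.

```python
def check_diffsplice_inputs(psi_files, expression_files):
    """Return a list of human-readable problems found in the -p / -e arguments."""
    problems = []
    if len(psi_files) != len(expression_files):
        problems.append("the number of -p and -e files differs")
    for psi, expr in zip(psi_files, expression_files):
        if psi == expr:
            problems.append(f"{expr} is passed to both -p and -e")
        elif expr.endswith(".psi"):
            # Heuristic only: judges the file by its name, not its content.
            problems.append(f"{expr} looks like a PSI table, not an expression table")
    return problems
```

Called with the two lists from the command in this thread, the function reports both -e entries. Called with the .tpm files from the split step in the same condition order as the .psi files, it returns an empty list.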
I hope this helps
E.
On Sat, 23 Apr 2022 at 00:19, Hugo L. Ávila @.***> wrote:
-- Prof. E Eyras EMBL Australia Group Leader The John Curtin School of Medical Research - Australian National University https://github.com/comprna http://scholar.google.com/citations?user=LiojlGoAAAAJ
Thanks for the reply, @EduEyras!
> You still got results, which means that it is not a format issue or a problem with the transcript IDs, I guess.
Could you confirm that the current tutorial support files (Ensembl FASTA and GTF) and the command lines are the same ones used to generate those outputs?
> We do not encounter the error with the diffSplice analysis. Could this be a python version issue?
Maybe. I will check this out and come back with the answer.
Hi,
Yes, the wiki is self-contained.
The data used is the one provided.
Best,
E.
On Wed, 11 May 2022 at 01:29, Hugo L. Ávila @.***> wrote: