drop
drop copied to clipboard
Error in AberrantSplicing_pipeline_FRASER_08_extract_results_FraseR_R
Hello,
I encountered the following error in the AS module when running DROP version 1.3.3 with FRASER 2.0, 68 samples were run together. Could you help take a look, really appreciate any suggestions on how to fix this! Thanks in advance!
[1] "Results per junction extracted"
Tue Aug 1 08:15:41 2023: Collecting results for: jaccard (transcriptome-wide)
Tue Aug 1 08:16:16 2023: Process chunk: 1 for: jaccard
Tue Aug 1 08:19:58 2023: Process chunk: 2 for: jaccard
Tue Aug 1 08:23:40 2023: Process chunk: 3 for: jaccard
Tue Aug 1 08:27:19 2023: Process chunk: 4 for: jaccard
[1] "Results per gene extracted"
Loading required package: GenomicFeatures
preparing ...
calculating splice types ...
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'start': subscript contains NAs
Calls: annotatePotentialImpact ... normalizeSingleBracketSubscript -> NSBS -> NSBS -> .subscript_error
In addition: Warning message:
In min(distance(test_junction, refseq.genes), na.rm = TRUE) :
no non-missing arguments to min; returning Inf
Execution halted
Hi, can you tell us which gtf file are you using?
Hi,
Thank you so much for the reply, really appreciate the help!
I used GENCODE V39, below is the header from the gtf. It worked in module AE for the same set of samples. For AS, this gtf worked with other datasets in FRASER V1 through older version of DROP, have not tried running it with DROP v1.3.3 and FRASER V1.
##description: evidence-based annotation of the human genome (GRCh38), version 39 (Ensembl 105)
##provider: GENCODE
##contact: [email protected]
##format: gtf
##date: 2021-09-02
Wanqing
Hi @wanqingshao , I came across a similar error myself recently, I'm looking into it. I'll come back to you when I know why this happens.
Thank you @ischeller !
Hello @ischeller and @vyepez88,
Wondering did you get a chance to look into this? Or is there any test that you would recommend me trying? Would switching to a different gene model gtf help?
Best, Wanqing
Hi @wanqingshao , sorry for not coming back sooner. While I know that I saw that same error myself some time ago, I have not been able to reproduce it now with DROP 1.3.3. Maybe you could try to do a fresh install of DROP and see if this solves the issue for you? Otherwise, would you have any way to reproduce the behavior you observe? I don't think its related to the gtf, so I would not expect switching that to change. The other thing you could do is to comment out those lines of code in the results script, as they are not strictly needed (they add additional annotation to the results table of whether the outlier introns are in blacklist regions of the genome etc).
Thanks @ischeller for the reply!
I'll try adding --keep-incomplete
in the snakemake run to keep the fraser/fraser/results_gene_all.tsv
output and see if I can do some debugging later, will keep you posted if anything comes out.
Many thanks! Wanqing
Hi @wanqingshao , were you able to have a look at this and sort it out?
Hi @vyepez88,
Unfortunately no. It appears to be errored out in function annotatePotentialImpact
during some steps in FRASER resultAnnotation.R. My input for this step has almost 2 million rows, it took 1-2 days of running time before hitting the problematic line. It seems that the way this function accesses txdb
is restricted to only a single thread. I tried modifying resultAnnotations.R
with some trycatch commands but couldn't really figure it out. I haven't been able to identify the problematic record in the data.
I ended up commenting out the following two lines in script 08_extract_results_FraseR.R as suggested by ischeller
res_junc_dt <- annotatePotentialImpact(result=res_junc_dt, txdb=txdb, fds=fds)
res_genes_dt <- annotatePotentialImpact(result=res_genes_dt, txdb=txdb, fds=fds)
please let me know if you have any suggestions!
-Wanqing
Hi, 2 million rows is a lot. Can you share with us the values of the aberrantSplicing dictionary of your config file?
filter
is set to false, was trying to get everything then apply filter afterwards in downstream analysis.
aberrantSplicing:
run: true
groups:
- fraser
#- fraser_external
recount: true
longRead: false
keepNonStandardChrs: false
filter: false
minExpressionInOneSample: 20
quantileMinExpression: 10
quantileForFiltering: 0.95
minDeltaPsi: 0.05
implementation: PCA
padjCutoff: 1
maxTestedDimensionProportion: 6
genesToTest:
FRASER_version: FRASER2
deltaPsiCutoff: 0.05
The problem might be that you're not restricting your results all, so you're extracting and annotating all junction-sample combinations. Can you please try after changing the following?
padjCutoff: 0.1
deltaPsiCutoff: 0.1
Thank you, I'll give this a try, and to confirm, in addition to changing padj and deltaPsi cutoffs, filter
should also be set to ture
? The documentation recommended true
, but the default value from drop demo is false
Hi @wanqingshao ,
yes, we definitely recommend filter: true
. It is only set false in the demo as the test dataset there is very small to allow fast testing, so filtering is unnecessary there. But on real datasets not filtering makes the computation much slower, as filtering allows to remove many noisy junctions. Typically before filtering the fds objects have millions of rows, while afterwards they contain only hundreds of thousands of rows.
Thank you @ischeller and @vyepez88 ! I'll add the filter as suggested. We plan to run DROP again with a larger dataset (including the samples in this initial test) in the next few weeks, I'll report back if the same error persist.
Many thanks for your help!
-Wanqing
Hello!
I am experiencing the same issue with my current dataset. It consists of 57 samples (target enriched for 49 genes). I have tried various combinations of including only subsets of the samples and found that it seems to (not always but) most often work with smaller sets. I got it to work in smaller sets with each sample, so I can exclude that the reason is one sample being corrupt. Also, the error occurred both with filter true & false.
I am currently testing it on the development branch and got the analysis of other datasets of similar size to run nicely in the past.
Any ideas or suggestions on what to test or what data to provide would be greatly appreciated!
Thank you :) Florentine
recount: false
longRead: false
keepNonStandardChrs: false
filter: true (/false)
minExpressionInOneSample: 20
quantileMinExpression: 10
minDeltaPsi: 0.05
implementation: PCA
padjCutoff: 0.1
maxTestedDimensionProportion: 6
genesToTest: null
FRASER_version: "FRASER2"
deltaPsiCutoff : 0.1
quantileForFiltering: 0.75
Hi Florentine, Can you please give us more details on your error? In which step is it failing? Also, if I understand correctly, you only have 57 samples x 49 genes?
Hello Vicente,
it seems to me to be the exact same error as indicated by the issue opener in the AberrantSplicing_pipeline_FRASER_08_extract_results_FraseR_R step:
-------------------- BAM parameters --------------------
class: ScanBamParam
bamFlag (NA unless specified):
bamSimpleCigar: FALSE
bamReverseComplement: FALSE
bamTag:
bamTagFilter:
bamWhich: 0 ranges
bamWhat:
bamMapqFilter: 0
preparing ...
calculating splice types ...
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'start': subscript contains NAs
Calls: annotatePotentialImpact ... normalizeSingleBracketSubscript -> NSBS -> NSBS -> .subscript_error
In addition: Warning message:
In min(distance(test_junction, refseq.genes), na.rm = TRUE) :
no non-missing arguments to min; returning Inf
Execution halted
and
RuleException:
CalledProcessError in file /tmp/tmpw3oei168, line 157:
Command 'set -euo pipefail; Rscript --vanilla /mnt/output/drop_dev/.snakemake/scripts/tmp59r6ko_d.08_extract_results_FraseR.R' returned non-zero exit status 1.
File "/tmp/tmpw3oei168", line 157, in __rule_AberrantSplicing_pipeline_FRASER_08_extract_results_FraseR_R
File "/root/miniconda3/envs/drop_dev/lib/python3.11/concurrent/futures/thread.py", line 58, in run
Yes, you are understanding correctly, it is 57 x 49.
Thank you very much and kind regards, Flo
Hi Flo, I think the problem is that, probably due to the low number of samples and/or genes, there were no significant results. Can you try setting up the padjcutoff to 1? And maybe also the deltaPsiCutoff to 0.05 or lower? We'll fix DROP to catch that error.
Actually, we had already fixed this in the dev
branch. So also installing the dev branch and rerunning the module should solve your problem
Hello Vicente,
thank you for your reply.
I have been doing the analyses in the dev branch because I encountered this (or another, not sure anymore) issue in the 1.3.3 version, which could be solved using dev.
I found that it seems like one specific sample is causing the error - if I repeat the analysis will all samples except for that specific one the analysis finished successfully. Do you have any idea on how to solve this (or get down to the bottom of the issue)?
Thanks again! Flo
Super weird. Did you try with all the samples but modifying the cut-offs to include all the significant events?
I tried running the analysis with filter: true and filter: false and this did not change the outcome.
Additionally, I tested it with your suggested adaptations (deltaPsiCutoff: 0.01 and padjCutoff: 1) and still get the same error.
(I assume when the filter is set to false the adaptations in the cutoffs would anyway not change anything, right?)
The filter parameter shouldn't have any effect in the cut-offs. Can you further lower the deltaPsiCutoff to 0?
I tried, it didn't change the result unfortunately.
can you try the following?
library(FRASER)
fds <- loadFraserDataSet(file = '{root}/processed_results/aberrant_splicing/datasets/savedObjects/{DROP_GROUP}/fds-object.RDS')
res <- results(fds_mus[1:1000, 1:2], padjCutoff = 1, deltaPsiCutoff = 0)
res
Did you obtain any results?
Yes, I did:
GRanges object with 138 ranges and 16 metadata columns:
seqnames ranges strand | sampleID hgncSymbol
<Rle> <IRanges> <Rle> | <Rle> <Rle>
Intron Jaccard Index chr1 10408141-10411417 + | NK1 <NA>
Intron Jaccard Index chr1 17044889-17053947 + | NK10 <NA>
Intron Jaccard Index chr1 15428010-15428592 + | NK10 <NA>
Intron Jaccard Index chr1 10400573-10404160 + | NK10 <NA>
Intron Jaccard Index chr1 9722357-9722527 + | NK1 <NA>
... ... ... ... . ... ...
Intron Jaccard Index chr1 12202179-12206739 + | NK10 <NA>
Intron Jaccard Index chr1 17024075-17027748 + | NK10 <NA>
Intron Jaccard Index chr1 17024075-17027759 + | NK10 <NA>
Intron Jaccard Index chr1 17050649-17053947 + | NK10 <NA>
Intron Jaccard Index chr1 15426019-15428592 + | NK10 <NA>
type pValue padjust psiValue deltaPsi counts
<Rle> <numeric> <numeric> <Rle> <numeric> <Rle>
Intron Jaccard Index jaccard 0.017551 1 0.2 -0.77 1
Intron Jaccard Index jaccard 0.028675 1 0.07 -0.70 3
Intron Jaccard Index jaccard 0.133430 1 0.02 0.02 1
Intron Jaccard Index jaccard 0.155090 1 0.02 0.02 1
Intron Jaccard Index jaccard 0.167130 1 0.29 -0.44 4
... ... ... ... ... ... ...
Intron Jaccard Index jaccard 1 1 0 -0.01 0
Intron Jaccard Index jaccard 1 1 1 0.01 16
Intron Jaccard Index jaccard 1 1 0 -0.01 0
Intron Jaccard Index jaccard 1 1 0 -0.01 0
Intron Jaccard Index jaccard 1 1 0 0.00 0
totalCounts meanCounts meanTotalCounts nonsplitCounts
<Rle> <Rle> <Rle> <Rle>
Intron Jaccard Index 5 10 12 4
Intron Jaccard Index 41 13 32 38
Intron Jaccard Index 47 0.5 35 3
Intron Jaccard Index 53 0.5 30 0
Intron Jaccard Index 14 5 12.5 10
... ... ... ... ...
Intron Jaccard Index 9 0 9.5 0
Intron Jaccard Index 16 19 19 0
Intron Jaccard Index 16 0 19 0
Intron Jaccard Index 41 0 32 38
Intron Jaccard Index 75 0 56 3
nonsplitProportion nonsplitProportion_99quantile
<Rle> <Rle>
Intron Jaccard Index 0.8 0.79
Intron Jaccard Index 0.93 0.92
Intron Jaccard Index 0.06 0.06
Intron Jaccard Index 0 0
Intron Jaccard Index 0.71 0.71
... ... ...
Intron Jaccard Index 0 0
Intron Jaccard Index 0 0
Intron Jaccard Index 0 0
Intron Jaccard Index 0.93 0.92
Intron Jaccard Index 0.04 0.04
annotatedJunction FDR_set
<Rle> <character>
Intron Jaccard Index none transcriptome-wide
Intron Jaccard Index none transcriptome-wide
Intron Jaccard Index none transcriptome-wide
Intron Jaccard Index none transcriptome-wide
Intron Jaccard Index none transcriptome-wide
... ... ...
Intron Jaccard Index none transcriptome-wide
Intron Jaccard Index none transcriptome-wide
Intron Jaccard Index none transcriptome-wide
Intron Jaccard Index none transcriptome-wide
Intron Jaccard Index none transcriptome-wide
-------
seqinfo: 25 sequences from an unspecified genome
then, can you please follow step by step the 08_extract_results_FraseR script in an R session?
For that, first you will need to load the snakemake object by executing:
snakemake <- readRDS(log_file)
which should look like this: tmp_dir / "AS" / "{dataset}--{annotation}" / "08_results.Rds"
Otherwise, we could set up a zoom.
Yes, I tried it. It crashes in the command
res_junc_dt <- annotatePotentialImpact(result=res_junc_dt, txdb=txdb, fds=fds)
with the error
preparing ...
calculating splice types ...
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'start': subscript contains NAs
In addition: Warning message:
In min(distance(test_junction, refseq.genes), na.rm = TRUE) :
no non-missing arguments to min; returning Inf
Hi Flo, sorry this got lost after the holidays. Was it resolved? If not, can you please check that all is good with the GenomicRanges of res_junc_dt? e.g. all junctions have correct chr and valid start and end positions?