GenomicFeatures
pmapFromTranscripts strange behaviour
pmapFromTranscripts behaves strangely in how it handles names on the transcripts. Sorry for the poor test data, I put this together quickly:
tx is a GRangesList of 100,000 transcripts, ranges is an IRanges of 600,000 ORFs on those transcripts, and orfs$index gives, for each ORF, the index of the transcript it came from.
Note how the timings differ:
- Without names:
grl <- tx
names(grl) <- NULL
system.time(pmapFromTranscripts(x = ranges, transcripts = grl[orfs$index]))
user system elapsed
19.661 1.701 21.355
- With names:
grl <- tx
system.time(pmapFromTranscripts(x = ranges, transcripts = grl[orfs$index]))
user system elapsed
74.474 3.616 78.071
- Without names, then setting them afterwards, so the result is the same as in the second case:
names(grl) <- NULL
system.time({genomic <- pmapFromTranscripts(x = ranges, transcripts = grl[orfs$index]);
names(genomic) <- names(tx)[orfs$index] })
user system elapsed
19.963 1.634 21.591
So the second case (with names) is almost 4 times slower, while the third case gives the same result and is as fast as the first.
Is this intentional?
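For reference, the strip-then-restore workaround from the third case could be wrapped in a small helper. This is only a sketch: `pmapFromTranscriptsFast` is a hypothetical name, not a GenomicFeatures function, and the toy `tx`, `ranges`, and `index` objects are made-up stand-ins for the much larger data timed above.

```r
library(GenomicRanges)
library(GenomicFeatures)

# Hypothetical helper (not part of GenomicFeatures): run the mapping on
# unnamed transcripts, then put the names back on the result.
pmapFromTranscriptsFast <- function(x, transcripts) {
  nms <- names(transcripts)
  names(transcripts) <- NULL      # drop names before the expensive call
  genomic <- pmapFromTranscripts(x = x, transcripts = transcripts)
  names(genomic) <- nms           # restore names, as in the third case
  genomic
}

# Toy data (assumption): two plus-strand transcripts with two exons each.
tx <- GRangesList(
  tx1 = GRanges("chr1", IRanges(c(1, 101), c(50, 150)), strand = "+"),
  tx2 = GRanges("chr1", IRanges(c(201, 301), c(250, 350)), strand = "+")
)
index  <- c(1L, 1L, 2L)                          # transcript of each ORF
ranges <- IRanges(start = c(1, 10, 5), width = 9) # ORFs in transcript coordinates

genomic <- pmapFromTranscriptsFast(ranges, tx[index])
names(genomic)
```

Whether this matches the named call in every edge case (for example, duplicated or missing names) has not been checked here.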
sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.33.2 GenomicRanges_1.33.13 IRanges_2.15.17
...