
Running pipeline offline in trusted research environment

Open chrisodhams opened this issue 11 months ago • 1 comments

Hi,

Following up on a previous issue: 319

I am attempting to run the pipeline in a secure research environment where outbound internet access is prohibited to protect patient confidentiality. "The RE is a secure and controlled environment. This means that there is no internet access from the RE and data contained within the RE cannot be exported in its raw form. This policy is to protect the privacy of our participants who have generously donated their genomes and clinical history for research. It is your responsibility to comply with the terms of use."

Working in airlocked environments like this has become commonplace.

The pipeline seems to require outbound internet access at certain stages, for example:

AberrantSplicing_pipeline_Counting_01_1_countRNA_splitReads_samplewise_R
Load packages
Loading required package: rtracklayer
Thu Feb 29 16:30:09 2024: Count split reads for sample: HG00096
Error in download.file(url, destfile, quiet = TRUE) : 
  cannot open URL 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz'
Calls: countSplitReads ... fetch_table_from_UCSC_database -> fetch_table_from_url
Execution halted

On this occasion it appears to be simply downloading a static file, 'chromInfo.txt.gz'. If that is the case, could this file (and the equivalent files for other assemblies) be included in the standard resources, with the code pointing to the local copy instead of the URL?
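To illustrate the idea of bundling the file, here is a minimal sketch in Python of reading chromosome sizes from a local copy rather than from the network. It is not how the pipeline or its R dependencies work. The function names `read_chrom_info` and `local_chrom_info` and the `<resource_dir>/<assembly>/chromInfo.txt.gz` layout are hypothetical. It assumes the usual UCSC layout of tab-separated columns with the chromosome name first and its size second.

```python
import csv
import gzip
from pathlib import Path


def read_chrom_info(path):
    """Return {chromosome: length} parsed from a local chromInfo.txt.gz.

    Assumes tab-separated rows with the chromosome name in the first
    column and its size in the second; later columns are ignored.
    """
    sizes = {}
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            sizes[row[0]] = int(row[1])
    return sizes


def local_chrom_info(resource_dir, assembly):
    """Look up a bundled file for an assembly (hypothetical directory layout).

    Raises FileNotFoundError instead of falling back to a download, so an
    offline run fails with a clear message about the missing resource.
    """
    path = Path(resource_dir) / assembly / "chromInfo.txt.gz"
    if not path.is_file():
        raise FileNotFoundError(
            f"No bundled chromInfo for assembly {assembly!r} at {path}"
        )
    return read_chrom_info(path)
```

With this layout, the files for each assembly would be copied into the environment once through its approved import route, and a missing assembly would be reported by name rather than as a failed URL.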

Is outbound internet access required anywhere else? If so, this is unfortunately a blocker at Genomics England.

Many thanks,

Chris

chrisodhams avatar Feb 29 '24 18:02 chrisodhams

Hi Chris, yes, we're aware of that problem. It should only happen if the data is unstranded though. That is the only step of the whole pipeline that requires a connection to the internet. We're trying to come up with a solution.

vyepez88 avatar Apr 08 '24 12:04 vyepez88