
Write pipeline and documentation for EC2 interaction with s3

bvarjavand opened this issue 5 years ago

Downloading multiple chunks of data from s3 takes a long time and is often by far the slowest step in a pipeline. For example:

```python
from brainlit.utils.session import NeuroglancerSession

sess = NeuroglancerSession("s3://...")
data, _, _ = sess.pull_chunk(...)  # downloads data from s3; takes minutes to hours depending on size
# run algs on data
```

One possible way around this is to create an EC2 instance with read access to the s3 bucket (ideally in the same region), so the data can be read directly rather than downloaded over the public internet.
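A minimal sketch of launching such an instance with boto3 is below. The AMI ID, instance type, region, and IAM instance profile name are all placeholders, not values from this repo; the role would need s3 read permissions on the target bucket, and the instance should sit in the same region as the bucket.

```python
import boto3

# Assumption: the bucket lives in us-east-1. Co-locating the instance
# with the bucket is what removes most of the transfer latency.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",    # placeholder: any Linux AMI in the bucket's region
    InstanceType="m5.large",   # placeholder: size to the workload
    MinCount=1,
    MaxCount=1,
    # Hypothetical instance profile whose role grants s3:GetObject on the bucket.
    IamInstanceProfile={"Name": "s3-read-role"},
)
instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched {instance_id}")
```

Nothing in the brainlit code itself would need to change; the speedup, if any, comes purely from network locality between the instance and the bucket.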

First, we should create a small notebook demonstrating that this method is actually faster; a sketch of such a timing comparison follows.
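One way the notebook could time the comparison: run the cell below once on a local machine and once on the EC2 instance against the same dataset, and compare wall-clock times. The precomputed s3 URL and the pull_chunk arguments are placeholders to be filled in with a real dataset and chunk.

```python
import time

from brainlit.utils.session import NeuroglancerSession

sess = NeuroglancerSession("s3://...")  # fill in the real precomputed s3 URL

start = time.perf_counter()
data, _, _ = sess.pull_chunk(2, 300)  # placeholder seg_id / v_id; use the same chunk in both runs
elapsed = time.perf_counter() - start
print(f"pull_chunk took {elapsed:.1f} s")
```

Running the identical cell in both environments keeps everything but network locality constant, so the timing difference directly measures the EC2 benefit.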

bvarjavand · Aug 14 '20 18:08