brainlit
Write pipeline and documentation for EC2 interaction with s3
Downloading multiple chunks of data from s3 takes a long time and is often by far the slowest step in a pipeline. For example:
```python
from brainlit.utils.session import NeuroglancerSession

sess = NeuroglancerSession("s3://...")
data, _, _ = sess.pull_chunk(...)  # downloads data from s3; takes minutes to hours depending on size
# run algorithms on data
```
One possible way around this is to create an EC2 instance linked to s3, so the data can be read directly over the AWS network rather than downloaded over the public internet, as sketched below.
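As a rough illustration, here is a minimal sketch of reading an object directly from s3 with boto3, assuming the code runs on an EC2 instance whose IAM role grants s3 read access; the bucket and key names are hypothetical placeholders, not values from this project.

```python
# Minimal sketch: read an s3 object directly from an EC2 instance.
# Assumes the instance has an IAM role with s3 read permission.
import boto3

s3 = boto3.client("s3")  # credentials come from the instance's IAM role

# "my-bucket" and "path/to/chunk" are hypothetical placeholders
obj = s3.get_object(Bucket="my-bucket", Key="path/to/chunk")
raw_bytes = obj["Body"].read()  # traffic stays inside AWS when run in the bucket's region
```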
First, we should create a small notebook demonstrating that this method is actually faster; the timing sketch below could serve as its core measurement.
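A minimal sketch of the kind of measurement such a notebook might run, once locally and once on an EC2 instance, to compare wall-clock times; `timed_pull` is a hypothetical helper name, not part of brainlit.

```python
# Minimal benchmark sketch: time a single pull_chunk call so the same
# measurement can be repeated locally and on EC2 for comparison.
# timed_pull is a hypothetical helper, not part of brainlit.
import time

def timed_pull(sess, *args, **kwargs):
    """Run sess.pull_chunk once and report the elapsed wall-clock time."""
    start = time.perf_counter()
    data, bounds, voxel = sess.pull_chunk(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"pull_chunk took {elapsed:.1f} s")
    return data, bounds, voxel
```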