
Improve the insertion and fetching to/from S3 external storage by batch file transfer

Open shenshan opened this issue 4 years ago • 0 comments

In the IBL pipeline, I was trying to rework the table AlignedSpikeTimes so that its longblob field becomes an S3 external field. I noticed major performance differences in both insertion and fetching. I have also tried local external storage. Here are the statistics (for the first 3 entries of session 'f8d5c8b0-b931-4151-b86c-c471e2e80e5d'):

Insertion of 787 entries in a single call:

  • Internal: 0.4 s; external S3: 24 s; external local: 1.5 s.

Fetching of 2033 entries in a single call:

  • Internal: 0.8 s; external S3: 28 s; external local: 1.1 s.

The major reason for the performance difference is that files are currently transferred to and from S3 serially, one at a time. Therefore, we would like to implement a mechanism for parallel (batched) transfer of files into and out of S3.
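A minimal sketch of the idea: fan the per-file transfers out over a thread pool instead of looping over them serially. The `upload_one` function below is a hypothetical stand-in for the real per-blob S3 put (it just simulates network latency so the example is self-contained); in practice it would wrap a boto3/minio call, and `max_workers` would be a tunable setting.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def upload_one(blob: bytes) -> int:
    """Hypothetical stand-in for a single S3 object upload.

    Sleeps to simulate one network round-trip, then returns the
    number of bytes "written".
    """
    time.sleep(0.05)
    return len(blob)

# 20 fake blob payloads of increasing size
blobs = [b"x" * n for n in range(1, 21)]

# Current behavior: transfer files one at a time.
t0 = time.perf_counter()
serial = [upload_one(b) for b in blobs]
t_serial = time.perf_counter() - t0

# Proposed behavior: batch the transfers across a thread pool.
# pool.map preserves input order, so results line up with `blobs`.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(upload_one, blobs))
t_parallel = time.perf_counter() - t0

assert serial == parallel  # same results, same order
print(f"serial: {t_serial:.2f}s, parallel: {t_parallel:.2f}s")
```

Threads are a reasonable fit here because per-file S3 transfers are I/O-bound, so the GIL is not a bottleneck; with 8 workers the 20 simulated uploads finish in roughly a quarter of the serial time.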

shenshan avatar Aug 17 '20 23:08 shenshan