datajoint-python
Improve insertion into and fetching from S3 external storage by batching file transfers
In the IBL pipeline, I tried converting the longblob field of the table AlignedSpikeTimes into an S3 external field. I noticed major performance differences in both insert and fetch, so I also tried local external storage for comparison (a rough sketch of the store configuration follows the timings below). Here are the statistics (for the first 3 entries of session 'f8d5c8b0-b931-4151-b86c-c471e2e80e5d'):
Insertion of 787 entries in a single call:
- Internal 0.4s; external S3 24s; external local 1.5s.

Fetching of 2033 entries in a single call:
- Internal 0.8s; external S3 28s; external local 1.1s.
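
For reference, this is roughly how the variants were declared; the store names, bucket, endpoint, credentials, and table fields here are placeholders rather than the actual IBL configuration:

```python
import datajoint as dj

# Hypothetical store configuration -- endpoint, bucket, and credentials
# are placeholders, not the actual IBL settings.
dj.config['stores'] = {
    'ephys-s3': dict(
        protocol='s3',
        endpoint='s3.amazonaws.com',
        bucket='ibl-external',
        location='aligned_spike_times',
        access_key='...',
        secret_key='...',
    ),
    'ephys-local': dict(
        protocol='file',
        location='/data/dj-external',
    ),
}

schema = dj.schema('ibl_ephys')


@schema
class AlignedSpikeTimes(dj.Manual):
    definition = """
    # spike times aligned to a behavioral event
    cluster_id: int                # illustrative primary key
    ---
    spike_times: blob@ephys-s3     # was an internal longblob; now external
    """
```

Switching `blob@ephys-s3` to `blob@ephys-local` (or back to `longblob`) gives the three variants timed above.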
The major reason for the performance difference appears to be that the current S3 interface transfers files one at a time rather than in parallel, so every object pays the full round-trip latency. Therefore, we would like to implement a mechanism for parallel transfer into and out of S3; a minimal sketch follows.
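
A minimal sketch of what batched transfer could look like, assuming a thread pool over per-object requests. Note that datajoint-python's S3 backend uses the minio client internally; boto3, the bucket name, and `MAX_WORKERS` below are assumptions used purely for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

import boto3

MAX_WORKERS = 16  # assumed degree of parallelism; should be configurable

s3 = boto3.client('s3')  # boto3 clients are safe to share across threads


def _put_one(item):
    # Upload a single external blob; item is a (key, bytes) pair.
    key, payload = item
    s3.put_object(Bucket='ibl-external', Key=key, Body=payload)
    return key


def put_batch(items):
    """Upload many (key, bytes) pairs concurrently instead of one at a time."""
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(_put_one, items))


def _get_one(key):
    # Download a single external blob by key.
    return key, s3.get_object(Bucket='ibl-external', Key=key)['Body'].read()


def get_batch(keys):
    """Fetch many external blobs concurrently; returns (key, bytes) pairs."""
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(_get_one, keys))
```

Since the transfers are dominated by per-request latency rather than bandwidth, overlapping even a modest number of requests should bring the 24–28s S3 timings much closer to the external-local numbers above.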