createOriginalFileFromFileObj reads entire file into memory for SHA1 hash
Problem
The method _BlitzGateway.createOriginalFileFromFileObj in src/omero/gateway/__init__.py (line 4075) reads the entire file into memory to compute the SHA1 hash:
h.update(fo.read()) # Reads entire file into memory
This causes out-of-memory errors for files larger than available RAM.
Suggested Fix
Use chunked reading for SHA1 computation, similar to the upload logic that already uses 10KB chunks (lines 4093-4101):
h = sha1()
chunk_size = 10000
fo.seek(0)
while True:
    chunk = fo.read(chunk_size)
    if not chunk:
        break
    h.update(chunk)
Impact
Files larger than RAM cannot be uploaded through this method.
Thanks for opening the issue and agreed on the limitations of the current implementation. Feel free to work on a Pull Request implementing your proposal alongside a completed CLA.
On a technical note, the code is already reading the source file in chunks as part of the upload via the raw file store:
https://github.com/ome/omero-py/blob/780876c5f6b48c327ef1685b377bac2c3cc46797/src/omero/gateway/__init__.py#L4086-L4104
It might make sense to perform the SHA computation/update as part of the same loop to reduce the number of I/O operations, especially for large files, and to update the OriginalFile with the checksum once the upload completes.
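A minimal sketch of that idea, folding the SHA1 update into the chunked upload pass; write_chunk is a hypothetical stand-in for the raw file store write call used inside createOriginalFileFromFileObj, and the helper name is illustrative only:

from hashlib import sha1

def upload_and_hash(fo, write_chunk, chunk_size=10000):
    # Read the source file once, streaming each chunk to the uploader
    # while updating the SHA1 digest in the same pass.
    h = sha1()
    offset = 0
    fo.seek(0)
    while True:
        chunk = fo.read(chunk_size)
        if not chunk:
            break
        write_chunk(chunk, offset)  # hypothetical raw file store write
        h.update(chunk)             # hash the same chunk, no extra read
        offset += len(chunk)
    return h.hexdigest()

The returned hex digest could then be set on the OriginalFile after the upload completes, as suggested above, so the file is only read from disk once.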