arcgis-python-api
Large file uploads to AGOL crash
Describe the bug
When using the Python module to upload a 100 GB Tile Package to AGOL (on a Windows machine with 8 GB RAM), the script crashes after a while with a MemoryError.
To Reproduce
Run the script:
item_properties = {'title': os.path.splitext(os.path.basename(file))[0]}
if file.lower().endswith("tpkx"):
    item_properties['type'] = "Compact Tile Package"
_log.info("##[section] Starting upload of file {0}".format(file))
start = time.time()
file_item = gis.content.add(item_properties=item_properties, data=file)
error:
Traceback (most recent call last):
  File "C:\Temp\agol_file_upload.py", line 88, in main
    file_item = gis.content.add(item_properties=item_properties, data=file)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\gis\__init__.py", line 5975, in add
    status = self._add_by_part(
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\gis\__init__.py", line 5589, in _add_by_part
    futures = {
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\gis\__init__.py", line 5589, in <dictcomp>
    futures = {
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\gis\__init__.py", line 5550, in chunk_by_file_size
    data = bio.write(reader.read(size))
MemoryError
Expected behavior
The file should upload correctly.
Platform (please complete the following information):
- OS: Windows 2019
- Python API Version: 2.1.0.2
Additional context
In a previous Python API version (something like 1.8?), uploads were chunked into temporary files on disk. That approach already had issues: the chunks were not cleaned up, so if the disk holding the Temp folder was too small and the uploaded file too big, the disk would fill up. In the current version it looks like the (large) file is instead read into memory too rapidly, effectively in one go. Better use of the ThreadPoolExecutor could help: uploaded bytes should be released from memory immediately after each part is sent. Maybe some form of lazy loading of file parts, with something like imap(), could help refactor the approach?
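To illustrate the suggestion, here is a minimal sketch of what lazy, back-pressure-aware chunking could look like. This is not the actual `_add_by_part` internals; `upload_part`, `CHUNK_SIZE`, and `max_in_flight` are hypothetical stand-ins. The key idea is that parts are read from the file only as worker slots free up, so at most `max_in_flight` parts ever sit in memory at once, regardless of total file size:

```python
import concurrent.futures

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical part size; the real API computes its own


def iter_chunks(path, size=CHUNK_SIZE):
    """Lazily yield (part_number, bytes) tuples instead of materializing
    every part up front, so only in-flight parts occupy memory."""
    with open(path, "rb") as reader:
        part = 1
        while True:
            data = reader.read(size)
            if not data:
                break
            yield part, data
            part += 1


def upload_part(part, data):
    """Placeholder for the real per-part POST; here it just reports the size."""
    return part, len(data)


def upload_lazily(path, max_in_flight=4):
    """Submit parts to a ThreadPoolExecutor, but drain completed futures
    before reading more chunks, bounding memory to max_in_flight parts."""
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        pending = set()
        for part, data in iter_chunks(path):
            pending.add(pool.submit(upload_part, part, data))
            if len(pending) >= max_in_flight:
                # Wait for at least one part to finish; its bytes are then
                # released because no reference to the chunk remains.
                done, pending = concurrent.futures.wait(
                    pending, return_when=concurrent.futures.FIRST_COMPLETED
                )
                results.extend(f.result() for f in done)
        for f in concurrent.futures.as_completed(pending):
            results.append(f.result())
    return results
```

The same bound could also be achieved with `multiprocessing.pool.ThreadPool.imap()`, as mentioned above, since `imap()` consumes the chunk generator lazily.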