arcgis-python-api

Large file uploads to AGOL crash

Open spiskulaos opened this issue 2 years ago • 0 comments

Describe the bug: When using the Python API to upload a 100 GB Tile Package to AGOL (on a Windows machine with 8 GB RAM), the script crashes after a while with MemoryError.

To Reproduce: Run a script such as the following

    # Assumes `gis` is an authenticated arcgis.gis.GIS connection and
    # `file` is the path to the ~100 GB tile package.
    item_properties = {'title': os.path.splitext(os.path.basename(file))[0]}
    if file.lower().endswith("tpkx"):
        item_properties['type'] = "Compact Tile Package"

    _log.info("##[section] Starting upload of file {0}".format(file))
    start = time.time()
    file_item = gis.content.add(item_properties=item_properties, data=file)

error:

Traceback (most recent call last):
  File "C:\Temp\agol_file_upload.py", line 88, in main
    file_item = gis.content.add(item_properties=item_properties, data=file)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\gis\__init__.py", line 5975, in add
    status = self._add_by_part(
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\gis\__init__.py", line 5589, in _add_by_part
    futures = {
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\gis\__init__.py", line 5589, in <dictcomp>
    futures = {
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\gis\__init__.py", line 5550, in chunk_by_file_size
    data = bio.write(reader.read(size))
MemoryError
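The traceback points at a pattern where every chunk is written into an in-memory BytesIO buffer while the futures dict is still being built. A hypothetical minimal illustration of that eager pattern (not the library's actual code) shows why a 100 GB file then needs roughly 100 GB of RAM, regardless of upload speed:

```python
# Hypothetical illustration of the eager chunking pattern implied by the
# traceback: every chunk is read into a BytesIO before any upload completes,
# so peak memory use grows with file size, not with part size.
import io


def chunk_by_file_size(path, size):
    chunks = []
    with open(path, "rb") as reader:
        while True:
            bio = io.BytesIO()
            n = bio.write(reader.read(size))  # buffers the chunk in RAM
            if n == 0:
                break
            bio.seek(0)
            chunks.append(bio)
    return chunks  # all chunks resident in memory at the same time
```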

Expected behavior: The file should upload successfully without the process running out of memory.

Platform (please complete the following information):

  • OS: Windows 2019
  • Python API Version 2.1.0.2

Additional context: A previous Python API version (around 1.8?) created temporary file chunks on disk during uploads. That approach had its own problems: the chunks were not always cleaned up, so the drive holding the Temp folder could fill up when the disk was small and the uploaded file was large. In the current version, the large file appears to be read into memory too quickly, effectively all at once. Better use of ThreadPoolExecutor could help: uploaded bytes should be released from memory as soon as their part completes, and some form of lazy loading of file parts (with something like imap()) might be a useful way to refactor the approach.
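The lazy approach suggested above can be sketched as follows. This is a hypothetical illustration, not the arcgis implementation: iter_chunks, upload_part, the 25 MB part size, and the concurrency limits are all assumptions, and upload_part stands in for the real per-part upload request. The idea is that a semaphore blocks the reader once a bounded number of chunks are in flight, so memory use is capped by (max_in_flight × chunk size) rather than by the file size.

```python
# Hypothetical sketch: stream a large file in fixed-size chunks and upload
# them with bounded concurrency, so only a few chunks are in memory at once.
from concurrent.futures import ThreadPoolExecutor, as_completed
from threading import Semaphore

CHUNK_SIZE = 25 * 1024 * 1024  # assumed part size; the real API may differ


def iter_chunks(path, size=CHUNK_SIZE):
    """Lazily yield (part_number, chunk_bytes) without reading the whole file."""
    with open(path, "rb") as reader:
        part = 1
        while True:
            data = reader.read(size)
            if not data:
                break
            yield part, data
            part += 1


def upload_part(part, data):
    # Placeholder for the real per-part upload request (hypothetical).
    return part, len(data)


def upload_file(path, workers=4, max_in_flight=8, chunk_size=CHUNK_SIZE):
    gate = Semaphore(max_in_flight)  # caps how many chunks are buffered at once

    def worker(part, data):
        try:
            return upload_part(part, data)
        finally:
            gate.release()  # part finished: let the reader pull the next chunk

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for part, data in iter_chunks(path, chunk_size):
            gate.acquire()  # block instead of reading the entire file eagerly
            futures.append(pool.submit(worker, part, data))
        results = [f.result() for f in as_completed(futures)]
    return sorted(results)
```

With this shape, uploaded bytes become collectable as soon as their part finishes, which matches the "release from memory immediately" behavior described above.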

spiskulaos avatar May 16 '23 07:05 spiskulaos