ckanext-archiver

Temporary file leak in tasks._save_resource for oversized resources

Open • EricSoroos opened this issue 5 years ago • 0 comments

Here: https://github.com/ckan/ckanext-archiver/blob/master/ckanext/archiver/tasks.py#L734

def _save_resource(resource, response, max_file_size, chunk_size=1024*16):
    """
    Write the response content to disk.
    Returns a tuple:
        (file length: int, content hash: string, saved file path: string)
    """
    resource_hash = hashlib.sha1()
    length = 0

    fd, tmp_resource_file_path = tempfile.mkstemp()

    with open(tmp_resource_file_path, 'wb') as fp:
        for chunk in response.iter_content(chunk_size=chunk_size,
                                           decode_unicode=False):
            fp.write(chunk)
            length += len(chunk)
            resource_hash.update(chunk)

            if length >= max_file_size:
                raise ChooseNotToDownload(
                    _("Content-length %s exceeds maximum allowed value %s") %
                    (length, max_file_size))

    os.close(fd)

    content_hash = unicode(resource_hash.hexdigest())
    return length, content_hash, tmp_resource_file_path

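If the download grows past max_file_size, the function raises ChooseNotToDownload from inside the loop. The exception does not carry tmp_resource_file_path, so the caller has no way to delete the partially written file. The os.close(fd) call is skipped on that path too, so the file descriptor returned by mkstemp() also leaks.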

Unfortunately, this means that partial downloads of "too large" resources, each at least max_file_size bytes, accumulate in the temp directory (usually /tmp) over time.

EricSoroos, Mar 20 '19 11:03