ckanext-archiver
Large File leak in tasks._save_resource
Here: https://github.com/ckan/ckanext-archiver/blob/master/ckanext/archiver/tasks.py#L734
    def _save_resource(resource, response, max_file_size, chunk_size=1024*16):
        """
        Write the response content to disk.
        Returns a tuple:
            (file length: int, content hash: string, saved file path: string)
        """
        resource_hash = hashlib.sha1()
        length = 0
        fd, tmp_resource_file_path = tempfile.mkstemp()
        with open(tmp_resource_file_path, 'wb') as fp:
            for chunk in response.iter_content(chunk_size=chunk_size,
                                               decode_unicode=False):
                fp.write(chunk)
                length += len(chunk)
                resource_hash.update(chunk)
                if length >= max_file_size:
                    raise ChooseNotToDownload(
                        _("Content-length %s exceeds maximum allowed value %s") %
                        (length, max_file_size))
        os.close(fd)
        content_hash = unicode(resource_hash.hexdigest())
        return length, content_hash, tmp_resource_file_path
If the file is too large, _save_resource raises ChooseNotToDownload, but the exception does not carry the temporary file's path, so the caller has no way to clean up the partially written file.
As a result, every "too large" resource leaves a file behind, and these accumulate in the /tmp directory over time.
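One possible fix, sketched below, is to have the function delete its own temp file whenever it fails, so callers never need the path from the exception. This is only an illustration, not a patch against the repository: save_resource_with_cleanup is a hypothetical name, ChooseNotToDownload is redefined here as a stand-in so the snippet runs on its own, the _() translation call is omitted, and it is written for Python 3 (so no unicode()). It also opens the descriptor with os.fdopen, because in the code above os.close(fd) is skipped when the exception is raised.

```python
import hashlib
import os
import tempfile


class ChooseNotToDownload(Exception):
    """Stand-in for the archiver's exception of the same name (assumption)."""


def save_resource_with_cleanup(response, max_file_size, chunk_size=1024 * 16):
    """Write the response body to a temp file; delete the file on any failure.

    Returns (length, sha1 hex digest, temp file path) on success.
    """
    resource_hash = hashlib.sha1()
    length = 0
    fd, tmp_path = tempfile.mkstemp()
    try:
        # os.fdopen takes ownership of fd, so leaving the with-block
        # closes the descriptor even when an exception is raised.
        with os.fdopen(fd, 'wb') as fp:
            for chunk in response.iter_content(chunk_size=chunk_size,
                                               decode_unicode=False):
                fp.write(chunk)
                length += len(chunk)
                resource_hash.update(chunk)
                if length >= max_file_size:
                    raise ChooseNotToDownload(
                        "Content-length %s exceeds maximum allowed value %s"
                        % (length, max_file_size))
    except BaseException:
        # Remove the partial download before re-raising the error.
        os.remove(tmp_path)
        raise
    return length, resource_hash.hexdigest(), tmp_path
```

An alternative would be to attach the path to the exception and leave deletion to the caller, but cleaning up inside the function keeps every caller from having to remember to do it.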