exist
exist copied to clipboard
[hotfix] use DiskFileItemFactory default threshhold
This could improve performance when uploading a large number of smaller files by reducing IO traffic.
Even after running instances with this patch applied we could see no negative effects. And less code is always less code to maintain.
The below assumption was not reproducible in our testing. ~~Setting the SizeThreshhold to a low value can lead to Unexpected EOF errors when decoding multipart messages.~~
@line-o I am afraid we need some more context here. What is the issue and how is it reproduced? Also we may need some tests, but I am not certain as I cannot tell what the issue is yet..
it reduces the file i/o for smaller files that are uploaded; normally the buffer is 32k or so;
From https://commons.apache.org/proper/commons-fileupload/apidocs/org/apache/commons/fileupload/disk/DiskFileItemFactory.html#setSizeThreshold-int-
I understand that the default threshold is 10KB. The meaning of the threshold (according to the Javadoc) is that files smaller than the threshold are kept in memory, and files larger than the threshold are written to a temporary file on disk.
I would think that a value of 0 would mean that ALL files are written to a temporary file on disk.
Unexpected EOF errors when decoding multipart messages.
If any value set to the threshold is causing errors, then by my understanding this is clearly a bug and changing the threshold will only mitigate the issue. Can you share your test cases please?
I am in the middle of creating reproducible test cases / testing it.
It could be this was added due a bug in the library.....
I deep-dived in the history. Initially I thought that this threshold was set to 0 because of several issues we faced in the years; but I am wrong
the code was added by @wolfgangmm in 2004, where multi-part form data parsing was added. For now I see no good reason to keep this line, as little in-memory caching will help reducing overhead for IO.
@dizzzz Yes once we have the test cases from @line-o (which he said he is working on)
There is no test case to add. This PR, unfortunately, does not affect the IOexceptions we observe when uploading large file over HTTPS.
@line-o Okay that's unfortunate. Do you want to close this now then?
No, as @dizzzz already pointed out this will improve performance for small files.
This could improve performance when uploading a large number of smaller files by reducing IO traffic.
My concern is that this change has a negative impact also which needs to be considered. Many parallel uploads will consume memory which is an expensive and finite resource when compared to disk. IMHO it is better that these things are buffered in disk than in memory, and I doubt you will see a noticeable performance difference due to paged disk cache etc; Not to mention that when considered overall, network will be far slower than either disk or memory.
Memory is cheap. Where do you expect parallel uploads to impact memory significantly? one megabyte of memory can hold 1024 files in memory (assuming the 10K default limit you mentioned). That is a small amount of RAM and a huge amount of parallel uploads.
I have no preference; I think this zero-tuning is really not needed, and should not be there. But I hear adam's argument...
This is superseded by https://github.com/eXist-db/exist/pull/4575 and can likely be closed.
still makes sense for v6 ; not required for v7 and beyond