exist icon indicating copy to clipboard operation
exist copied to clipboard

[hotfix] use DiskFileItemFactory default threshhold

Open line-o opened this issue 4 years ago • 14 comments

This could improve performance when uploading a large number of smaller files by reducing IO traffic.

Even after running instances with this patch applied we could see no negative effects. And less code is always less code to maintain.

The below assumption was not reproducible in our testing. ~~Setting the SizeThreshhold to a low value can lead to Unexpected EOF errors when decoding multipart messages.~~

line-o avatar Jun 21 '21 17:06 line-o

@line-o I am afraid we need some more context here. What is the issue and how is it reproduced? Also we may need some tests, but I am not certain as I cannot tell what the issue is yet..

adamretter avatar Jun 21 '21 17:06 adamretter

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

sonarqubecloud[bot] avatar Jun 21 '21 17:06 sonarqubecloud[bot]

it reduces the file i/o for smaller files that are uploaded; normally the buffer is 32k or so;

dizzzz avatar Jun 22 '21 09:06 dizzzz

From https://commons.apache.org/proper/commons-fileupload/apidocs/org/apache/commons/fileupload/disk/DiskFileItemFactory.html#setSizeThreshold-int-

I understand that the default threshold is 10KB. The meaning of the threshold (according to the Javadoc) is that files smaller than the threshold are kept in memory, and files larger than the threshold are written to a temporary file on disk.

I would think that a value of 0 would mean that ALL files are written to a temporary file on disk.

Unexpected EOF errors when decoding multipart messages.

If any value set to the threshold is causing errors, then by my understanding this is clearly a bug and changing the threshold will only mitigate the issue. Can you share your test cases please?

adamretter avatar Jun 22 '21 10:06 adamretter

I am in the middle of creating reproducible test cases / testing it.

line-o avatar Jun 22 '21 12:06 line-o

It could be this was added due a bug in the library.....

dizzzz avatar Jun 22 '21 19:06 dizzzz

I deep-dived in the history. Initially I thought that this threshold was set to 0 because of several issues we faced in the years; but I am wrong

the code was added by @wolfgangmm in 2004, where multi-part form data parsing was added. For now I see no good reason to keep this line, as little in-memory caching will help reducing overhead for IO.

dizzzz avatar Jun 22 '21 19:06 dizzzz

@dizzzz Yes once we have the test cases from @line-o (which he said he is working on)

adamretter avatar Jun 22 '21 20:06 adamretter

There is no test case to add. This PR, unfortunately, does not affect the IOexceptions we observe when uploading large file over HTTPS.

line-o avatar Jun 22 '21 21:06 line-o

@line-o Okay that's unfortunate. Do you want to close this now then?

adamretter avatar Jun 22 '21 21:06 adamretter

No, as @dizzzz already pointed out this will improve performance for small files.

line-o avatar Jun 22 '21 21:06 line-o

This could improve performance when uploading a large number of smaller files by reducing IO traffic.

My concern is that this change has a negative impact also which needs to be considered. Many parallel uploads will consume memory which is an expensive and finite resource when compared to disk. IMHO it is better that these things are buffered in disk than in memory, and I doubt you will see a noticeable performance difference due to paged disk cache etc; Not to mention that when considered overall, network will be far slower than either disk or memory.

adamretter avatar Jun 22 '21 21:06 adamretter

Memory is cheap. Where do you expect parallel uploads to impact memory significantly? one megabyte of memory can hold 1024 files in memory (assuming the 10K default limit you mentioned). That is a small amount of RAM and a huge amount of parallel uploads.

line-o avatar Jun 22 '21 21:06 line-o

I have no preference; I think this zero-tuning is really not needed, and should not be there. But I hear adam's argument...

dizzzz avatar Jun 23 '21 20:06 dizzzz

This is superseded by https://github.com/eXist-db/exist/pull/4575 and can likely be closed.

adamretter avatar Oct 05 '22 18:10 adamretter

still makes sense for v6 ; not required for v7 and beyond

dizzzz avatar Oct 10 '22 17:10 dizzzz