
Performance Issue with download

Open prakashguru opened this issue 8 years ago • 11 comments

Hi, I have observed a serious performance issue with the upload functionality. The upload caches the entire file content in memory twice!!! If I upload a 100 MB file, the application's memory consumption jumps by 200 MB.

The problem lies in DropboxRequestHandler.RequestJsonStringWithRetry(string host, string routeName, RouteStyle routeStyle, string requestArg, Stream body = null).

If the stream to upload is seekable, then to make retry work the content is cached twice using a MemoryStream!!!

Is there a way to override this behavior?

prakashguru avatar Mar 16 '16 11:03 prakashguru
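
To make the reported pattern concrete, here is a rough sketch (an assumption, not the SDK's actual source) of what buffering an upload body for retry can look like: the entire body is copied into a MemoryStream so it can be replayed if the request fails, which keeps at least one full extra copy of the file in memory for the duration of the upload.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Illustrative only: buffer the whole upload body so it can be replayed on retry.
// A 100 MB file means a 100 MB MemoryStream held in RAM, on top of any buffering
// done further down the HTTP stack.
static async Task<HttpContent> BufferBodyForRetryAsync(Stream body)
{
    var copy = new MemoryStream();
    await body.CopyToAsync(copy);   // full pass over the data into managed memory
    copy.Position = 0;
    return new StreamContent(copy); // this copy is what actually gets sent (and resent)
}
```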

Thanks for your feedback! Currently there is no way to override this behavior, but we will consider optimizing this in a future release. For large file uploads, we recommend using chunked upload (upload_session/start -> upload_session/append -> upload_session/finish), as it provides better reliability. See here for an example.

qimingyuan avatar Mar 16 '16 17:03 qimingyuan
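
For readers looking for the recommended flow, below is a minimal sketch of the upload_session/start -> append -> finish sequence using the SDK's UploadSessionStartAsync, UploadSessionAppendV2Async, and UploadSessionFinishAsync methods. Exact parameters differ between SDK versions, so treat this as a starting point rather than copy-paste-ready code.

```csharp
using System.IO;
using System.Threading.Tasks;
using Dropbox.Api;
using Dropbox.Api.Files;

// Sketch of chunked upload via upload_session/start -> append -> finish.
// Assumes the Dropbox .NET SDK's session upload methods; signatures may vary by version.
static async Task ChunkedUploadAsync(DropboxClient client, string localPath, string dropboxPath)
{
    const int chunkSize = 4 * 1024 * 1024;   // 4 MB, in line with the recommendation above
    var buffer = new byte[chunkSize];

    using (var file = File.OpenRead(localPath))
    {
        // Open the session with an empty first request so every data chunk below goes
        // through the same append/finish path. (A zero-byte file would still need a
        // finish call; omitted here for brevity.)
        string sessionId;
        using (var empty = new MemoryStream())
        {
            sessionId = (await client.Files.UploadSessionStartAsync(body: empty)).SessionId;
        }

        ulong offset = 0;
        int read;
        while ((read = await file.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            bool isLastChunk = file.Position == file.Length;
            using (var chunk = new MemoryStream(buffer, 0, read))
            {
                var cursor = new UploadSessionCursor(sessionId, offset);
                if (isLastChunk)
                {
                    var commit = new CommitInfo(dropboxPath, WriteMode.Overwrite.Instance);
                    await client.Files.UploadSessionFinishAsync(cursor, commit, body: chunk);
                }
                else
                {
                    await client.Files.UploadSessionAppendV2Async(cursor, body: chunk);
                }
            }
            offset += (ulong)read;
        }
    }
}
```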

Thanks for your response, qimingyuan!

The chunked upload is what I am looking for, but the .NET implementation suffers from the same memory issue I mentioned in the first comment. I wrote C# code that calls the HTTP API directly, where I get the performance I expected. I am sharing the code so it can be useful for someone who really requires the performance. You can also modify UploadAsync() and the chunked uploads using the logic explained in the ExtendedDropboxClient.UploadAsync method.

https://github.com/prakashguru/codesamples/blob/master/ExtendedDropboxClient.cs

The idea behind the performance improvement is this: for non-seekable streams, we have to use PushStreamContent as the HttpContent of the HttpRequestMessage, where we can supply the stream content dynamically. This eliminates the need for a separate MemoryStream. In the end, application memory increases by only 1 MB!!! when uploading a 300 MB video file.

I have a few comments about your .NET SDK architecture. It is simple and clean, but it is not open for modification. For example, I cannot subclass and override the way uploads are handled; I have to modify a lot of pieces to get it to work.

prakashguru avatar Mar 21 '16 06:03 prakashguru
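
For context on the approach described above: PushStreamContent (from the System.Net.Http.Formatting assembly, shipped in the Microsoft.AspNet.WebApi.Client package) lets the sender write the body directly into the outgoing request stream. A minimal sketch of that idea, independent of the linked ExtendedDropboxClient code, might look like this:

```csharp
using System.IO;
using System.Net.Http;

// Sketch only: wrap a source stream in PushStreamContent so its bytes are written
// straight into the request stream instead of being buffered in a MemoryStream first.
static HttpContent CreateStreamingContent(Stream source)
{
    return new PushStreamContent(async (outputStream, httpContent, transportContext) =>
    {
        try
        {
            // Only a small transfer buffer is held in memory at any time.
            await source.CopyToAsync(outputStream);
        }
        finally
        {
            outputStream.Close();   // closing the output stream signals the body is complete
        }
    });
}
```

The resulting HttpContent can then be assigned to HttpRequestMessage.Content in place of a fully buffered StreamContent.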

I just stumbled upon this and yes, it should be fixed. You're using Streams incorrectly. Uploading a 1 GB file in 16 MB chunks yields a 2 GB memory allocation. @prakashguru provided working code, so why is this bug sitting open for a year waiting for me to trip over it? Same reason the DropBox client is currently burning 15% CPU and half a gig of RAM even though I haven't touched a file in there for two weeks, I bet.

chadwackerman avatar Jan 08 '17 15:01 chadwackerman

Thanks @prakashguru and @chadwackerman. I think there is a trade-off between performance and reliability here, and we are leaning more towards the reliability side. Feeding an uncontrolled stream of unknown size directly into the API is not recommended and is more error prone. Any transient error will cause the entire file to be re-uploaded, or will cause data loss if the stream is not rewindable. The API can handle roughly 100 MB per upload call, and we recommend splitting the data into 4–8 MB chunks so that a single failure only causes a small amount of data to be re-uploaded. If the chunks are small enough, I don't think there will be too much overhead.

If you really want to trade reliability for performance: in version 4.1.1+, if you create the DropboxClient with MaxRetriesOnError = 0, it will feed the input stream directly into HttpStreamContent without creating an extra copy. This way you will get minimal memory usage.

@chadwackerman Could you elaborate on how the stream is used incorrectly here, or share some sample code showing how you reach a 2 GB memory allocation? With your settings the extra memory required should be no more than 16–32 MB.

qimingyuan avatar Jan 10 '17 02:01 qimingyuan
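
To spell out the suggestion above, here is a small sketch (file paths and user agent string are placeholders) of creating a DropboxClient with MaxRetriesOnError = 0 so the SDK can stream the body without an extra copy; the trade-off is that a transient failure means the caller must restart the upload itself.

```csharp
using System.IO;
using System.Threading.Tasks;
using Dropbox.Api;
using Dropbox.Api.Files;

// Sketch of the retries-for-memory trade-off described above (SDK 4.1.1+).
static async Task UploadWithoutBufferingAsync(string accessToken, string localPath, string dropboxPath)
{
    var config = new DropboxClientConfig("MemoryFriendlyUploader/1.0")   // user agent is illustrative
    {
        MaxRetriesOnError = 0   // disable retries so the stream is passed through uncopied
    };

    using (var client = new DropboxClient(accessToken, config))
    using (var file = File.OpenRead(localPath))
    {
        await client.Files.UploadAsync(dropboxPath, WriteMode.Overwrite.Instance, body: file);
    }
}
```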

I don't believe either of us is handing off some random stream. That's not what we mean by streaming. I'm opening a FileStream on a file of known size. DropBox is allocating 2x that file size, and not as a page-mapped file but as actual RAM usage.

The whole point of a Stream is that you shouldn't need to do any additional allocations. If Stream.CanSeek returns true, then the system will do the right thing, for files especially. If you want to handle the case where someone hands you some weird stream that doesn't seek, then slap a 4 MB BufferedStream on it, if 4 MB is optimal for your backend. Frankly, I shouldn't even be aware that this is happening. It should just work.

chadwackerman avatar Jan 10 '17 03:01 chadwackerman
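
As a concrete reading of the BufferedStream suggestion above, the sketch below wraps a non-seekable network stream in a fixed 4 MB buffer (the HttpClient source and URL are only examples). Note that BufferedStream bounds the buffering but does not make the stream seekable, so it does not by itself enable retries.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Sketch: bound the buffering of a non-seekable source with a 4 MB BufferedStream
// instead of copying the whole payload into memory.
static async Task<Stream> OpenBufferedSourceAsync(HttpClient http, string url)
{
    const int fourMegabytes = 4 * 1024 * 1024;
    Stream network = await http.GetStreamAsync(url);     // non-seekable network stream
    return new BufferedStream(network, fourMegabytes);   // fixed-size buffer, not a full copy
}
```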

Thanks for the feedback. We will consider adding a helper function that automatically handles the splitting.

qimingyuan avatar Jan 10 '17 04:01 qimingyuan

Ah, the dreaded "enhancement" label. This is actually a bug.

I'm seeing a lot of companies on GitHub with minimal C# experience offering pretty shaky C# clients. Async wounded most of you and now .NET Core is providing the mortal blow.

Most companies should just spend a week cleaning up their REST documentation then provide a few simple C# examples calling the most important endpoints, mostly showing how to deal with whatever weirdo authentication scheme your Ruby/Node/Python backends coughed up eight years ago.

But DropBox's API is so wonky and has so many endpoints that you'd really benefit from a proper client. You've got clear bugs here in Download and Upload that you're ignoring. If I can't easily and reliably copy files to and from DropBox, then really, what's the point of an SDK?

chadwackerman avatar Jan 10 '17 05:01 chadwackerman

@qimingyuan It is a clear bug, period. With the fix, uploading a 100 MB file increases RAM usage by only 1 MB instead of 200 MB. That clearly indicates it is a bug.

prakashguru avatar Jan 20 '17 06:01 prakashguru

But the fix removed the retry logic, right? You can achieve similar memory usage by setting maxRetriesOnError to 0.

qimingyuan avatar Jan 20 '17 16:01 qimingyuan

With a seekable stream (and almost all of them are), you get retries and minimal memory usage. The current default codepath should be the 1% fallback codepath, not the primary one.

chadwackerman avatar Jan 20 '17 19:01 chadwackerman

This has been fixed in v4.1.3. We no longer make a copy when the stream is seekable, and since we never retry on non-seekable streams, you should always get minimal memory usage.

qimingyuan avatar Jan 21 '17 03:01 qimingyuan
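
For anyone reading this later, the behavior described in the fix can be pictured as a rewind-on-retry pattern: remember the seekable body's starting position and seek back to it before each retry, so no in-memory copy is ever needed. The sketch below is an illustration of that pattern under those assumptions, not the SDK's actual implementation.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Illustration of rewind-on-retry: seekable bodies are rewound and retried,
// non-seekable bodies are never retried, and nothing is copied into memory.
static async Task<HttpResponseMessage> SendWithRewindRetryAsync(
    HttpClient http, Func<Stream, HttpRequestMessage> buildRequest, Stream body, int maxRetries)
{
    long start = body.CanSeek ? body.Position : 0;

    for (int attempt = 0; ; attempt++)
    {
        if (body.CanSeek)
        {
            body.Seek(start, SeekOrigin.Begin);   // rewind instead of re-buffering
        }

        try
        {
            // A fresh HttpRequestMessage is built per attempt; requests cannot be reused.
            return await http.SendAsync(buildRequest(body));
        }
        catch (HttpRequestException) when (body.CanSeek && attempt < maxRetries)
        {
            // Transient failure: loop around, rewind, and try again.
        }
    }
}
```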