kirby icon indicating copy to clipboard operation
kirby copied to clipboard

Chunked file uploads

Open distantnative opened this issue 1 year ago • 27 comments
trafficstars

This PR …

Enhancements

  • Panel uploads can now exceed the upload_max_filesize limit as they're uploaded in chunks. If you want to restrict the upload size, please use the file blueprint accept maxsize option: https://getkirby.com/docs/reference/panel/blueprints/file#accept

Ready?

  • [x] In-code documentation (wherever needed)
  • [x] Unit tests for fixed bug/feature
  • [x] Tests and checks all pass

For review team

  • [x] Squash commits
  • [ ] Add changes & docs to release notes draft in Notion

distantnative avatar Apr 28 '24 18:04 distantnative

I've tried to upload a EXE file. Kirby 4.2.0 throws a error The extension "exe" is not allowed but nothing happens for this PR. Seems something broken.

afbora avatar Apr 30 '24 12:04 afbora

Also I realized that file is still remains when I cancel the upload. Should be deleted. But should be careful when file replacing.

afbora avatar Apr 30 '24 13:04 afbora

@afbora We cannot delete the tmp file on cancel. But we clean up the files with a garbage collection.

distantnative avatar Apr 30 '24 13:04 distantnative

Adding the "two-person review" label as I realized we need to be super careful about the security implications here. There's a lot that could go horribly wrong if we miss anything. I think we are heading in a great direction though 👍

lukasbestle avatar May 02 '24 19:05 lukasbestle

@lukasbestle makes sense. And as written in the initial post, I think we should have a few review rounds as well. Now getting the concept right, later adding unit tests etc. before final review.

distantnative avatar May 02 '24 19:05 distantnative

We cannot delete the tmp file on cancel. But we clean up the files with a garbage collection.

@distantnative What if we upload to the temp folder, does it make sense? If upload successful, we can move the file to the model root, if not garbage collector take care.

afbora avatar May 03 '24 07:05 afbora

@afbora That's what we are doing here. Just not the system temp folder as that doesn't persist across requests. So the first parts aren't there anymore when the last chunk is received.

distantnative avatar May 03 '24 07:05 distantnative

Is it theoretically possible to add a config option that will limit the file size?

tobimori avatar May 03 '24 10:05 tobimori

I've found few issues, could you confirm that?

  • File not uploaded without error or log (dialog closed and no file uploaded,) with big files. For me, ~100 MB success, ~300 MB fails.
  • I'm getting A file with the name "filename.zip" already exists error when big files upload from textarea field

afbora avatar May 03 '24 12:05 afbora

@afbora the PR currently is not functional as it's in the middle of implementing changes based on Lukas' feedback and some troubles I encountered (see our comments discussion). Sorry that you tried, but I think any upload fails currently that needs more than one chunk.

distantnative avatar May 03 '24 12:05 distantnative

@distantnative No problem. I'll test again after PR is ready.

afbora avatar May 03 '24 13:05 afbora

Is it theoretically possible to add a config option that will limit the file size?

Good point. I think it would make sense to make this an option for the file blueprint.

lukasbestle avatar May 05 '24 12:05 lukasbestle

@lukasbestle @tobimori are you thinking of something different than the existing accept maxsize option?

distantnative avatar May 05 '24 12:05 distantnative

I didn't know there was an option already. Consider my request as solved 😀

tobimori avatar May 05 '24 12:05 tobimori

Also forgot about maxsize. 😆

lukasbestle avatar May 05 '24 13:05 lukasbestle

@lukasbestle added some more validations – skipping MIME types (will still checked for the full file), but at least basic ones as allowed extensions.

distantnative avatar May 11 '24 12:05 distantnative

About maxsize: We need to ensure that we check for that in three ways:

  • The frontend should check for it before even uploading the first chunk (avoids user frustration). Maybe this already happens.
  • The backend should also ensure it by checking the Upload-Length against it.
  • Also there should be a validation for the actual uploaded chunk size. So the size of the current tmp file + the size of the temporary chunk file provided by PHP should be compared to maxsize before streaming over the chunk to our tmp file. Otherwise one could provide an Upload-Length with an allowed size (e.g. 5 MB) but send a chunk with 100 MB. The file would then already take up double that size on the server before Kirby finally validates the size in $page->createFile().

lukasbestle avatar May 15 '24 11:05 lukasbestle

@lukasbestle I am not sure we can follow your suggestions for maxsize. This would require us parsing the blueprints on each of those requests (Panel view, chunk upload). And given that our blueprints currently are always fully resolved (sadly still), this could create real performance issues. Because of that, I think until we resolve blueprints more lazily, we have to stick to the (not ideal) behavior that the accept rules only get evaluated at the end of the full upload.

distantnative avatar May 16 '24 18:05 distantnative

About maxsize: You are right that it won't work out to do the first check in the frontend as we would need to provide all possible maximum sizes for all allowed file blueprints before we even know if a file will be uploaded at all. However during an active upload, this is different: There I feel like the performance impact of checking the maxsize is negligible compared to the duration of the upload (unless the uploader is somewhere in Switzerland with a symmetric 1 Gbit/s connection 😆).

However with these checks we could prevent attackers from stuffing our custom temp dir with insanely large partially uploaded files that are so far prevented by the blueprint. At the moment, PHP will accept the file first, store it to the system temp folder and then Kirby decides not to copy it elsewhere. So the damage is limited to the upload_max_filesize for a brief moment in time (PHP auto-cleans uploaded files that were not copied during the request). With chunked uploads, an attacker could however initiate an upload of a 1 GB file and upload 999 MB of that. Those 999 MB will reside on the server until cleanup. This could lead to denial of service if done until the disk on the server or quota is full.

lukasbestle avatar May 21 '24 19:05 lukasbestle

@lukasbestle I added a check on the backend for maxsize: both, checking the total length from the header as well as existing tmp plus chunk size.

distantnative avatar May 24 '24 18:05 distantnative

Would be great indeed. I've rebased and squashed the commits already.

distantnative avatar May 26 '24 09:05 distantnative

@distantnative I'm getting A file with the name "filename.zip" already exists error when big files upload from textarea field. Same file uploads without error from files section.

afbora avatar May 27 '24 15:05 afbora

@afbora good catch. Can reproduce myself. Somehow already after the first chunk they get moved to the content folder. But need a fresher mind to investigate why.

distantnative avatar May 27 '24 20:05 distantnative

Maybe a different route processes these so the Upload-Length header is ignored and the new chunk logic on the backend doesn't get executed?

lukasbestle avatar May 28 '24 06:05 lukasbestle

Indeed it is. So I am thinking to actually move the Upload::chunk() call here: https://github.com/getkirby/kirby/blob/main/src/Api/Api.php#L697

So any place using $api->upload() would work with it.

Furthermore, I think I would like to move most of the Code from Api::upload() to the new Upload class. I think that could als help us to simplify that code into several single-purpose methods, e.g. Upload::response(), Upload::error(), Upload::filename() ...

distantnative avatar May 28 '24 08:05 distantnative

Makes sense to me. The Api::upload() method is indeed a spaghetti monster at the moment.

Behavior would be that Api::upload() would check for chunks in the place you linked to and only call the callback on the last chunk when the upload is complete, right?

lukasbestle avatar May 28 '24 09:05 lukasbestle

Yes, exactly. Have a version working locally. It really helps breaking this up but also having it all in the one class then.

Now only need to add unit tests to all new parts of the Upload class. Bit of work but also is nice cause we'll cover most of this with proper unit tests now.

distantnative avatar May 28 '24 11:05 distantnative

Works great for me. I couldn't see any issue 🎉

afbora avatar Jun 27 '24 09:06 afbora

I think I have encountered an issue: Uploading a large file and canceling while it's uploading, it still shows up in the file list after a reload.

distantnative avatar Jun 27 '24 15:06 distantnative

Turns out: When one cancels the upload dialog, all running uploads still run and finish in the background. This is not new in this PR but a lot more visible/likely to happen with a different expectation by the user.

distantnative avatar Jun 27 '24 21:06 distantnative