libhttpserver icon indicating copy to clipboard operation
libhttpserver copied to clipboard

Add a hook or callback for cleaning up on disk file uploads

Open LeSpocky opened this issue 2 years ago • 10 comments

Is your feature request related to a problem? Please describe.

Since 3fd99c87b8ea999df593060999be59786e844bba it's possible to let libhttpserver store files uploaded with forms (mime-type multipart/form-data) to disk, see the pull request #257 for reference. This whole new feature request is about cases where file upload target is set to FILE_UPLOAD_DISK_ONLY or FILE_UPLOAD_MEMORY_AND_DISK.

In certain circumstances it is necessary for an application to not store those files permanently, but remove them from disk again, after processing. Doing that in a render method associated with a registered resource is possible. However the file is always stored to disk, regardless if that render method is called at all. Multiple code paths in webserver::finalize_answer(), mostly error cases, are possible leading to mr->dhrs assigned by something else than the render callback supplied by the application. Simplest case is sending a (possibly forged) request to a not registered resource (HTTP status 404), but others are possible, too. That would mean the cleanup in your render method is not executed.

The actual target we use libhttpserver on is an embedded device with usable RAM smaller than files uploaded, so we have to store those files on disk. Those files can and should be removed again on every request, otherwise the filesystem would fill up quickly. This is especially true for manipulated requests with bad intent. A full filesystem can be considered as a denial of service situation.

Describe why the feature or enhancement you are proposing fits the library.

I propose some extension to libhttpserver for cleaning up all uploaded files of a request at the end of request processing. Thinking of the possibility of optionally hooking up a callback function provided by the application, where user can decide how to cleanup, and call that in an appropriate place.

Describe the solution you'd like

Extend the create_webserver options by a function to set a custom cleanup function. That function would not be called if not set by the application. However if set it could be executed next to MHD_destroy_response(raw_response); at the end of webserver::finalize_answer()?

Describe alternatives you've considered

With the current API of libhttpserver always cleaning up would only be possible by providing custom methods for all of these:

  • internal_error_page
  • method_not_allowed_page
  • not_found_page

And you had to put that same cleanup code into every render method in your application, no matter if it is supposed to handle file uploads at all.

This would lead to lots of duplicated code with high potential to miss some path.

Additional context

Same problem applies to the draft #246 not followed (for good reasons) in favor of #257.

LeSpocky avatar Mar 23 '22 13:03 LeSpocky

I totally agree and have to admit, that I completely forgot about this behavior in my merge request. This could be indeed be a problem if there are lots of files stored onto a disk.

But why should a custom cleanup function from the calling application take care about the cleanup. As the class file_info and everything else resides inside the library (and is created inside), I would see the cleanup task as task of the library.

In my opinion, the library should always cleanup all uploaded files. Why not put it into the destructor of the file_info class?

Of course there is a use case (i.e. the use case, I implemented this feature for), where we want to have these uploaded files be preserved longer than the actual request. For this case you would simply need a possibility for the application, to mark some files as to not be deleted. Of course, than the application would be responsible for deleting all the marked files eventually.

ctomahogh avatar Mar 24 '22 12:03 ctomahogh

Of course there is a use case (i.e. the use case, I implemented this feature for), where we want to have these uploaded files be preserved longer than the actual request. For this case you would simply need a possibility for the application, to mark some files as to not be deleted. Of course, than the application would be responsible for deleting all the marked files eventually.

This is what I suspected. Some users want to keep those files, others don't, others maybe only want to keep some files.

What about a default cleanup method built into libhttpserver, which would remove all files, and a possibility to override that cleanup method by the application? Could look somewhat like the already present custom defaulted error messages or custom logging callbacks?

LeSpocky avatar Mar 24 '22 12:03 LeSpocky

What would a custom cleanup mean? Would this have to be registered on webserver creation and then take effect on every request? Or could this be set on each request individually?

Problem I see is, that one would have to keep track of the files, which should not be deleted in the application. And the handler can be called completely out of the render functions.

Personally, I would rather prefer a solution, where you could command the library to delete them or not. But this should be possible on every request.

So, if the library sends an error message (i.e. 404) or if the render function (i.e. for render put) does not take care about the files, the library should automatically delete the files in the end. But if the application does something with the files, then it should tell the library to not delete the files at the end of the request. Ideally down to single files.

ctomahogh avatar Mar 24 '22 13:03 ctomahogh

What would a custom cleanup mean? Would this have to be registered on webserver creation and then take effect on every request? Or could this be set on each request individually?

What I had in mind was a rather global function registered on webserver creation.

Problem I see is, that one would have to keep track of the files, which should not be deleted in the application. And the handler can be called completely out of the render functions.

Yes. But I see this as a feature. The render functions are bound to a registered resource. File uploads (think of malicious requests from client with bad intent) can happen to any resource, especially resources where the webserver would answer with 404.

Personally, I would rather prefer a solution, where you could command the library to delete them or not. But this should be possible on every request.

So, if the library sends an error message (i.e. 404) or if the render function (i.e. for render put) does not take care about the files, the library should automatically delete the files in the end. But if the application does something with the files, then it should tell the library to not delete the files at the end of the request. Ideally down to single files.

Mkay. Do you have an idea how this could be realized from API point of view?

LeSpocky avatar Mar 24 '22 14:03 LeSpocky

Actually it might really get difficult to find a good solution for setting the cleanup on every request individually (at least I have no quick idea on that).

So yes, a callback or something like that is probably a better solution. How would a callback in your opinion look like? Or what will be handed over to this callback? Would this callback be called on every uploaded file individually and only provide the path to a single file? cleanup_callback(std::string &filepath); Or would this callback get the whole map of files? cleanup_callback(std::map<std::string, std::map<std::string, http::file_info>> &files)

Maybe a little bit more radical approach: Don't offer a cleanup callback at all and always delete all uploaded files (regardless whether a 404 is returned or whether a render function was called or not). If an application needs one or more files to be stored for longer than the request, it can simply copy the file. Or just rename the file in the application. Then the unlink-call in the libhttpserver would fail but this would be considered as acceptable and not trigger an error.

ctomahogh avatar Mar 28 '22 12:03 ctomahogh

Actually it might really get difficult to find a good solution for setting the cleanup on every request individually (at least I have no quick idea on that).

So yes, a callback or something like that is probably a better solution. How would a callback in your opinion look like? Or what will be handed over to this callback? Would this callback be called on every uploaded file individually and only provide the path to a single file? cleanup_callback(std::string &filepath); Or would this callback get the whole map of files? cleanup_callback(std::map<std::string, std::map<std::string, http::file_info>> &files)

What I had in mind was the whole map.

Maybe a little bit more radical approach: Don't offer a cleanup callback at all and always delete all uploaded files (regardless whether a 404 is returned or whether a render function was called or not). If an application needs one or more files to be stored for longer than the request, it can simply copy the file. Or just rename the file in the application. Then the unlink-call in the libhttpserver would fail but this would be considered as acceptable and not trigger an error.

Yes, I mean I could live with that, because it would not interfere at all with my usecase. For the case where those files are needed later, that copy might be expensive, depending on how it is done, but you could argue that's business of the user.

Just moving (renaming) a file would make sense for the case where the lib generates the filenames anyways, but I did not thought of this in the first place, because I considered the lib responsible for those files.

Either way, it would need an update of documentation, to clearly state what the library does and what not. It's a little hard to guess what user expectations are in this case. 😕

LeSpocky avatar Mar 28 '22 13:03 LeSpocky

I am fine if the cleanup function would get the whole map. This was merely a question.

Actually the "radical" approach would just be the same if the user of the lib would not provide a custom cleanup function. Then the lib would again always delete all uploaded files and a user could simply preserve files by copying or renaming.

As a default cleanup is needed anyway I offering the possibility for a custom cleanup function is a good approach.

Anyway I think, this functionality (a default cleanup function at least) is really necessary for a released version. As using the library as it is at the moment could be used for some kind of DoS attack if a malicious client could simply fill up the whole file system of a device.

ctomahogh avatar Mar 28 '22 13:03 ctomahogh

Love the thread.

The general approach of the library with managing uploaded content is that the content is "owned" by the library until it leaves and it is accessed by the structures of the library itself. Along those lines, if you use an "in memory" approach, the content uploaded goes away as soon as the http_request object is deleted.

The fact that we store the file on disk is just an extension of memory to me; the library still "owns" that content until the user doesn't take explicit ownership of it.

As such, having a deletion by default is a good strategy (akin to freeing a dangling pointer and preventing a leakage).

To keep the library usable this is what I'd do:

  1. Have the library delete by default the file(s) on disk contextually with the deletion of the http_request they are related to.
  2. Provide a server-level override for the delete function - the users can basically void this function if they want to keep the files in the same location (had they provided a custom one) or move them if they need to manage them elsewhere.

On 2. I am working on a more general system of overrides at resource level; so keeping it at server level for now is fine - it will all go down one layer when I am finally close on this.

etr avatar Mar 29 '22 17:03 etr

Okay, what I take from this for now: we can start with the cleanup (1.) already, because it's more or less independent from the actual solution we come up with for hook or override (2.) or whatever it will be called.

LeSpocky avatar Mar 29 '22 18:03 LeSpocky

Okay, what I take from this for now: we can start with the cleanup (1.) already, because it's more or less independent from the actual solution we come up with for hook or override (2.) or whatever it will be called.

For that minimal "radical" approach, see #266.

LeSpocky avatar Apr 12 '22 05:04 LeSpocky