
Attempt to provide correct API for multipart resources management

Open arnaudgeiser opened this issue 2 years ago • 7 comments

Description

This PR takes the commits from the 1.0.0 branch related to multipart requests and adds some tests around them: Attempt to provide correct API for multipart resources management

Unfortunately, the tests have demonstrated that the proposed implementation was still not working as expected. There is absolutely no guarantee we'll be able to read the content of the stream (the files) before it has been closed and thus the decoder destroyed, removing all the files before we were able to copy them.

The proposed fix is actually a breaking change (I don't expect to get it merged as is, but I want to raise the discussion).

  • Instead of returning a manifold.stream, we return a vector containing the manifold.stream and a function that can be called to destroy the decoder. That destroy-decoder-fn needs to be called in a d/finally to ensure all the resources have been cleaned up.
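A minimal sketch of how a handler would consume that new return shape (the destructuring and the exact decode-request signature here are illustrative assumptions, not the final API):

(require '[aleph.http.multipart :as multipart]
         '[manifold.deferred :as d]
         '[manifold.stream :as s])

;; hypothetical usage of the proposed breaking change: decode-request would
;; return [parts-stream destroy-decoder-fn] instead of a bare stream
(defn upload-handler [req]
  (let [[parts destroy-decoder-fn] (multipart/decode-request req)]
    (-> (s/reduce conj [] parts)
        (d/chain (fn [parts]
                   {:status 200
                    :body   (str "received " (count parts) " parts")}))
        ;; the decoder (and its temporary files) is only destroyed once the
        ;; consumer is done with the stream
        (d/finally destroy-decoder-fn))))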

Here is the related issue: https://github.com/clj-commons/aleph/issues/567

As we don't want breaking changes, I guess we'll have to deprecate the decode-request function. I see absolutely no way to fix the current implementation... Either we break or we deprecate at that point.

I tried to split the commits into three parts:

  • The original commit from @kachayev (https://github.com/clj-commons/aleph/pull/596/commits/e67f920acc68fccde25e7448af66118731900e2d)
  • Some tests to prove it's not working in some cases (https://github.com/clj-commons/aleph/pull/596/commits/5e20b5d041dccaadbdaaee0ebb44748834c8e6bc)
  • A fix with the breaking change (https://github.com/clj-commons/aleph/pull/596/commits/1962575dd9c8daa7a59ac49e9598874da723a4b6)

What are your thoughts about that?

arnaudgeiser avatar May 03 '22 23:05 arnaudgeiser

I'll have to take a deeper look later. I take it there's no further commentary from Oleksii on this?

KingMob avatar May 04 '22 08:05 KingMob

@kachayev did a nice job of summarizing the different paths we could take for the cleanup API (https://github.com/clj-commons/aleph/pull/432#issuecomment-449185986). He proposed an implementation that didn't introduce any breaking change. However, unless I'm mistaken, s/on-closed might not always happen on the same thread that consumes the stream.

(s/on-closed
  parts
  (fn []
    ;; destroy the decoder at most once, when the parts stream closes
    (when (compare-and-set! destroyed? false true)
      (try
        (.destroy decoder)
        (catch Exception e
          (log/warn e "exception when cleaning up multipart decoder"))))))

Now that I think about it, maybe we can add an extra option to the decode-request function and skip the s/on-closed clean-up when it is used. That way, we'll be able to keep the current behavior and introduce a new one without any breaking change.

Something like clean-on-closed?, which defaults to true. What do you think?
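For illustration, assuming decode-request accepts an options map (the option name is just the proposal above):

;; current behavior: the decoder is destroyed automatically via s/on-closed
(multipart/decode-request req)

;; opting out: the caller becomes responsible for destroying the decoder
(multipart/decode-request req {:clean-on-closed? false})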

arnaudgeiser avatar May 04 '22 20:05 arnaudgeiser

How do the proposed changes play with the upcoming Netty changes, in particular: https://github.com/netty/netty-incubator-buffer-api/issues/48 ?

KingMob avatar May 10 '22 11:05 KingMob

To be honest, I'm close to thinking we should just deprecate the feature in Aleph (and mark it as unstable) and simply provide examples of how it can be done using Aleph/Netty directly. In the current situation, we will always end up with something that works only more or less: it will either be too low-level or too high-level to be usable, so I wouldn't commit to a leaky Aleph abstraction based on a leaky Netty abstraction that might be revisited in the future.

arnaudgeiser avatar May 18 '22 21:05 arnaudgeiser

That works for me, especially if Netty itself has issues with it.

  1. Want to add docs/examples demoing it with that Apache middleware you mentioned?
  2. Is there something we can link to in Netty to track their internal status/progress?

KingMob avatar May 19 '22 10:05 KingMob

  1. Want to add docs/examples demoing it with that Apache middleware you mentioned?

Yes, I will add the content of this PR as an example + one that uses multipart upload.

  2. Is there something we can link to in Netty to track their internal status/progress?

I don't know.

arnaudgeiser avatar May 19 '22 16:05 arnaudgeiser

After a second thought about this, I think a good API would be to not return a manifold.stream at all, but to provide something like multipart/transduce that returns a deferred and closes the resources (ByteBuf + decoder) under the hood.
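For illustration, such an API might look roughly like this (the name multipart/transduce comes from the comment above; the signature is hypothetical):

(require '[aleph.http.multipart :as multipart]
         '[manifold.deferred :as d])

;; hypothetical: runs the transducer over the decoded parts and returns a
;; deferred, destroying the decoder and releasing ByteBufs once it completes
(defn upload-handler [req]
  (d/chain
    (multipart/transduce (map :part-name) conj [] req)
    (fn [part-names]
      {:status 200
       :body   (pr-str part-names)})))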

I will try to cook something...

arnaudgeiser avatar Aug 15 '22 14:08 arnaudgeiser

Quick update... No progress on that topic for now; I expect to spend some time on it in the coming days (it's maturing in my mind, not in the PR, for now).

arnaudgeiser avatar Oct 08 '22 09:10 arnaudgeiser

"Good news", I have been able to land a fix on top of Oleksii's PR. The idea is to introduce a destroy-delay before calling .destroy on the decoder to let some time to the client to consume the stream. [1]

It seems a bit hacky, but that's the only way I found to avoid breaking or deprecating the current API, which returns a manifold.stream.
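Roughly, the idea looks like this (a sketch, not the committed code):

(require '[clojure.tools.logging :as log]
         '[manifold.stream :as s]
         '[manifold.time :as t])

;; when the parts stream closes, wait destroy-delay milliseconds before
;; destroying the decoder, leaving the consumer time to copy file-backed parts
(defn destroy-decoder-after-delay [parts decoder destroy-delay]
  (s/on-closed
    parts
    (fn []
      (t/in destroy-delay
            (fn []
              (try
                (.destroy decoder)
                (catch Exception e
                  (log/warn e "exception while destroying multipart decoder"))))))))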

I still think having a higher-level API in Aleph might be valuable, like transduce-request. I have a branch for this, but as long as there is no demand from Aleph users, I won't spend more time turning it into a proper PR (I will keep it dormant for a while). [2]

[1] : https://github.com/clj-commons/aleph/commit/3f6bb240f70fb8db7908668a8f6491092cda78f4
[2] : https://github.com/clj-commons/aleph/commit/9217079d2efc2720b71fd0bccb3bcc25833dded5#diff-272a90e0af5e55488c08269c3ab128f28f2d63dcad30be761aca8c59cd0769c1R348-R362

arnaudgeiser avatar Oct 13 '22 19:10 arnaudgeiser

  1. Force the user to manually clean up, just like with raw-streams. I think Netty does this.

Yes, that could work if it were a new API. But the current implementation relies on the fact that the clean-up is performed automatically when the decoder is destroyed.

This would require an additional parameter; it might not be a bad idea though.

  2. Stick files in a temp directory, and let the OS clean up when it needs space. (Is that safe?)

This is a time bomb to me: even if the OS cleans up the space at some point, most platforms are monitored for disk usage. I'm afraid it would create even more issues on our side.

  3. Clean up automatically if the user follows a certain pattern, namely, a) being totally done with the parts when their handler exits, and b) no handing off references to other threads.

While I have a rough idea of how a) could be implemented, I'm totally armless for b).

  4. Delete after a delay when the stream closes, like this PR.

The multipart_params middleware in Ring seems to use the same approach and defaults to one hour. [1]

[1] : https://github.com/ring-clojure/ring/blob/master/ring-core/src/ring/middleware/multipart_params/temp_file.clj#L50-L51

So for me:

4 > 1 > 2 > 3

But I will follow your call on this.

arnaudgeiser avatar Oct 17 '22 07:10 arnaudgeiser

"armless"? I haven't heard that one before.

Regardless, for option 3, I wasn't suggesting we implement a and b. I meant, a and b are a contract that the user will have to promise to follow, there's no enforcement, unfortunately. In that sense, what we would do is wrap the user handler, dispose of resources when the handler exits, and it's up to the user to not screw up. Since we'd delete by default, if the user needed the files to hang around, we'd add an API for them to unmark for deletion. Does that change your ranking at all?

If clj-http does an hour, I'm fine with that. There's no perfect solution, and at least 1 hour has parity going for it...

KingMob avatar Oct 17 '22 08:10 KingMob

"armless"? I haven't heard that one before.

Let's say "unarmed" :)

Regardless, for option 3, I wasn't suggesting we implement a and b. I meant that a and b are a contract that the user will have to promise to follow; there's no enforcement, unfortunately. In that sense, what we would do is wrap the user handler, dispose of resources when the handler exits, and it's up to the user to not screw up. Since we'd delete by default, if the user needed the files to hang around, we'd add an API for them to unmark for deletion. Does that change your ranking at all?

Got it! The approach seems elegant, but it doesn't fit the current public API we expose to the client, does it? We currently don't hold any reference to a handler we could wrap.

If clj-http does an hour, I'm fine with that. There's no perfect solution, and at least 1 hour has parity going for it...

I was not referring to an HTTP client in that case, but to any kind of server that is Ring compliant.

arnaudgeiser avatar Oct 17 '22 08:10 arnaudgeiser

Regarding this comment:

I'm not generally comfortable with a time-based clean-up approach. Nothing prevents the user from sticking the files in a list to batch transfer overnight, after the files were deleted.

That's why in our example we recommend performing an io/copy so the temporary file can be removed afterwards when .release is called. [1]

[1] : https://github.com/clj-commons/aleph/pull/596/files#diff-272a90e0af5e55488c08269c3ab128f28f2d63dcad30be761aca8c59cd0769c1R283
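For readers of this thread, the gist of that recommendation looks roughly like this (the part keys and the release target are assumptions taken from the example, not a stable API):

(require '[aleph.netty :as netty]
         '[clojure.java.io :as io]
         '[manifold.stream :as s])

;; copy each file-backed part somewhere durable, then release the part so
;; the decoder is free to delete the temporary file backing it
(defn save-parts! [parts dest-dir]
  (s/consume
    (fn [{:keys [part-name file] :as part}]
      (when file
        (io/copy file (io/file dest-dir part-name)))
      (netty/release part))
    parts))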

arnaudgeiser avatar Oct 17 '22 08:10 arnaudgeiser

What we can also do...

  5. Match the multipart_params middleware and (optionally) delete THE FILES after a delay when the stream closes. (~like 4)

And we revisit/ditch the rest of this PR:

  • it's still the responsibility of the decoder to release/delete the files (and not the user)
  • it's still the responsibility of the decode-handler [1] to release the memory (no more .acquire [2]) and not the user
  • no more evolved read-attributes [3]
  • MultipartChunk can be removed
  • we keep the broken behavior when files are involved (and add an optional destroy-delay parameter)
  • we recommend that users use an "infinite" memory-limit by default
  • we recommend that users use a "destroy-delay" (of 1 hour) when files will be involved

And we have something that works with Aleph without introducing any breaking changes, risky fixes, or bad defaults, and that we can easily justify with regard to the rest of the Clojure ecosystem.

WDYT?

(:cry: : bringing 1.0.0 features to the main branch always raises so many questions)

[1] : https://github.com/clj-commons/aleph/pull/596/files#diff-272a90e0af5e55488c08269c3ab128f28f2d63dcad30be761aca8c59cd0769c1R325
[2] : https://github.com/clj-commons/aleph/pull/596/files#diff-272a90e0af5e55488c08269c3ab128f28f2d63dcad30be761aca8c59cd0769c1R226
[3] : https://github.com/clj-commons/aleph/pull/596/files#diff-272a90e0af5e55488c08269c3ab128f28f2d63dcad30be761aca8c59cd0769c1R238-R251

arnaudgeiser avatar Oct 17 '22 09:10 arnaudgeiser

That's why in our example we recommend performing an io/copy so the temporary file can be removed afterwards when .release is called.

Ahh, can we add a comment explaining the "why" of that call for readers?

KingMob avatar Oct 18 '22 04:10 KingMob

I think option 5 sounds good. Fundamentally, I think it's up to the user to not leak resources, since there's no solution we can do that's 100% correct.

I'm not sure about recommending an infinite memory-limit, though. Very large, yes, but infinite?

Which do you think is the best option at this point?

KingMob avatar Oct 18 '22 07:10 KingMob

I'm not sure about recommending an infinite memory-limit, though. Very large, yes, but infinite?

Yes, infinite might not be a good recommendation. I would rather formulate it as:

When using decode-request (without destroy-delay), set a memory-limit to ensure your data won't be stored in temporary files, and enforce the presence of the content-length header up to this maximum value. If the content-length is greater than your memory-limit or absent, return a bad request.

Something along those lines...
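As a concrete (and purely hypothetical) illustration of that recommendation, assuming decode-request accepts the memory-limit option discussed above:

(require '[aleph.http.multipart :as multipart]
         '[manifold.deferred :as d]
         '[manifold.stream :as s])

(def memory-limit (* 1024 1024)) ;; 1 MiB, for illustration

(defn upload-handler [req]
  (let [header         (get-in req [:headers "content-length"])
        content-length (some-> header Long/parseLong)]
    (if (or (nil? content-length) (> content-length memory-limit))
      ;; content-length absent or larger than memory-limit: reject up front
      {:status 400 :body "bad request"}
      ;; below the limit, every part is kept in memory, never in a temp file
      (d/chain
        (s/reduce conj [] (multipart/decode-request req {:memory-limit memory-limit}))
        (fn [parts]
          {:status 200
           :body   (str "received " (count parts) " parts")})))))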

Which do you think is the best option at this point?

Another round on this PR to apply 5. :dancers: I plan to work on it over the weekend.

arnaudgeiser avatar Oct 20 '22 18:10 arnaudgeiser

Okay, I decided to start over and revisit all my assumptions. I removed all my modifications since the beginning and did a deep dive into Netty's handling of multipart uploads.

Oleksii's approach was good. Unfortunately, calling removeHttpDataFromClean was not enough, as some files were still referenced inside bodyListHttpData [1].

I ended up having to make only these modifications [2], which means:

  • No new destroy-delay parameter
  • Cleanup of the files is left to the user (with netty/release)

Before I get more involved in this (cleanup, addressing all the remarks you made above), can you give me your opinion on this approach, @KingMob?

[1] : https://github.com/netty/netty/blob/839eda075531ec4bbdb247c6db2cf286e34a130b/codec-http/src/main/java/io/netty/handler/codec/http/multipart/HttpPostMultipartRequestDecoder.java#L970
[2] : https://github.com/clj-commons/aleph/pull/596/commits/16c9ed3136242a1ef137c6ad91150c3004531c8b

arnaudgeiser avatar Nov 14 '22 23:11 arnaudgeiser

@arnaudgeiser I think your new approach is sound. There's no perfect option, so let's just do the best we can, document it, and let the user decide what to do with the files.

KingMob avatar Dec 06 '22 07:12 KingMob

Pfff... Actually, the proposed solution is a breaking change even in its initial form, as you will have to call netty/release on each file manually. Let's close this one in favor of #656.

The findings were interesting, though, in case we one day need fine-grained clean-up.

arnaudgeiser avatar Jan 08 '23 20:01 arnaudgeiser