Seaside icon indicating copy to clipboard operation
Seaside copied to clipboard

Request streaming with Zinc

Open theseion opened this issue 4 years ago • 18 comments

This PR adds request streaming with Zinc to Seaside. The use case for this are primarily multipart/form-data file uploads. Usually, a request is read into memory completely before being processed, which means the contents of an entire file will end up in memory, regardless of the size of the file. This may kill the image in the worst case or lead to swapping / thrashing on the host machine. With request streaming it is possible to read the request in chunks and write these chunks to disk, so that no more than the chunk size has ever have to be allocated to process the entire file.

How it works

I've extended Zinc-Seaside with an option ZnServer>>streamUploads: to enable the feature. When request streaming is enabled, multipart/form-data requests will be read in chunks. If the request contains files they will be streamed to temporary files on disk.

There are two new important classes in Zinc-Seaside: ZnRingBuffer and ZnStreamingMultiPartFormDataEntity. These are the classes that implement the parsing logic.

The temporary files are represented by WAExternalFile which I moved to Seaside-Core. Since we already have WAFile I've created a hierarchy for file classes to make the different implementations polymorphic as far as possible.

I've also extended Grease with two methods for creating and writing temporary files (see https://github.com/SeasideSt/Grease/pull/103).

Caveats

This should all be part of Zinc in my opinion but Zinc has a lot of changes that aren't yet part of the stable image. Those changes will make some of the things simpler but they also make some things harder, e.g. it will no longer be possible to use ZnRingBuffer because the underlying buffered streams will pass the buffer to primitives (can't work since ZnRingBuffer is a wrapper`). I'm hoping that we can contribute this to Zinc in the future but for now I think it's best to have all the code in one place.

Additional stuff

This PR also includes overrides of Zinc methods that don't respect custom encoding (uses ZnCharacterEncoder utf8 instead of ZnCharacterEncoder default, which falls back to using ZnDefaultCharacterEncoder value, which in turn can be set with #defaultEncoder: on ZnServer`).

theseion avatar May 20 '20 12:05 theseion

As we discussed in https://github.com/SeasideSt/Grease/pull/103 I've moved #newTemporaryFileReference to the new Seaside-Zinc-Pharo package together with all of the stuff for request streaming.

I've also renamed Zinc-Seaside to Seaside-Zinc, but we can revert that if you don't like it.

Seaside-Seaside-Pharo-Tests contains ZnRingBufferTest which I had forgotten to move to Seaside before.

theseion avatar May 21 '20 12:05 theseion

I've reverted the renaming of Zinc-Seaside and opened a separated issue to do it for 3.6: https://github.com/SeasideSt/Seaside/issues/1197

theseion avatar May 21 '20 14:05 theseion

@theseion Some of the changes to the core of the Zinc-Seaside package are breaking things in GemStone. I can take a look if I can 'fix' them but it will probably be only by the end of the week.

Another option is create a specific subclass for the adapter and use that one in Pharo?

jbrichau avatar May 25 '20 07:05 jbrichau

Actually, nothing should break, that's why we put everything into a Pharo specific package. I'll look into it, maybe it's just a problem with the baseline.

theseion avatar May 25 '20 07:05 theseion

I did not investigate into detail, but there are still changes to the Zinc-Seaside package itself. My guess is that these break the Zinc adapter in GS. I did not try, so I cannot tell for sure.

jbrichau avatar May 25 '20 07:05 jbrichau

hey @theseion I suspect that if you merge in the changes from the master branch, some tests might clear up.

If not, I can take a look as it's Gemstone tests that are failing. But it would be better to have the last version first merged back into this branch as I have been making extensive changes for Gemstone tests on master.

jbrichau avatar Jul 03 '20 15:07 jbrichau

@jbrichau I've picked this one up again (it's been a while...). It requires the changes from https://github.com/SeasideSt/Grease/pull/138.

theseion avatar Jun 04 '22 13:06 theseion

Set up for quick testing:

  1. Add ZnZincStreamingServerAdaptor as adaptor in the Seaside panel
  2. Create a new component with the following content:
    renderContentOn: html
        html form
            multipart;
            with: [
                html fileUpload callback: [ :file | self halt ].
                html submitButton ] 
    
  3. Open a browser pointed at the component and submit a file
  4. You'll now see an open debugger in the image. The file argument holds a WAExternalFile that references a file in the temp location, which holds the contents of the uploaded file. This file was created by reading the file in chunks instead of using the default method of reading the file into memory.

theseion avatar Jun 04 '22 13:06 theseion

  • The failure in Pharo 10 is an unexpected pass. That should be fixed in another PR.
  • The failures in 6.1 seem strange. Not sure whether they are my fault.
  • Not sure about the GemStone failures. In GemStone I can see MNU's because of Pharo specific messages. Does Gemstone use Zinc too?
  • The failures in Squeak don't seem to be related, but I didn't check. Do you expect the Squeak tests to pass?

theseion avatar Jun 04 '22 14:06 theseion

I removed the expected failure from the master branch. I'll take a look at the Gemstone failures and see what I can do.

Squeak will be left unmaintained if nobody takes care of it. I don't have the time to dedicate to it.

jbrichau avatar Jun 06 '22 08:06 jbrichau

Some things would need to be cleared up before we can merge this one into master:

  • There is now a conflict with the Seaside-ExternalFileUploads package, which also defines the class WAExternalFile.
  • Some functionality in Seaside-Core is Pharo specific (e.g. fileReference)
  • The Zinc-Pharo package includes the methods that are now in Grease (since merging https://github.com/SeasideSt/Grease/pull/138)

I'm willing to help out getting things sorted. If you want, merge it into a branch on SeasideSt and we can work on it collaboratively? That would also help me getting the GemStone implementation merged in (if I can fix it)

jbrichau avatar Jun 06 '22 09:06 jbrichau

You already have write access to this branch, I don't think I need to move it to SeasideSt. Or do you need it on SeasideSt for Gemstone?

I will rebase on top of master and start addressing the things you've listed. I'll start with WAExternalFile.

Thanks for the help!

theseion avatar Jun 06 '22 13:06 theseion

Squeak will be left unmaintained if nobody takes care of it. I don't have the time to dedicate to it.

Fair. @tcj might be able to help with Squeak?

theseion avatar Jun 06 '22 13:06 theseion

You already have write access to this branch ah ok, I will commit there then!

jbrichau avatar Jun 06 '22 14:06 jbrichau

Fair. @tcj might be able to help with Squeak?

I will help as best I can. Does it help for me to run smalltalkCI locally? (Running it for the first time locally now.)

I understand this is about getting the ~tests~ checks to pass for Squeak. The ~tests~ checks must be failing for Squeak on every commit, and not just this one, correct? (Of course, the main functionality of this commit does not apply to Squeak since Squeak does not have Zinc.)

The failure behind WANoDuplicateUuidsTest looks like a dependency problem in the BaselineOfSeaside3... would you recommend I file a new issue regarding this and dig in there? I may have tried to dig into this issue before, actually: packages requiring Pharo somehow getting into the spec for Squeak, via the Tests groups... if I recall.

GRError: 'Seaside-Tests-Slime' depends on unknown package 'Seaside-Pharo-Slime'

tcj avatar Jun 07 '22 04:06 tcj

Thanks for offering @tcj.

I will help as best I can. Does it help for me to run smalltalkCI locally? (Running it for the first time locally now.)

That certainly helps, yes.

I understand this is about getting the tests checks to pass for Squeak. The tests checks must be failing for Squeak on every commit, and not just this one, correct? (Of course, the main functionality of this commit does not apply to Squeak since Squeak does not have Zinc.)

Correct. There are a couple of tests that have been failing in Squeak for a while.

The failure behind WANoDuplicateUuidsTest looks like a dependency problem in the BaselineOfSeaside3... would you recommend I file a new issue regarding this and dig in there? I may have tried to dig into this issue before, actually: packages requiring Pharo somehow getting into the spec for Squeak, via the Tests groups... if I recall.

That would be a great start, yes!

theseion avatar Jun 08 '22 04:06 theseion

Thanks for working on this! I'll get back to this as soon as possible.

theseion avatar Jun 30 '22 20:06 theseion

@theseion no problem. I just did a bit of integration work and the streaming seems to work fine. I still need to take a look at the gemstone issues that cause the builds to break.

jbrichau avatar Jul 01 '22 19:07 jbrichau

Rebased :)

theseion avatar Sep 18 '23 13:09 theseion

Codecov Report

Attention: 117 lines in your changes are missing coverage. Please review.

Comparison is base (7976eb9) 48.64% compared to head (51d6654) 48.92%.

Files Patch % Lines
...iPartFileFieldWithMimePart.boundary.decodeWith..st 0.00% 16 Missing :warning:
...tor.class/instance/convertMultipartEntity.with..st 0.00% 15 Missing :warning:
...nce/parseMultiPartFieldWithMimePart.decodeWith..st 0.00% 10 Missing :warning:
...ore.package/ZnRingBuffer.class/instance/at.put..st 0.00% 9 Missing :warning:
...c-Core.package/ZnRingBuffer.class/instance/hash.st 0.00% 9 Missing :warning:
...re.package/ZnRingBuffer.class/instance/printOn..st 0.00% 9 Missing :warning:
...nc-Core.package/ZnRingBuffer.class/instance/at..st 0.00% 7 Missing :warning:
...Core.package/WAExternalFile.class/instance/size.st 0.00% 5 Missing :warning:
...kage/ZnServer.extension/instance/streamUploads..st 0.00% 5 Missing :warning:
...re.package/WAExternalFile.class/instance/exists.st 0.00% 4 Missing :warning:
... and 11 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1196      +/-   ##
==========================================
+ Coverage   48.64%   48.92%   +0.27%     
==========================================
  Files        8954     9071     +117     
  Lines       80572    81602    +1030     
==========================================
+ Hits        39192    39920     +728     
- Misses      41380    41682     +302     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Sep 18 '23 13:09 codecov[bot]

@theseion I'm trying to get this finally merged. Sorry for ignoring this PR for soo long.

The ZnSeasideServerAdaptor which is used in both Pharo and GemStone was referencing a class that was in the Pharo-specific package. However, I don't immediately see any reason not to include it in GemStone as well. Therefore I moved all classes to one package. This already fixes the total breakdown of the Zinc adaptor in GemStone.

Next, it seems the over-writes of Zinc are not needed in Pharo. I verified the code and it boils down to identical code. However, including those over-writes in GemStone does break it there, most probably because the Zinc version is older there.

Finally, some more parts will need to be moved to Grease. I already changed the WAExternalFile>>exists to use Grease, but the size method will need a new method on GRPlatform.

jbrichau avatar Dec 17 '23 21:12 jbrichau

I think I'm almost there. The last commits show I need to catch some sleep before finishing this off. There should only be one failing test in GemStone that I will need to investigate.

jbrichau avatar Dec 17 '23 23:12 jbrichau

@theseion Almost there! GemStone 3.7 works: https://github.com/SeasideSt/Seaside/actions/runs/7244577602 The use of FileReference should be changed to a Grease abstraction for the older GemStone versions, but that should not be a problem. I will also have to do some manual tests but it seems we will be able to merge this one after all this time.

jbrichau avatar Dec 18 '23 07:12 jbrichau

Wow! Thanks!

theseion avatar Dec 21 '23 18:12 theseion

@theseion All tests are green now. Looks good to me. What do you think?

jbrichau avatar Dec 23 '23 13:12 jbrichau

LGTM. I'm the author of the PR, so I can't review.

theseion avatar Dec 24 '23 20:12 theseion

Yay! 🎉

theseion avatar Dec 25 '23 11:12 theseion

@theseion I noticed that the Zinc adaptor was doing streaming file uploads when the option was set to false. This was due to boolean mistake that was fixed in https://github.com/SeasideSt/Seaside/commit/6003bc7844d5fd70bea21f5a0773a9de71d99771.

However, I then got the failing test testEncodingFunctionalTest where a field submitted in multipart form was always converted to a WAFile. I fixed it in https://github.com/SeasideSt/Seaside/commit/2734bfd010cd25b5fa7fa0846fc1ef5d3676c7c1 basically by reverting that code to how it was before this PR. Giving you this heads up to check if this change was inadvertent or had a specific purpose.

Oh, and Happy New Year! ;-)

jbrichau avatar Jan 02 '24 22:01 jbrichau

I don't recall the reason for doing it. As far as I recall, it is possible for a file input to be submitted with a file but with an empty file name. In that case, the part should be decoded as a file anyway, which is why I added the #or: check for binary encoding. My reasoning was that if the encoding is binary then it is reasonable to assume that we're dealing with a file.

Looking at the code again now I think we could just make the following change:

"..."
part fileName
    ifNil: [ self codec url decode: part fieldValueString ]
    ifNotNil: [ self convertMultipartFileField: part ] ].
"..."

There's another issue with this code though, as it's now possible to send an unlimited amount of data with multipart/form-data, which would consume all the memory. It would be make sense, IMO, to use streaming for all data above a certain length (e.g., content length > 8MB), and for all chunked transfers (since we can't know the length of those).

theseion avatar Jan 03 '24 08:01 theseion

Addressed the above in https://github.com/SeasideSt/Seaside/commit/cb9f147e3c276d3a2135c25e8952f402933bc64a

jbrichau avatar Jan 25 '24 18:01 jbrichau