mod_zip icon indicating copy to clipboard operation
mod_zip copied to clipboard

CRC32 calculation for upstream subrequests

Open mdineen opened this issue 2 years ago • 9 comments

Adding files from block storage with - as checksum causes the module to calculate the crc-32. Adding the same file from an upstream results in 000000 for the checksum, which causes validation errors in some Windows file browsers.

Can we have the module calculate the crc-32 for upstream files?

mdineen avatar Jun 01 '22 15:06 mdineen

Solved by updating https://github.com/evanmiller/mod_zip/blob/808fb55e7235a201ea862e02dab612b87787d5a4/ngx_http_zip_module.c#L366

366 -    sr_ctx = ngx_http_get_module_ctx(r, ngx_http_zip_module);
366 +    sr_ctx = ngx_http_get_module_ctx(r->main, ngx_http_zip_module);

I'll send a pr

mdineen avatar Jun 01 '22 18:06 mdineen

The fix in #91 didn't work. By using s3fs the module calculates the crc32 properly.

mdineen avatar Jun 03 '22 21:06 mdineen

The subrequest context is supposed to be set here:

https://github.com/evanmiller/mod_zip/blob/808fb55e7235a201ea862e02dab612b87787d5a4/ngx_http_zip_module.c#L570-L572

I wonder if that's not happening for some reason.

evanmiller avatar Jun 04 '22 10:06 evanmiller

+1 having same issue

mecampbellsoup avatar Jul 26 '22 16:07 mecampbellsoup

When serving the files going into the zip directly from nginx (and not pre-calculating the crc32) it seems the crc32 does not get calculated correctly. Upon looking into this, it seems it is due to in when the call: https://github.com/evanmiller/mod_zip/blob/5b2604b3914f87db2077f2239b8a98b66cf622af/ngx_http_zip_module.c#L371 happens both p and len values are zero here, which causes the ngx_crc32_update() to run with length 0. https://github.com/evanmiller/mod_zip/blob/5b2604b3914f87db2077f2239b8a98b66cf622af/ngx_http_zip_module.c#L394-L395 The value of sr_ctx does appear to be correct.

Bit more background nginx set up as reverse proxy with upstream serving requests for the zip listings so the upstream will return items such as

- 19 %2Fcontent%2Fa.txt a.txt
- 15 %2Fcontent%2Fb.txt b.txt

Those files are then served by the reverse proxy itself (/contents, is mapped to path on the reverse proxy instance). All the crc32s are correctly identified as missing by mod_zip, however due to the previously mentioned issue they are sent as 0.

If instead of serving the /contents directly with a location section in the reverse proxy, an additional server section is added to the reverse proxy nginx instance /contents is served from that "local upstream" instead then the crc32 values are calculated correctly. Which means the following does not work:

    location /content {
        internal;
        root /mnt/data/content;
    }

However this does:

    location /content {
        internal;
        rewrite ^/content/(.*)$ /$1 break;
        proxy_pass http://127.0.0.1:8081;
    }
...
server {
    listen 8081;

    location /content {
        root /mnt/data/content;
    }
}

arsenetar avatar Jul 26 '22 23:07 arsenetar

The subrequest context is supposed to be set here:

https://github.com/evanmiller/mod_zip/blob/808fb55e7235a201ea862e02dab612b87787d5a4/ngx_http_zip_module.c#L570-L572

I wonder if that's not happening for some reason.

When I debug that out, the pointer is 0000000000000000. If I set up debugs in ngx_http_zip_send_file_piece:

    ngx_http_set_ctx(sr, sr_ctx, ngx_http_zip_module);
    ngx_http_zip_sr_ctx_t *my_ctx;
    my_ctx = ngx_http_get_module_ctx(sr, ngx_http_zip_module);
    ngx_log_debug3(NGX_LOG_DEBUG_HTTP, r->connection->log, 0, "mod_zip: 1 setting sr, sr_ctx, myctx: %p %p %p", sr, sr_ctx, my_ctx);

and ngx_http_zip_subrequest_body_filter

    ngx_http_zip_sr_ctx_t *sr_ctx;
    sr_ctx = ngx_http_get_module_ctx(r, ngx_http_zip_module);
    if (shown == 0) {
        ngx_log_debug2(NGX_LOG_DEBUG_HTTP, r->connection->log, 0, "mod_zip: 3 getting r, sr_ctx : %p %p", r, sr_ctx);
        shown = 1;
    }

I get this output:

2023/03/21 16:45:41 [debug] 17630#0: *1 mod_zip: 1 setting sr, sr_ctx, myctx: 0000000000E9D7D8 0000000000E86518 0000000000E86518
2023/03/21 16:45:41 [debug] 17630#0: *1 mod_zip: 3 getting r, sr_ctx : 0000000000E9D7D8 0000000000000000
2023/03/21 16:45:41 [debug] 17630#0: *1 mod_zip: 1 setting sr, sr_ctx, myctx: 0000000000EA7CF0 0000000000E99618 0000000000E99618
2023/03/21 16:45:42 [debug] 17630#0: *1 mod_zip: 3 getting r, sr_ctx : 0000000000EA7CF0 0000000000000000
2023/03/21 16:45:42 [debug] 17630#0: *1 mod_zip: 1 setting sr, sr_ctx, myctx: 0000000000EAA730 0000000000EA26A8 0000000000EA26A8
2023/03/21 16:45:42 [debug] 17630#0: *1 mod_zip: 3 getting r, sr_ctx : 0000000000EAA730 0000000000000000

Which shows that the request setting and retrieing the context is the same, and that the pointer to the sr_ctx is retrievable right after it's set, but that trying to retrieve the same context in the body filter points to zeros.

To me this suggests the context is being nullified elsewhere, before we get to the body filter.

mdineen avatar Mar 02 '23 00:03 mdineen

When I create sr_ctx with a global variable and set it directly the problem goes away. Something is clearing the module context between the calls to ngx_http_zip_send_file_piece and ngx_http_zip_subrequest_body_filter.

diff --git a/ngx_http_zip_module.c b/ngx_http_zip_module.c
index cbd448e..fcfd4fc 100644
--- a/ngx_http_zip_module.c
+++ b/ngx_http_zip_module.c
@@ -11,6 +11,7 @@
 #include "ngx_http_zip_file.h"
 #include "ngx_http_zip_headers.h"
 
+static ngx_http_zip_sr_ctx_t *global_sr_ctx;
 static ngx_chain_t *ngx_chain_last_link(ngx_chain_t *chain_link);
 static ngx_int_t ngx_http_zip_discard_chain(ngx_http_request_t *r,
         ngx_chain_t *in);
@@ -334,7 +335,7 @@ ngx_http_zip_subrequest_body_filter(ngx_http_request_t *r, ngx_chain_t *in)
 {
     ngx_http_zip_sr_ctx_t *sr_ctx;
 
-    sr_ctx = ngx_http_get_module_ctx(r, ngx_http_zip_module);
+    sr_ctx = global_sr_ctx;
 
     if (in && sr_ctx && sr_ctx->requesting_file->missing_crc32) {
         uint32_t old_crc32 = sr_ctx->requesting_file->crc32;
@@ -542,7 +543,8 @@ ngx_http_zip_send_file_piece(ngx_http_request_t *r, ngx_http_zip_ctx_t *ctx,
 
     sr_ctx->requesting_file = piece->file;
 
-    ngx_http_set_ctx(sr, sr_ctx, ngx_http_zip_module);
+    global_sr_ctx = sr_ctx;
+    
     if (ctx->wait) {
         ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
                 "mod_zip : only one subrequest may be waited at the same time; ");

~~I don't think this is the right solution, but it does work.~~

This fails when there are >1 requests with subrequests happening, causing all subrequests' chunks to add on to a single running crc32 value.

mdineen avatar Mar 15 '23 21:03 mdineen

I discovered that disabling buffering for files that are to be zipped resolves the incorrect CRC32 checksum issue. To fix this, add proxy_buffering off; to the NGINX location block that serves the files specified in the mod_zip manifest.

dleonov avatar May 09 '24 13:05 dleonov

https://github.com/evanmiller/mod_zip/pull/111 fixes the problem at the source.

mdineen avatar May 09 '24 14:05 mdineen