bug: Handle range request with incomplete response

Open aoli-al opened this issue 9 months ago • 4 comments

Describe the bug

Per https://issues.chromium.org/issues/390229583, returning 206 with incomplete content is expected behavior. OpenDAL may need to handle this corner case.

Or please also respond to the chromium issue to provide more context.

Steps to Reproduce

Related: https://github.com/XiangpengHao/parquet-viewer/issues/7

Expected Behavior

OpenDAL should not crash when receiving 206 with an incomplete response.

Additional Context

Currently, OpenDAL fails at this line.

https://github.com/apache/opendal/blob/0c44e07c49f65a10cc4f6d56c377e851abb34876/core/src/raw/http_util/body.rs#L89

Are you willing to submit a PR to fix this bug?

  • [ ] Yes, I would like to submit a PR.

aoli-al avatar Mar 03 '25 15:03 aoli-al

Hi @aoli-al, thanks so much for your bug report! I really enjoyed reading through https://aoli.al/blogs/chrome-bug/.

OpenDAL checks the body size against the Content-Length, and my understanding is that a 206 Partial Content response should still carry the correct Content-Length. It simply adds a Content-Range header specifying the range of the returned content.
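A minimal sketch of that kind of check, assuming the declared Content-Length and the number of bytes actually read are both already known. The names `BodyLenError` and `check_body_len` are hypothetical and are not OpenDAL's actual code:

```rust
use std::cmp::Ordering;

/// Hypothetical error type for this sketch; not OpenDAL's real error type.
#[derive(Debug, PartialEq)]
enum BodyLenError {
    TooLittle { expected: u64, actual: u64 },
    TooMuch { expected: u64, actual: u64 },
}

/// Compare the number of bytes actually read with the declared Content-Length.
fn check_body_len(expected: u64, actual: u64) -> Result<(), BodyLenError> {
    match actual.cmp(&expected) {
        Ordering::Equal => Ok(()),
        Ordering::Less => Err(BodyLenError::TooLittle { expected, actual }),
        Ordering::Greater => Err(BodyLenError::TooMuch { expected, actual }),
    }
}
```

With a check like this, a body that ends before reaching the declared length is reported as an error instead of being returned as if it were complete.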

I tested it locally and found that the server does return the correct Content-Length:

:) curl 'http://localhost:3000/gridwatch_2023-01-08.parquet' -H 'Range: bytes=4-138724' -v
* Host localhost:3000 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:3000...
* Connected to localhost (::1) port 3000
* using HTTP/1.x
> GET /gridwatch_2023-01-08.parquet HTTP/1.1
> Host: localhost:3000
> Accept: */*
> Accept-Language: en-US,en;q=0.9
> Connection: keep-alive
> DNT: 1
> Origin: https://xuanwo.io
> Range: bytes=4-138724
> Sec-Fetch-Dest: empty
> Sec-Fetch-Mode: cors
> Sec-Fetch-Site: cross-site
> User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36 Edg/133.0.0.0
> sec-ch-ua: "Not(A:Brand";v="99", "Microsoft Edge";v="133", "Chromium";v="133"
> sec-ch-ua-mobile: ?0
> sec-ch-ua-platform: "Linux"
>
* Request completely sent off
< HTTP/1.1 206 Partial Content
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Headers: *
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Private-Network: true
< Content-Length: 138721
< Content-Disposition: inline; filename="gridwatch_2023-01-08.parquet"
< Accept-Ranges: bytes
< ETag: "9fd6c7db3d0a71c6bfedd73034904a4ff3c0d3dc"
< Content-Type: binary/octet-stream
< Last-Modified: Wed, 10 Jan 2025 12:00:00 GMT
< access-control-allow-methods: GET,HEAD
< Vary: Origin, Access-Control-Request-Headers, Access-Control-Request-Method
< Content-Range: bytes 4-138724/29300729
< Date: Tue, 04 Mar 2025 07:00:58 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
<
Warning: Binary output can mess up your terminal. Use "--output -" to tell curl to
Warning: output it to your terminal anyway, or consider "--output <FILE>" to save to a
Warning: file.
* client returned ERROR on write of 16384 bytes
* closing connection #0
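In the response above the two headers agree: `Content-Range: bytes 4-138724/29300729` covers 138724 - 4 + 1 = 138721 bytes, which matches `Content-Length: 138721`. A small sketch of that consistency check follows; `parse_content_range` and `range_len` are illustrative helpers, not OpenDAL APIs, and only the `bytes start-end/total` form is handled:

```rust
/// Parse a `Content-Range` value such as `bytes 4-138724/29300729` and return
/// `(start, end, total)`. `total` is `None` when the server sends `*`.
/// Illustrative helper only; the unsatisfied form `bytes */total` yields `None`.
fn parse_content_range(value: &str) -> Option<(u64, u64, Option<u64>)> {
    let rest = value.trim().strip_prefix("bytes ")?;
    let (range, total) = rest.split_once('/')?;
    let (start, end) = range.split_once('-')?;
    let start: u64 = start.trim().parse().ok()?;
    let end: u64 = end.trim().parse().ok()?;
    if end < start {
        return None;
    }
    let total = match total.trim() {
        "*" => None,
        t => Some(t.parse::<u64>().ok()?),
    };
    Some((start, end, total))
}

/// Number of bytes a range covers; both ends are inclusive.
fn range_len(start: u64, end: u64) -> u64 {
    end - start + 1
}
```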

Maybe I still haven't fully understood the problem. Does Chrome provide less data than the declared Content-Length?


By the way, OpenDAL handles 206 this way:

https://github.com/apache/opendal/blob/6833a104800e3168f033f366ee765fc5d6299b53/core/src/services/s3/backend.rs#L1054-L1068

Xuanwo avatar Mar 04 '25 07:03 Xuanwo

You may find more discussion here. Sorry, I'm not familiar with web technologies.

Based on my understanding, when the server returns 206, it is legal for it to send only part of the content regardless of the Content-Length set in the header. For example, if the client requests the range 1-256, the server may respond with only bytes 1-128 while still setting Content-Length to 256.

So, a proper way to handle this is to

  1. send range request
  2. get 206 responses and check the data length
  3. if the data length is shorter than requested, issue another range request and return to 1. Otherwise, break.
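The steps above can be sketched roughly as follows. This is only an illustration: `fetch` is a hypothetical stand-in for "send one range request and return the body bytes of the 206 response", not an OpenDAL or browser API, and errors are plain strings for brevity:

```rust
/// Read the inclusive byte range `start..=end`, re-requesting the missing
/// tail whenever a response comes back shorter than requested.
///
/// `fetch(from, to)` is a hypothetical callback that performs one range
/// request for `from..=to` and returns whatever body bytes arrived.
fn read_full_range<F>(start: u64, end: u64, mut fetch: F) -> Result<Vec<u8>, String>
where
    F: FnMut(u64, u64) -> Result<Vec<u8>, String>,
{
    if end < start {
        return Err("invalid range".to_string());
    }
    let wanted = (end - start + 1) as usize;
    let mut buf = Vec::with_capacity(wanted);

    while buf.len() < wanted {
        // Step 1: request only the part that is still missing.
        let next = start + buf.len() as u64;
        let chunk = fetch(next, end)?;

        // Step 2: check how much data actually arrived.
        if chunk.is_empty() {
            // No progress: stop instead of looping forever.
            return Err("range request returned no data".to_string());
        }
        if chunk.len() > wanted - buf.len() {
            return Err("range request returned too much data".to_string());
        }
        buf.extend_from_slice(&chunk);
        // Step 3: the loop condition issues another request if we are short.
    }
    Ok(buf)
}
```

For the 1-256 example, a server that answers with at most 128 bytes per response would be called twice: once for 1-256 and once for 129-256.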

Also, this is based on my observation of Chrome/Chromium. I don't know if other browsers behave the same way.

aoli-al avatar Mar 04 '25 14:03 aoli-al

> Based on my understanding, when the server returns 206, it is legal for it to send only part of the content regardless of the Content-Length set in the header. For example, if the client requests the range 1-256, the server may respond with only bytes 1-128 while still setting Content-Length to 256.

Ohhhh, that's really surprising. Maybe something happened inside WASM (a.k.a. fetch), since we don't have our own network stack in the browser. I'll take a closer look at this.

> 1. send range request
> 2. get 206 responses and check the data length
> 3. if the data length is shorter than requested, issue another range request and return to 1. Otherwise, break.

However, this could introduce more issues since bad servers can return bad responses, which happens quite frequently. We need to take that into account.
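As a rough illustration of what "taking that into account" might involve, here is one possible set of checks to run on each partial response before deciding to retry. Everything here (`Verdict`, `judge_partial`, the retry budget) is a hypothetical sketch, not a description of what OpenDAL does:

```rust
/// Outcome of inspecting one 206 response. Hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// The whole requested range has arrived.
    Complete,
    /// The response was short but plausible; continue from `next_start`.
    Retry { next_start: u64 },
    /// The response is inconsistent or we have retried too often.
    Reject(&'static str),
}

/// Decide what to do with a partial response.
///
/// `req_start..=req_end` is the requested range, `resp_start` is the start
/// offset from the response's Content-Range, `body_len` is the number of
/// bytes actually received, and `attempt` counts from 0.
fn judge_partial(
    req_start: u64,
    req_end: u64,
    resp_start: u64,
    body_len: u64,
    attempt: u32,
    max_attempts: u32,
) -> Verdict {
    if req_end < req_start {
        return Verdict::Reject("invalid request range");
    }
    if resp_start != req_start {
        return Verdict::Reject("response starts at the wrong offset");
    }
    let wanted = req_end - req_start + 1;
    if body_len > wanted {
        return Verdict::Reject("too much data");
    }
    if body_len == wanted {
        return Verdict::Complete;
    }
    if body_len == 0 {
        return Verdict::Reject("no progress");
    }
    if attempt + 1 >= max_attempts {
        return Verdict::Reject("retry budget exhausted");
    }
    Verdict::Retry { next_start: req_start + body_len }
}
```

Rejecting zero-length bodies and capping the number of attempts keeps a misbehaving server from driving the client into an endless request loop.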

Xuanwo avatar Mar 04 '25 14:03 Xuanwo

We seem to have three problems here:

  • Chrome gets a 403 while trying to fill the cache; we need to find out why the signature does not match for the second request.
  • Chrome silently converts the 403 into a 206 response, which the Chrome team considers fine.
  • Chrome sets the wrong Content-Length on the following 206 response, which leads OpenDAL to raise the error "too little data".

Have I summarized this correctly? @aoli-al

Xuanwo avatar Mar 04 '25 14:03 Xuanwo