Kiwix-Serve does not support Multipart-range HTTP requests

Open kelson42 opened this issue 2 years ago • 8 comments
If such a request is made against the latest version (3.4.0), an error is returned:

$ curl https://library.kiwix.org/content/micmaths_fr_all_2022-10/videos/IbV0UoXXcOY/video.webm -i -H "Range: bytes=0-50, 10-150"
HTTP/2 416 
date: Wed, 07 Dec 2022 16:50:55 GMT
content-type: video/webm
content-length: 0
access-control-allow-origin: *
etag: "da17b3bc-69ba-bbf3-5b9d-e34363056d44/Z"
cache-control: max-age=3600, must-revalidate
x-varnish: 4722013 4394821
age: 12885
via: 1.1 varnish (Varnish/7.1)
accept-ranges: bytes
content-range: bytes */63655850
strict-transport-security: max-age=15724800; includeSubDomains

Unfortunately, analysis of the library.kiwix.org logs has shown that we have legitimate clients (Chrome on Android) that generate this kind of request.

Therefore, this part of the specification, as explained here, should be supported: https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests#multipart_ranges

This is a kind of follow-up to #363.
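
For reference, here is a minimal sketch of what a multipart range response could look like, following the `multipart/byteranges` format described on the linked MDN page. It is not libkiwix code: the function name `build_multipart_byteranges` and the boundary string are hypothetical, and the ranges are assumed to be already validated inclusive `(first, last)` pairs.

```python
def build_multipart_byteranges(data, content_type, ranges,
                               boundary="KIWIX_SKETCH_BOUNDARY"):
    """Build a 206 response (status, headers, body) for several byte ranges.

    `ranges` is a list of inclusive (first, last) byte positions that are
    assumed to be valid for `data`. The boundary is a fixed, hypothetical
    value; a real server would pick one that cannot occur in the payload.
    """
    total = len(data)
    parts = []
    for first, last in ranges:
        # Each part carries its own Content-Type and Content-Range headers.
        part_headers = (
            f"--{boundary}\r\n"
            f"Content-Type: {content_type}\r\n"
            f"Content-Range: bytes {first}-{last}/{total}\r\n"
            "\r\n"
        ).encode("ascii")
        parts.append(part_headers + data[first:last + 1] + b"\r\n")
    # The closing delimiter has two extra dashes after the boundary.
    body = b"".join(parts) + f"--{boundary}--\r\n".encode("ascii")
    headers = {
        "Content-Type": f"multipart/byteranges; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body
```

With the ranges from the curl example above (`0-50` and `10-150`), the body would contain two parts, each preceded by its own `Content-Range` line.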

kelson42 avatar Dec 07 '22 16:12 kelson42

This limitation was documented in #360:

This PR enables handling of partial content requests with a single byte-range. Requests for two or more byte ranges (even if they effectively constitute a single continuous range) are rejected with a 416 (Range Not Satisfiable) error response. Such behaviour complies with a somewhat liberal interpretation of the spec:

The 416 (Range Not Satisfiable) status code indicates that none of the ranges in the request's Range header field (Section 3.1) overlap the current extent of the selected resource or that the set of ranges requested has been rejected due to invalid ranges or an excessive request of small or overlapping ranges.
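
The policy described above can be sketched roughly as follows. This is an illustration only, not the actual libkiwix implementation: `parse_byte_ranges` and `status_for` are hypothetical names, and treating a malformed `Range` header as absent (answering 200) is an assumption of this sketch.

```python
import re

# One range-spec: "first-last", "first-" or "-suffix_length".
_RANGE_RE = re.compile(r"^\s*(\d*)\s*-\s*(\d*)\s*$")


def parse_byte_ranges(header, size):
    """Return satisfiable inclusive (first, last) pairs, or None if malformed."""
    unit, sep, spec = header.partition("=")
    if sep != "=" or unit.strip().lower() != "bytes":
        return None
    ranges = []
    for part in spec.split(","):
        m = _RANGE_RE.match(part)
        if m is None:
            return None
        first, last = m.group(1), m.group(2)
        if first == "" and last == "":
            return None
        if first == "":
            # Suffix range: the last N bytes of the resource.
            n = int(last)
            if n == 0:
                continue  # unsatisfiable, skip it
            start, end = max(size - n, 0), size - 1
        else:
            start = int(first)
            if last != "" and int(last) < start:
                return None  # last position before first: invalid
            end = min(int(last), size - 1) if last != "" else size - 1
        if start >= size:
            continue  # starts beyond the end of the resource
        ranges.append((start, end))
    return ranges


def status_for(header, size):
    """Status code under the single-range-only policy quoted above."""
    ranges = parse_byte_ranges(header, size)
    if ranges is None:
        return 200  # assumption: a malformed Range header is ignored
    if len(ranges) != 1:
        return 416  # no satisfiable range, or more than one range
    return 206
```

Under this sketch, `status_for("bytes=0-50, 10-150", 63655850)` yields 416, matching the curl output in the issue description, while a single range such as `bytes=0-50` yields 206.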

@kelson42 Do you know how clients react to our 416 response to such a multi-part range request?

  1. Does the client come back with a set of new separate single-range requests?
  2. Or does it then request the entire item instead?

I believe that for scenario 1, we shouldn't waste any effort implementing this enhancement.

veloman-yunkan avatar Dec 08 '22 13:12 veloman-yunkan

I don't know how browsers react to this. They probably just stop, because the spec is not fully implemented, which is not an allowed scenario (either you support byte ranges or you don't).

kelson42 avatar Dec 08 '22 13:12 kelson42

I don't know how browsers react to this. They probably just stop, because the spec is not fully implemented, which is not an allowed scenario (either you support byte ranges or you don't).

@kelson42 Can't we find a fact-based answer in the library.kiwix.org logs?

veloman-yunkan avatar Dec 16 '22 11:12 veloman-yunkan

@rgaudin ?

kelson42 avatar Dec 16 '22 13:12 kelson42

That sounds difficult, but the requests that received a 416 were:

library.kiwix.org 18.212.255.64 - - [13/Nov/2022:15:33:25 +0000] "GET http://library.kiwix.org/catalog/v2/categories HTTP/1.1" 416 0 "-" "Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)"
library.kiwix.org xxx.xxx.xxx.xxx - - [15/Nov/2022:17:26:50 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
library.kiwix.org 37.120.157.86 - - [16/Nov/2022:06:20:18 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
library.kiwix.org 37.120.157.86 - - [16/Nov/2022:06:21:27 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"

Now I have about 800 lines of logs spread across 2 IPs. I removed the IPs, and there are apparently no suggest or content search requests, so I guess it's fine to share them here.

416-user2.log 416-user.log

rgaudin avatar Dec 16 '22 14:12 rgaudin

416-user2.log doesn't contain any 416 responses.

Looking at 416-user.log, I see that a request to http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 is first satisfied with a 200 status code and a couple of seconds later another request for the same URL is rejected with a 416 status code:

library.kiwix.org xxx.xxx.xxx.aaa - - [16/Nov/2022:06:20:16 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 200 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
library.kiwix.org xxx.xxx.xxx.aaa - - [16/Nov/2022:06:20:18 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"

This pattern repeats another time:

library.kiwix.org xxx.xxx.xxx.aaa - - [16/Nov/2022:06:21:25 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 200 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
library.kiwix.org xxx.xxx.xxx.aaa - - [16/Nov/2022:06:21:27 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"

However, it is not clear whether the 416 responses were caused by multi-part range requests (they could instead be caused by, for example, an out-of-bounds single-range request). Still, it is strange that a web client sends a range request for an illustration resource.
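
The "200 followed shortly by 416" pattern above could be extracted from such logs with a small script like the following sketch. It assumes the log line layout shown in the excerpts (host, client IP, two dashes, bracketed timestamp, quoted request line, status); `find_200_then_416` and the 5-second window are hypothetical choices. Since these logs do not record the `Range` header, this can only locate candidate requests and cannot tell whether they were multi-part.

```python
import re
from datetime import datetime

# Matches the access-log layout seen in the excerpts above.
LINE_RE = re.compile(
    r'^(?P<host>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) '
)


def find_200_then_416(lines, window_seconds=5):
    """Return (url, delay_in_seconds) for each 416 that follows a 200
    for the same client IP and URL within `window_seconds`."""
    last_ok = {}  # (ip, url) -> timestamp of the latest 200 response
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not follow the expected layout
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        key = (m["ip"], m["url"])
        if m["status"] == "200":
            last_ok[key] = ts
        elif m["status"] == "416" and key in last_ok:
            delay = (ts - last_ok[key]).total_seconds()
            if 0 <= delay <= window_seconds:
                hits.append((m["url"], delay))
    return hits
```

Feeding it the lines of 416-user.log (for example via `open("416-user.log")`) would list each illustration URL together with the delay between the successful and the rejected request.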

@kelson42 What made you think that the 416 responses from library.kiwix.org are caused by multi-part range requests?

veloman-yunkan avatar Dec 17 '22 14:12 veloman-yunkan

@kelson42 What made you think that the 416 responses from library.kiwix.org are caused by multi-part range requests?

Concretely, nothing I can remember, but what would be another plausible scenario?

kelson42 avatar Jan 31 '24 19:01 kelson42

Interesting reading: https://www.zeng.dev/post/2023-http-range-and-play-mp4-in-browser/

kelson42 avatar Jan 31 '24 19:01 kelson42