libkiwix
Kiwix-Serve does not support Multipart-range HTTP requests
If such a request is made against the latest version (3.4.0), an error is returned:
$ curl https://library.kiwix.org/content/micmaths_fr_all_2022-10/videos/IbV0UoXXcOY/video.webm -i -H "Range: bytes=0-50, 10-150"
HTTP/2 416
date: Wed, 07 Dec 2022 16:50:55 GMT
content-type: video/webm
content-length: 0
access-control-allow-origin: *
etag: "da17b3bc-69ba-bbf3-5b9d-e34363056d44/Z"
cache-control: max-age=3600, must-revalidate
x-varnish: 4722013 4394821
age: 12885
via: 1.1 varnish (Varnish/7.1)
accept-ranges: bytes
content-range: bytes */63655850
strict-transport-security: max-age=15724800; includeSubDomains
Unfortunately, analysis of the library.kiwix.org logs has shown that we have legitimate clients (Chrome on Android) that generate this kind of request.
Therefore, this part of the specification should be supported, as explained here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests#multipart_ranges
This is a kind of follow-up to #363.
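For reference, a multipart range response uses status 206 with a `Content-Type: multipart/byteranges; boundary=...` header, and each part carries its own `Content-Type` and `Content-Range` headers. The following is a minimal Python sketch of how such a body could be assembled. It is not libkiwix code: the function name `build_multipart_byteranges` and the boundary value are assumptions made purely for illustration.

```python
def build_multipart_byteranges(data, ranges, content_type, boundary):
    """Assemble a 206 response with a multipart/byteranges body.

    data: the full resource as bytes.
    ranges: list of inclusive (first, last) byte positions, already validated.
    Hypothetical helper for illustration only, not part of libkiwix.
    """
    total = len(data)
    body = b""
    for first, last in ranges:
        # Each part starts with the boundary line and its own headers.
        body += (
            f"--{boundary}\r\n"
            f"Content-Type: {content_type}\r\n"
            f"Content-Range: bytes {first}-{last}/{total}\r\n"
            "\r\n"
        ).encode("ascii")
        # Range ends are inclusive, hence last + 1 for the slice.
        body += data[first:last + 1] + b"\r\n"
    # The closing delimiter has two extra trailing dashes.
    body += f"--{boundary}--\r\n".encode("ascii")
    headers = {
        "Content-Type": f"multipart/byteranges; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body
```

In a real server the boundary would have to be chosen so that it cannot occur in the payload, and the parts would be streamed rather than concatenated in memory.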
This limitation was documented in #360:
This PR enables handling of partial content requests with a single byte-range. Requests for two or more byte ranges (even if they effectively constitute a single continuous range) are rejected with a 416 (Range Not Satisfiable) error response. Such behaviour complies with a somewhat liberal interpretation of the spec:
The 416 (Range Not Satisfiable) status code indicates that none of the ranges in the request's Range header field (Section 3.1) overlap the current extent of the selected resource or that the set of ranges requested has been rejected due to invalid ranges or an excessive request of small or overlapping ranges.
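The request in the curl example above (`bytes=0-50, 10-150`) is a case of overlapping ranges that effectively form one continuous range. HTTP range semantics allow a server to coalesce overlapping ranges, so one possible approach is to merge them before deciding how to respond. The sketch below illustrates that idea in Python; `parse_byte_ranges` and `coalesce` are hypothetical names, and this is not how libkiwix currently behaves.

```python
import re

# One range-spec: "first-last", "first-" or "-suffix_length".
_RANGE_SPEC = re.compile(r"(\d*)-(\d*)", re.ASCII)


def parse_byte_ranges(header, size):
    """Parse a Range header value into inclusive (first, last) pairs.

    Returns None when the value is not a valid bytes range set, and an
    empty list when no range overlaps the resource (the 416 case).
    """
    unit, sep, spec = header.partition("=")
    if not sep or unit.strip().lower() != "bytes":
        return None
    ranges = []
    for part in spec.split(","):
        m = _RANGE_SPEC.fullmatch(part.strip())
        if m is None or m.groups() == ("", ""):
            return None
        first, last = m.groups()
        if first == "":
            # Suffix range: the last N bytes of the resource.
            n = int(last)
            if n > 0 and size > 0:
                ranges.append((max(size - n, 0), size - 1))
            continue
        start = int(first)
        end = int(last) if last else size - 1
        if last and end < start:
            return None
        if start < size:
            # Clamp the end to the last byte of the resource.
            ranges.append((start, min(end, size - 1)))
    return ranges


def coalesce(ranges):
    """Merge overlapping or adjacent ranges into a sorted minimal list."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged
```

With the values from the curl example, `coalesce(parse_byte_ranges("bytes=0-50, 10-150", 63655850))` yields the single range `(0, 150)`, which could then be served as an ordinary single-range 206 response.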
@kelson42 Do you know what happens in response to our 416 response to such a multi-part range request?
- Does the client come back with a set of new separate single-range requests?
- Or does it then request the entire item instead?
I believe that for scenario 1, we shouldn't waste any effort implementing this enhancement.
I don't know how browsers react to this. They probably just stop, because the spec is not fully implemented, which is not an allowed scenario (either you support byte ranges or you don't).
@kelson42 Can't we find a fact-based answer in the library.kiwix.org logs?
@rgaudin ?
That sounds difficult but the 416 requests were:
library.kiwix.org 18.212.255.64 - - [13/Nov/2022:15:33:25 +0000] "GET http://library.kiwix.org/catalog/v2/categories HTTP/1.1" 416 0 "-" "Slackbot-LinkExpanding 1.0 (+https://api.slack.com/robots)"
library.kiwix.org xxx.xxx.xxx.xxx - - [15/Nov/2022:17:26:50 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
library.kiwix.org 37.120.157.86 - - [16/Nov/2022:06:20:18 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
library.kiwix.org 37.120.157.86 - - [16/Nov/2022:06:21:27 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
Now I have about 800 lines of logs spread across 2 IPs. I removed the IPs, and there are apparently no suggest or content search requests, so I guess it's fine to share here.
416-user2.log doesn't contain any 416 responses.
Looking at 416-user.log, I see that a request to http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 is first satisfied with a 200 status code and a couple of seconds later another request for the same URL is rejected with a 416 status code:
library.kiwix.org xxx.xxx.xxx.aaa - - [16/Nov/2022:06:20:16 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 200 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
library.kiwix.org xxx.xxx.xxx.aaa - - [16/Nov/2022:06:20:18 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
This pattern repeats once more:
library.kiwix.org xxx.xxx.xxx.aaa - - [16/Nov/2022:06:21:25 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 200 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
library.kiwix.org xxx.xxx.xxx.aaa - - [16/Nov/2022:06:21:27 +0000] "GET http://library.kiwix.org/catalog/v2/illustration/armypubs_en_all_2022-06/?size=48 HTTP/1.1" 416 0 "https://library.kiwix.org/?lang=eng&q=army+publishing" "Mozilla/5.0 (Linux; Android 12; SM-M315F Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/107.0.5304.105 Mobile Safari/537.36"
However, it is not clear whether these 416 responses were caused by multi-part range requests (the cause could just as well be, for example, an out-of-bounds single-range request). It is also strange that a web client would send a range request for an illustration resource.
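The "200 followed shortly by 416 for the same URL" pattern could be searched for mechanically across the whole log. The following Python sketch assumes the line layout shown in the excerpts above and chronologically ordered input; the function name `find_200_then_416` and the 5-second window are arbitrary choices for illustration. Since these logs do not record the `Range` request header, such a scan can only locate the pattern, not tell whether a multi-part range was involved.

```python
import re
from datetime import datetime, timedelta

# Layout assumed from the excerpts:
# host client - - [timestamp] "METHOD url protocol" status size ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) '
)


def find_200_then_416(lines, window=timedelta(seconds=5)):
    """Return (client, url, time_of_200, time_of_416) for every 416 that
    follows a 200 for the same client and URL within `window`."""
    last_ok = {}
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not follow the assumed layout
        key = (m["client"], m["url"])
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        if m["status"] == "200":
            last_ok[key] = ts
        elif m["status"] == "416" and key in last_ok:
            if ts - last_ok[key] <= window:
                hits.append((m["client"], m["url"], last_ok[key], ts))
    return hits
```

It could be run as `find_200_then_416(open("416-user.log"))`. Logging the `Range` header on the server or proxy side would be needed to settle the question definitively.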
@kelson42 What made you think that the 416 responses from library.kiwix.org are caused by multi-part range requests?
Concretely, nothing I can remember, but what would be another plausible scenario?
Interesting reading: https://www.zeng.dev/post/2023-http-range-and-play-mp4-in-browser/