zsync2 does not support downloading large files.

Open ghuls opened this issue 6 years ago • 7 comments

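zsync2 fails to download large files. The download aborts with these messages (run together because they lack trailing newlines): failed to parse content-range headerError while parsing headersOther error? -1

I patched zsync2 so it prints the from and to values parsed from the Content-Range header: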

$ git diff -U10
diff --git a/src/legacy_http.c b/src/legacy_http.c
index 41310da..290603e 100644
--- a/src/legacy_http.c
+++ b/src/legacy_http.c
@@ -626,20 +626,21 @@ int range_fetch_read_http_headers(struct range_fetch *rf) {
             p[len] = 0;
         }
         /* buf is the header name (lower-cased), p the value */
         /* Switch based on header */
 
         if (status == 206 && !strcmp(buf, "content-range")) {
             /* Okay, we're getting a non-MIME block from the remote. Get the
              * range and set our state appropriately */
             int from, to;
             sscanf(p, "bytes " OFF_T_PF "-" OFF_T_PF "/", &from, &to);
+            fprintf(stderr, "content-range from: %d  to: %d\n", from, to);
             if (from <= to) {
                 rf->block_left = to + 1 - from;
                 rf->offset = from;
             } else {
                 fprintf(stderr, "failed to parse content-range header");
             }
 
             /* Can only have got one range. */
             rf->rangesdone++;
             rf->rangessent = rf->rangesdone;
$ ./zsync2 -v https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc8nr/region_based/hg19-regions-9species.all_regions.mc8nr.feather.zsync
zsync2 version 2.0.0-alpha-1 (commit 7857ff1), build <local dev build> built on 2018-12-21 09:58:27 UTC
Checking for changes...
Cannot find file hg19-regions-9species.all_regions.mc8nr.feather, triggering full download
/ddn1/vol1/site_scratch/leuven/303/vsc30366/hg19-regions-9species.all_regions.mc8nr.feather.part found, using as seed file
Target file: /ddn1/vol1/site_scratch/leuven/303/vsc30366/hg19-regions-9species.all_regions.mc8nr.feather
Reading seed file: /ddn1/vol1/site_scratch/leuven/303/vsc30366/hg19-regions-9species.all_regions.mc8nr.feather.part
Usable data from seed files: 0.000000%
Renaming temp file
Fetching remaining blocks
Downloading from https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc8nr/region_based/hg19-regions-9species.all_regions.mc8nr.feather

-------------------- 0.0%* Hostname was NOT found in DNS cache
*   Trying 134.58.50.8...
* Adding handle: conn: 0x1654a00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 3 (0x1654a00) send_pipe: 1, recv_pipe: 0
* Connected to resources.aertslab.org (134.58.50.8) port 443 (#3)
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using ECDHE-RSA-AES256-GCM-SHA384
* Server certificate:
* 	 subject: CN=resources.aertslab.org
* 	 start date: 2018-11-25 04:49:48 GMT
* 	 expire date: 2019-02-23 04:49:48 GMT
* 	 subjectAltName: resources.aertslab.org matched
* 	 issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
* 	 SSL certificate verify ok.
> GET /cistarget/databases/homo_sapiens/hg19/refseq_r45/mc8nr/region_based/hg19-regions-9species.all_regions.mc8nr.feather HTTP/1.1
Range: bytes=0-3475369983
Host: resources.aertslab.org
Accept: */*

< HTTP/1.1 206 Partial Content
< Date: Fri, 21 Dec 2018 10:09:34 GMT
* Server Apache/2.4.37 (Ubuntu) is not blacklisted
< Server: Apache/2.4.37 (Ubuntu)
< Strict-Transport-Security: max-age=15768000
< Last-Modified: Wed, 23 May 2018 07:38:22 GMT
< ETag: "16cf25e760-56cda9e8f304e"
< Accept-Ranges: bytes
< Content-Length: 3475369984
< Content-Range: bytes 0-3475369983/97964648288
< 
content-range from: 0  to: -819597313
failed to parse content-range headerError while parsing headersOther error? -1
-1 returned
-------------------- 0.0% 0.0 kBps aborted    

* Closing connection 3
failed to retrieve from hg19-regions-9species.all_regions.mc8nr.feather, status -1

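As you can see, int (signed int) is not big enough: the end offset 3475369983 exceeds the 32-bit signed maximum of 2147483647 and is printed as -819597313. from and to should at least be uint (unsigned int, at least 32 bits); offsets further into this 97964648288-byte file will not fit in 32 bits at all.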
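
The negative value in the log can be reproduced in isolation. The following is only a minimal sketch, assuming a 32-bit int with two's-complement wrap-around; the helper name wrap_to_int32 is hypothetical and not part of zsync2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of zsync2: show what a 64-bit byte offset
 * looks like once only its low 32 bits are kept and read back as a signed
 * 32-bit integer. */
static int32_t wrap_to_int32(int64_t value)
{
    assert(value >= 0);               /* byte offsets are never negative */

    uint32_t low = (uint32_t)value;   /* well-defined: reduces modulo 2^32 */

    /* Reinterpret the low bits as signed without relying on an
     * implementation-defined unsigned-to-signed conversion. */
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    return (int32_t)((int64_t)low - 4294967296LL);
}
```

Calling wrap_to_int32(3475369983) yields -819597313, the same value as the "to" field printed above, since 3475369983 - 4294967296 = -819597313.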

ghuls avatar Dec 21 '18 10:12 ghuls

Thanks @ghuls. Could you please send a PR that includes

  • The added verbosity
  • Using uint

Again, thank you very much.

probonopd avatar Dec 21 '18 11:12 probonopd

@probonopd Adding this change is not enough.

It seems that there are a lot of issues with files bigger than 2GiB:

$ ./zsync2 -v https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc8nr/region_based/hg19-regions-9species.all_regions.mc8nr.feather.zsync
zsync2 version 2.0.0-alpha-1 (commit 7857ff1), build <local dev build> built on 2018-12-21 12:30:07 UTC
Checking for changes...
Cannot find file hg19-regions-9species.all_regions.mc8nr.feather, triggering full download
/ddn1/vol1/site_scratch/leuven/303/vsc30366/hg19-regions-9species.all_regions.mc8nr.feather.part found, using as seed file
Target file: /ddn1/vol1/site_scratch/leuven/303/vsc30366/hg19-regions-9species.all_regions.mc8nr.feather
Reading seed file: /ddn1/vol1/site_scratch/leuven/303/vsc30366/hg19-regions-9species.all_regions.mc8nr.feather.part
Usable data from seed files: 3.547576%
Renaming temp file
Fetching remaining blocks
Downloading from https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc8nr/region_based/hg19-regions-9species.all_regions.mc8nr.feather

-------------------- 3.5%* Hostname was NOT found in DNS cache
*   Trying 134.58.50.8...
* Adding handle: conn: 0x16f74f0
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 3 (0x16f74f0) send_pipe: 1, recv_pipe: 0
* Connected to resources.aertslab.org (134.58.50.8) port 443 (#3)
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using ECDHE-RSA-AES256-GCM-SHA384
* Server certificate:
* 	 subject: CN=resources.aertslab.org
* 	 start date: 2018-11-25 04:49:48 GMT
* 	 expire date: 2019-02-23 04:49:48 GMT
* 	 subjectAltName: resources.aertslab.org matched
* 	 issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
* 	 SSL certificate verify ok.
> GET /cistarget/databases/homo_sapiens/hg19/refseq_r45/mc8nr/region_based/hg19-regions-9species.all_regions.mc8nr.feather HTTP/1.1
Range: bytes=3475369984-3475369983
Host: resources.aertslab.org
Accept: */*

< HTTP/1.1 200 OK
< Date: Fri, 21 Dec 2018 12:36:54 GMT
* Server Apache/2.4.37 (Ubuntu) is not blacklisted
< Server: Apache/2.4.37 (Ubuntu)
< Strict-Transport-Security: max-age=15768000
< Last-Modified: Wed, 23 May 2018 07:38:22 GMT
< ETag: "16cf25e760-56cda9e8f304e"
< Accept-Ranges: bytes
< Content-Length: 97964648288
< 

zsync received a data response (code 200) but this is not a partial content response
zsync can only work with servers that support returning partial content from files. The person/entity creating this .zsync has tried to use a server that is not returning partial content. zsync cannot be used with this server.
See http://zsync.moria.orc.uk/server-issues
Other error? -1
-1 returned
-------------------- 3.5% 0.0 kBps aborted    

* Closing connection 3
failed to retrieve from hg19-regions-9species.all_regions.mc8nr.feather, status -1

It seems to request an invalid byte range: Range: bytes=3475369984-3475369983
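
In that request the first byte position (3475369984) is one greater than the last (3475369983), and the server answered with 200 OK and the full Content-Length instead of 206. A guard against building such a header could look roughly like this; it is a sketch only, and format_range_header is a hypothetical name that does not exist in zsync2.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of zsync2: write "bytes=first-last" into buf,
 * refusing negative or inverted ranges such as 3475369984-3475369983.
 * Returns 0 on success, -1 if the range is invalid or buf is too small. */
static int format_range_header(char *buf, size_t buflen,
                               int64_t first, int64_t last)
{
    assert(buf != NULL);

    if (first < 0 || last < first)
        return -1;

    /* PRId64 always matches int64_t, whatever the width of int or long. */
    int n = snprintf(buf, buflen, "bytes=%" PRId64 "-%" PRId64, first, last);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With this check, the range from the log is rejected before any request is sent, while bytes=0-3475369983 from the first run is still formatted normally.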

Tested with these changes:

$ git diff
diff --git a/src/legacy_http.c b/src/legacy_http.c
index 41310da..ccf4f06 100644
--- a/src/legacy_http.c
+++ b/src/legacy_http.c
@@ -53,8 +53,8 @@ struct http_file
     } handle;
 
     char *buffer;
-    size_t buffer_len;
-    size_t buffer_pos;
+    off_t buffer_len;
+    off_t buffer_pos;
     int still_running;
 };
 
@@ -391,9 +391,9 @@ static int fill_buffer(HTTP_FILE *file, size_t want, CURLM* multi_handle)
  *
  * Removes `want` bytes from the front of the buffer.
  */
-static int use_buffer(HTTP_FILE *file, int want)
+static off_t use_buffer(HTTP_FILE *file, off_t want)
 {
-    if((file->buffer_pos - want) <= 0){
+    if(file->buffer_pos <= want){
         /* trash the buffer */
         if(file->buffer){
             free(file->buffer);
@@ -416,7 +416,7 @@ static int use_buffer(HTTP_FILE *file, int want)
  */
 size_t http_fread(void *ptr, size_t size, size_t nmemb, HTTP_FILE *file, struct range_fetch *rf)
 {
-    size_t want;
+    off_t want;
     want = nmemb * size;
     fill_buffer(file, want, rf->multi_handle);
 
@@ -560,14 +560,14 @@ static void buflwr(char *s) {
 int range_fetch_read_http_headers(struct range_fetch *rf) {
     char buf[512];
     int status;
-    int seen_location = 0;
+   uint seen_location = 0;
 
     {                           /* read status line */
         char *p;
 
         if (rfgets(buf, sizeof(buf), rf) == NULL){
             /* most likely unexpected EOF from server */
-            fprintf(stderr, "EOF from server");
+            fprintf(stderr, "EOF from server\n");
             return -1;
         }
         if (buf[0] == 0)
@@ -622,7 +622,7 @@ int range_fetch_read_http_headers(struct range_fetch *rf) {
         p += 2;
         buflwr(buf);
         {   /* Remove the trailing \r\n from the value */
-            int len = strcspn(p, "\r\n");
+            uint len = strcspn(p, "\r\n");
             p[len] = 0;
         }
         /* buf is the header name (lower-cased), p the value */
@@ -631,13 +631,14 @@ int range_fetch_read_http_headers(struct range_fetch *rf) {
         if (status == 206 && !strcmp(buf, "content-range")) {
             /* Okay, we're getting a non-MIME block from the remote. Get the
              * range and set our state appropriately */
-            int from, to;
+            off_t from, to;
             sscanf(p, "bytes " OFF_T_PF "-" OFF_T_PF "/", &from, &to);
+            fprintf(stderr, "content-range from: %d  to: %d\n", from, to);
             if (from <= to) {
                 rf->block_left = to + 1 - from;
                 rf->offset = from;
             } else {
-                fprintf(stderr, "failed to parse content-range header");
+                fprintf(stderr, "failed to parse content-range header\n");
             }
 
             /* Can only have got one range. */
@@ -678,7 +679,7 @@ int range_fetch_read_http_headers(struct range_fetch *rf) {
          */
     }
 
-    fprintf(stderr, "Error while parsing headers");
+    fprintf(stderr, "Error while parsing headers\n");
     return -1;
 }
 
diff --git a/src/zsclient.cpp b/src/zsclient.cpp
index 06a993b..c5fd3f0 100644
--- a/src/zsclient.cpp
+++ b/src/zsclient.cpp
@@ -269,12 +269,14 @@ namespace zsync2 {
 
                 // if interested in headers only, download 1 kiB chunks until end of zsync header is found
                 if (headersOnly) {
-                    static const auto chunkSize = 1024;
-                    unsigned long currentChunk = 0;
+issueStatusMessage("headersOnly");
+                    static const off_t chunkSize = 1024;
+                    off_t currentChunk = 0;
 
                     // download a chunk at a time
                     while (true) {
                         std::ostringstream bytes;
+issueStatusMessage("headersOnly:" + std::to_string(currentChunk) + " " + std::to_string( chunkSize) + " " + std::to_string(currentChunk + chunkSize - 1) + "\n");
                         bytes << "bytes=" << currentChunk << "-" << currentChunk + chunkSize - 1;
                         session.SetHeader(cpr::Header{{"range", bytes.str()}});
 

ghuls avatar Dec 21 '18 12:12 ghuls

It'll be much easier to review if you send a PR right away.

TheAssassin avatar Dec 21 '18 14:12 TheAssassin

I think he is not sending a PR because, despite his changes, it is not working yet.

probonopd avatar Dec 21 '18 14:12 probonopd

Hmm, applying this diff file (manually, thanks a lot git apply for never working) makes it work on the Garuda Linux ISO file I tested this on: https://builds.garudalinux.org/iso/garuda/dr460nized/210324/garuda-dr460nized-linux-zen-210324.iso.zsync

JustTNE avatar Mar 24 '21 11:03 JustTNE

I've noticed while compiling this in Cygwin that this is sometimes wrong and uses 32-bit stuff instead: https://github.com/AppImage/zsync2/blob/86cfd3a1d6a27483ec40edd62c1a6bd409cbbe5d/src/format_string.h#L24-L36

Forcing it to use 64-bit stuff fixed any issues I had on the Cygwin-compiled version.

JustTNE avatar Mar 25 '21 16:03 JustTNE

This patch goes in the right direction, but it actually doesn't solve the issue. See my comments in #59. A fix must use fixed 64-bit types. size_t and off_t are compiler-dependent and typically just 32-bit in size on 32-bit machines.
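
To illustrate what "fixed 64-bit types" could mean for the Content-Range parsing, here is a minimal sketch. It is not the code from #59 or from zsync2; parse_content_range is a hypothetical helper, and it assumes the header value has the form "bytes FROM-TO/TOTAL" as in the logs above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not zsync2's actual code: parse the value of a
 * Content-Range header ("bytes FROM-TO/TOTAL") into fixed-width 64-bit
 * integers. Returns 0 on success, -1 if the value is malformed. */
static int parse_content_range(const char *value, int64_t *from, int64_t *to)
{
    assert(value != NULL && from != NULL && to != NULL);

    /* SCNd64 always matches int64_t, independent of how wide int, long,
     * size_t or off_t are on the target platform. */
    if (sscanf(value, "bytes %" SCNd64 "-%" SCNd64 "/", from, to) != 2)
        return -1;

    /* Reject negative or inverted ranges instead of silently continuing. */
    if (*from < 0 || *to < *from)
        return -1;

    return 0;
}
```

For the header from the first log, "bytes 0-3475369983/97964648288", this gives from = 0 and to = 3475369983, so to + 1 - from is 3475369984, matching the Content-Length the server reported.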

TheAssassin avatar Apr 02 '21 15:04 TheAssassin