
Implement retries for URL SCM downloads

Open mahaase opened this issue 4 years ago • 5 comments

Hey,

I have had this problem since the very beginning:

+ '[' -e .state/sandbox_ubuntu-x86_64_src_1-f301-98ca.canary ']'
+ echo f63302148dced4ef2ec67d869b958dd99da61e1b
+ bob _invoke sandbox_ubuntu-x86_64_src_1-f301-98ca.spec.json -vv
+ <wget> http://fdt-c-pcs-0004.fdtech.intern/fileserver/ubuntu-bionic-amd64-rootfs.tgz > ./ubuntu-bionic-amd64-rootfs.tgz
Error: Response too short: 2147479552 < 2797978679 (bytes)
Error: Recipe sandbox::ubuntu-x86_64 in checkoutSCM: dir:., url:http://fdt-c-pcs-0004.fdtech.intern/fileserver/ubuntu-bionic-amd64-rootfs.tgz in checkoutSCM: dir:., url:http://fdt-c-pcs-0004.fdtech.intern/fileserver/ubuntu-bionic-amd64-rootfs.tgz failed
Build step 'Execute shell' marked build as failure
Archiving artifacts
Finished: FAILURE

I tried to configure the nginx fileserver with various options, but with no success at all. The archives that sometimes fail are >1.5 GB.

If I restart the job, in most cases it will work then...

Calling curl -sSg --fail -o sandbox_ubuntu-x86_64_dist_1-68ac-e703.tgz http://fdt-c-pcs-0004.fdtech.intern/artifact/f1/bb/21909256750d2d6038bbf148e3c7ce8fd9de-1.tgz manually works every time...

Any ideas?

BR.

mahaase avatar May 12 '20 09:05 mahaase

Any chance that the file is downloaded or served from a 32-bit Linux installation? The file length looks suspiciously like the 2 GiB barrier: exactly 2 GiB - 4 KiB!

What do you mean by "since the very beginning"? That code was refactored from calling curl to using Python. Has the problem existed since the rewrite, or was it already there when curl was still used?

That aside, I'll look into how we could gracefully handle the error...

jkloetzke avatar May 12 '20 19:05 jkloetzke

No, there are no 32-bit machines in the company. "Since the very beginning" means since I started using large sandboxes, so also before, when curl was still used!

Maybe the nginx webserver is not the best solution for this, maybe I should try another one, or switch to a different protocol like FTP? Could Bob handle FTP?

daxcore avatar May 12 '20 20:05 daxcore

I've looked a bit deeper into this and it appears it may be related to the sendfile system call that nginx is using. Any chance that sendfile on; is in your config? Others seem to experience similar problems, e.g.: https://forum.nginx.org/read.php?2,260186

If you use sendfile on, does adding sendfile_max_chunk 512k; help?
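
For reference, a rough sketch of where those directives could live in the nginx config (the listen port and root path below are just placeholders, not taken from your setup):

http {
    sendfile on;
    # limit each sendfile() call so very large responses are not cut short
    sendfile_max_chunk 512k;

    server {
        listen 80;
        server_name fdt-c-pcs-0004.fdtech.intern;
        root /srv/fileserver;  # placeholder path
    }
}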

Bob does support FTP, but this is untested. I wouldn't recommend going back to the 80s... :wink:

jkloetzke avatar May 12 '20 20:05 jkloetzke

Hey, maybe

sendfile_max_chunk 512k;

was already the solution. I will watch this for some more time. If you don't want to change Bob here, we could close the ticket. If the issue occurs again, I will reopen it.

mahaase avatar May 13 '20 16:05 mahaase

I think we can leave it open to act as a reminder that we should implement some kind of retry. There can always be a hiccup in the connection. But it certainly is low prio...
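
Just as an illustration of the idea (not Bob's actual code; the function name and defaults are made up), a retry wrapper around the URL download could look roughly like this:

import time
import urllib.error
import urllib.request

def download_with_retries(url, dest, attempts=3, delay=10):
    # Download url to dest, retrying on connection hiccups and short reads.
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
                expected = resp.getheader("Content-Length")
                copied = 0
                while True:
                    chunk = resp.read(64 * 1024)
                    if not chunk:
                        break
                    out.write(chunk)
                    copied += len(chunk)
            if expected is not None and copied < int(expected):
                raise OSError("Response too short: {} < {} (bytes)".format(copied, expected))
            return
        except (urllib.error.URLError, OSError) as err:
            if attempt == attempts:
                raise
            print("Download failed ({}), retrying in {}s...".format(err, delay))
            time.sleep(delay)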

jkloetzke avatar May 13 '20 17:05 jkloetzke

Fixed by #517

jkloetzke avatar Jun 20 '23 18:06 jkloetzke