openqa-worker-cacheservice-minion.service - Download errors with big files on some aarch64
I have set up an armv9 machine, running regular armv8 Tumbleweed, for openQA to test the armv9 rebuild of Tumbleweed. See: https://openqa.opensuse.org/admin/workers/1492
openqa-worker-cacheservice-minion.service works fine with a small NET ISO file: https://openqa.opensuse.org/tests/5476200
But it breaks with a bigger JeOS xz file: https://openqa.opensuse.org/tests/5476197
Log from openqa-worker-cacheservice-minion.service:
Nov 24 08:46:15 MS-R1 openqa-worker-cacheservice-minion[9104]: [9104] [i] Worker 9104 started
Nov 24 08:46:16 MS-R1 openqa-worker-cacheservice-minion[575743]: [575743] [i] Downloading: "openSUSE-Tumbleweed-NET-aarch64-Build2509.1-Media.iso"
Nov 24 08:46:53 MS-R1 openqa-worker-cacheservice-minion[575743]: [575743] [i] Cache size of "/var/lib/openqa/cache" is 0 Byte, with limit 50 GiB
Nov 24 08:46:53 MS-R1 openqa-worker-cacheservice-minion[575743]: [575743] [i] Downloading "openSUSE-Tumbleweed-NET-aarch64-Build2509.1-Media.iso" from "https://openqa.opensuse.org/tests/5476200/asset/iso/openSUSE-Tumbleweed-NET-aarch64-Build2509.1-Media.iso"
Nov 24 09:26:46 MS-R1 openqa-worker-cacheservice-minion[580912]: [580912] [i] Downloading: "openSUSE-Tumbleweed-ARM-JeOS-efi.aarch64-2025.11.09-Build5.3.raw.xz"
Nov 24 09:28:25 MS-R1 openqa-worker-cacheservice-minion[580912]: [580912] [i] Cache size of "/var/lib/openqa/cache" is 392 MiB, with limit 50 GiB
Nov 24 09:28:25 MS-R1 openqa-worker-cacheservice-minion[580912]: [580912] [i] Downloading "openSUSE-Tumbleweed-ARM-JeOS-efi.aarch64-2025.11.09-Build5.3.raw.xz" from "https://openqa.opensuse.org/tests/5476197/asset/hdd/openSUSE-Tumbleweed-ARM-JeOS-efi.aarch64-2025.11.09-Build5.3.raw.xz"
Nov 24 09:30:15 MS-R1 openqa-worker-cacheservice-minion[580912]: [580912] [i] Size of "/var/lib/openqa/cache/openqa.opensuse.org/openSUSE-Tumbleweed-ARM-JeOS-efi.aarch64-2025.11.09-Build5.3.raw.xz" differs, expected 1.1 GiB but downloaded 1 GiB
Nov 24 09:30:15 MS-R1 openqa-worker-cacheservice-minion[580912]: [580912] [i] Download error, waiting 5 seconds for next try (4 remaining)
Nov 24 09:30:15 MS-R1 openqa-worker-cacheservice-minion[580912]: [580912] [i] Downloading "openSUSE-Tumbleweed-ARM-JeOS-efi.aarch64-2025.11.09-Build5.3.raw.xz" from "https://openqa.opensuse.org/tests/5476197/asset/hdd/openSUSE-Tumbleweed-ARM-JeOS-efi.aarch64-2025.11.09-Build5.3.raw.xz"
The files are truncated but not always at the same offset. Truncated files:
/var/lib/openqa/cache/tmp:
total 3194512
-rw-------. 1 _openqa-worker _openqa-worker 1093070564 Nov 24 09:35 mojo.tmp.4qhsbzpDCp4PZpCx
-rw-------. 1 _openqa-worker _openqa-worker 1088999336 Nov 24 09:45 mojo.tmp.bKtNAJSoAefHQ5iZ
-rw-------. 1 _openqa-worker _openqa-worker 1089106792 Nov 24 09:58 mojo.tmp.JtyBL3pCMB1uMuPA
Manually downloaded file:
1142030516 Nov 24 08:46 openSUSE-Tumbleweed-ARM-JeOS-efi.aarch64-2025.11.09-Build5.3.raw.xz
A diff confirms that only the end of the file is missing.
Any idea where this problem could come from?
Above a certain size, the Mojolicious asset handling creates these temporary files on disk (instead of handling the download in memory). Maybe we're doing something wrong here, or there's a bug in Mojolicious. I'm wondering why we don't see this on other workers, though. So maybe it's just a problem with your file system (in combination with bad error handling of the cache service)? Maybe @kraih has a more concrete idea.
That does look suspiciously close to 1 GB. There is no default limit in Mojolicious with that value, though. If it were a limit configured in openQA and enforced by Mojolicious, the slightly different sizes would still make sense: the HTTP messages can be read in slightly different chunk sizes (different HTTP headers and so on), making the limit trigger at different offsets. I would start by searching the openQA code for 1 GB/1 GiB settings.
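The suggested search could look like the sketch below. In a real checkout you would run the grep over the repository (lib/, etc/ and so on are assumptions about the layout); here it is demonstrated on a throwaway file so the pattern is self-contained:

```shell
# Sketch: look for hard-coded 1 GB limits (either "1 GiB"-style strings or the
# literal byte count 1073741824). Demonstrated on a temporary file.
tmp=$(mktemp -d)
printf 'my $limit = 1073741824; # 1 GiB\n' > "$tmp/example.pm"
grep -rnEhi '1 ?gi?b|1073741824' "$tmp"
rm -rf "$tmp"
```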
I performed more tests.
With curl, I can reproduce a similar error (on the same machine running the openQA worker): curl -L -O https://openqa.opensuse.org/tests/5476824/asset/hdd/openSUSE-Tumbleweed-ARM-JeOS-efi.aarch64-2025.11.09-Build5.3.raw.xz returns:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
95 1089M 95 1039M 0 0 6486k 0 0:02:51 0:02:44 0:00:07 4870k
curl: (92) HTTP/2 stream 3 was not closed cleanly: INTERNAL_ERROR (err 2)
It is reproducible with a slightly different offset each time.
Also, on another aarch64 machine, on a different network, curl -L -O https://openqa.opensuse.org/tests/5476824/asset/hdd/openSUSE-Tumbleweed-ARM-JeOS-efi.aarch64-2025.11.09-Build5.3.raw.xz returns:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
95 1089M 95 1041M 0 0 4338k 0 0:04:17 0:04:05 0:00:12 4232k
curl: (92) HTTP/2 stream 3 was not closed cleanly: INTERNAL_ERROR (err 2)
Again, it is reproducible with a slightly different offset each time.
wget works just fine on both machines.
Googling this error, one suggestion is to disable proxy_buffering in nginx: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_buffering
Is this something we could check/try on o3?
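For reference, the suggested change would be a one-line directive in the relevant proxy location. This is only a sketch: the location path and upstream name are placeholders, since the actual o3 vhost configuration is not shown in this thread.

```nginx
# Hypothetical nginx fragment; location and upstream are placeholders.
location /tests/ {
    proxy_pass http://openqa-webui;
    proxy_buffering off;  # stream responses instead of buffering them on disk
}
```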
I guess we could check/try this. Off the top of my head I would say that these files aren't served via the Mojolicious app (and therefore don't use reverse-proxy settings) anyway, though.
The difference between wget and curl seems to be resume-on-error: with verbose wget output, I can see:
2025-11-24 16:00:24 (3.26 MB/s) - Connection closed at byte 1091244964. Retrying.
So, the connection has been closed with wget as well, but wget is able to resume just fine from there and download the last ~50 MB remaining.
curl does not, even with the --retry option.
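For completeness, curl can be made to resume manually: `-C -` tells it to continue from the current size of the partial file, and a retry loop around it replicates what wget does automatically. The wrapper below is a sketch (function name and attempt count are my own choices, not from openQA):

```shell
# Sketch: run the given fetch command until it succeeds, up to 5 attempts,
# with a short pause between attempts. With `curl -C -` each rerun resumes
# from the bytes already downloaded instead of starting over.
download_with_resume() {
    for attempt in 1 2 3 4 5; do
        "$@" && return 0
        echo "attempt $attempt interrupted, resuming..." >&2
        sleep 1
    done
    return 1
}

# Usage against the asset from this report:
# download_with_resume curl -fL -C - -O \
#   https://openqa.opensuse.org/tests/5476824/asset/hdd/openSUSE-Tumbleweed-ARM-JeOS-efi.aarch64-2025.11.09-Build5.3.raw.xz
```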
Maybe there is an option in Mojo::UserAgent to allow resuming on errors?
Maybe, and if not, this should be fairly simple to implement.
Resumable downloads are not that simple, but Mojolicious has everything required built-in already. For a user friendly API see Mojo::File::download.
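A minimal sketch of that API, assuming the call form used in the patch below ($path->download($url, $ua), truthy on success); the target path and URL are illustrative placeholders:

```perl
use Mojo::File qw(path);
use Mojo::UserAgent;

# Hedged sketch: fetch an asset via Mojo::File::download. If a partial file
# already exists at the target path, calling download again resumes it.
my $ua     = Mojo::UserAgent->new(max_redirects => 5);
my $target = path('/var/lib/openqa/cache/asset.raw.xz');
my $ok     = $target->download('https://example.com/asset.raw.xz', $ua);
warn 'download incomplete, call download again to resume' unless $ok;
```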
As a workaround I hot patched _get in lib/OpenQA/Downloader.pm on the armv9 worker:
sub _get ($self, $url, $target, $options) {
    my $ua   = $self->ua;
    my $log  = $self->log;
    my $file = path($target)->basename;
    $log->info(qq{Downloading "$file" from "$url"});
    my $path = Mojo::File->new($target);
    my ($ret, $err);
    my $remaining_attempts = $self->attempts;
    while (--$remaining_attempts) {
        $ret = $path->download($url, $ua);
        $err = undef;
        if ($ret) {
            $log->info(qq{"path->download" OK!});
            last;
        }
        $err = qq{"path->download" failed! Retrying ($remaining_attempts remaining attempts)};
        $log->info($err);
    }
    return ($ret, $err);
}
It cannot be merged as-is: some features are missing (automatic extraction and ETag handling were dropped) and cleanup is needed.
Note: incomplete downloads are resumed, but $path->download needs to be called again; it does not retry automatically.