lakectl error when uploading files larger than 5 GB recursively
I was trying to recursively upload a large volume of data to my repo (S3 blockstore) using lakectl.
I noticed that if files in the directory are larger than 5 GB, the following command gives an error. It starts the uploads, but after a couple of seconds it fails. The files are around 15 GB each. This happens every time the individual files are larger than 5 GB.
Command:
lakectl fs upload --source . lakefs://repo/branch/directory --recursive
Output:
upload file_1.txt ... fail! [123.37MB in 1.068s]
upload file_3.txt ... fail! [154.34MB in 1.068s]
upload file_2.txt ... fail! [173.11MB in 1.069s]
It also returns a URL with a 400 Bad Request error, which looks something like this:
upload file_3.txt failed: upload request failed
https://<bucket>.s3.us-east-1.amazonaws.com/repo/data/<some token>/
<some token>
?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=<xxxx>
&X-Amz-Date=20250505T083730Z
&X-Amz-Expires=900
&X-Amz-Security-Token=<xxxxx>
&X-Amz-SignedHeaders=host
&x-id=PutObject
&X-Amz-Signature=<xxxxxx>: 400 Bad Request
But when I upload the files individually using the following command, the upload succeeds.
for i in 1 2 3; do
lakectl fs upload --source "file_${i}.txt" lakefs://repo/branch/directory
done
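As a stopgap, the per-file loop above could be generalized to a whole directory tree. The following is only a sketch: the `upload_each` function and the `LAKECTL` override variable are hypothetical names introduced here, and it assumes that passing the full object path as the destination is acceptable and that no file name contains a newline.

```shell
# Upload every regular file under a directory one at a time,
# instead of using a single recursive upload.
# upload_each and LAKECTL are illustrative names, not part of lakectl.
upload_each() {
  local src_dir="$1" dest="$2"
  local lakectl_bin="${LAKECTL:-lakectl}"
  (
    cd "$src_dir" || exit 1
    # Assumes file names do not contain newlines.
    find . -type f | while IFS= read -r f; do
      rel="${f#./}"   # strip the leading "./" that find adds
      # Destination is the prefix (without trailing slash) plus the relative path.
      "$lakectl_bin" fs upload --source "$rel" "${dest%/}/${rel}" || exit 1
    done
  )
}
```

Example: `upload_each . lakefs://repo/branch/directory`. The function stops at the first failed upload and returns a non-zero status.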
Is there anything I am doing wrong, or is this an issue with lakectl?
I observed the same thing on Amazon Linux 2 with the latest version of lakeFS. It happens when there are at least 2 files in the recursive upload and even one of them is large (8 GB in my case).
$ lakectl --version
lakectl version: 1.56.0
lakeFS version: 1.56.0
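To see in advance which files in a directory are above the size at which these reports start failing, a small helper like the one below may be useful. It is a sketch only: `list_large_files` is a hypothetical name, and the default threshold treats the "5 GB" in the reports as 5 × 1024³ bytes, which is an assumption.

```shell
# Print regular files under a directory that are larger than a byte threshold.
# list_large_files is an illustrative helper, not part of lakectl.
# Default threshold: 5 * 1024^3 bytes (assumed reading of "5 GB").
list_large_files() {
  local dir="$1" limit_bytes="${2:-5368709120}"
  # "-size +Nc" matches files strictly larger than N bytes.
  find "$dir" -type f -size +"${limit_bytes}c" -print
}
```

Example: `list_large_files ~/TwoFiles` lists the files that exceed the default threshold; an empty result means none do.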
I ran the recursive upload on a handful of versions to find when the issue was introduced. The results are below; the issue appears to have been introduced in v1.43.0. It's also worth noting that the upload proceeded on v1.43.0 with --pre-sign=false.
1.56.1 ❌
1.56.0 ❌
1.55.0 ❌
1.51.0 ❌
1.49.1 ❌
1.45.0 ❌
1.43.0 ❌
1.42.0 ✅ Upload proceeded
1.41.0 ✅ Upload proceeded
1.40.0 ✅ Upload proceeded
I'd like to say the upload worked on 1.42.0, but I'll just say it proceeded beyond the point where the others failed and uploaded most of the file before taking a hit from the OOM killer. The larger of the two files being uploaded was 8 GB.
+ lakefs-1.42.0/lakectl --version
lakectl version: 1.42.0
lakeFS version: 1.42.0
lakectl out of date! (Available: 1.56.1)
lakeFS out of date! (Available: 1.56.1)
Get the latest release https://github.com/treeverse/lakeFS/releases
+ lakefs-1.42.0/lakectl fs upload -r -s /home/rgorsuch/TwoFiles lakefs://sandbox/upload-from-hq4/TwoFiles
diff 'local:///home/rgorsuch/TwoFiles/' <--> 'lakefs://sandbox/upload-from-hq4/TwoFiles'...
upload larger.dat ... 74.7% [#############.....] [6.84GB in 28.101954s]
try-version: line 11: 17389 Killed lakefs-$VERSION/lakectl fs upload -r -s ~/TwoFiles lakefs://sandbox/upload-from-hq4/TwoFiles
real 1m24.146s
user 0m23.092s
sys 0m12.699s
May 14 22:46:56 hpc-desktop-staging-06 kernel: Out of memory: Killed process 17389 (lakectl) total-vm:24777284kB, anon-rss:14115504kB, file-rss:0kB, shmem-rss:0kB, UID:1049 pgtables:27844kB oom_score_adj:0
$ lakefs-1.43.0/lakectl --version
lakectl version: 1.43.0
lakeFS version: 1.43.0
lakectl out of date! (Available: 1.56.1)
lakeFS out of date! (Available: 1.56.1)
Get the latest release https://github.com/treeverse/lakeFS/releases
$ time lakefs-1.43.0/lakectl fs upload -r -s /home/rgorsuch/TwoFiles lakefs://sandbox/upload-from-hq4/TwoFiles
diff 'local:///home/rgorsuch/TwoFiles/' <--> 'lakefs://sandbox/upload-from-hq4/TwoFiles'...
upload larger.dat ... fail! [462.55MB in 1.085s]
upload larger.dat failed: upload request failed https://storagestaging-datalakee54831b2-tvr7kjurz8by.s3.us-gov-west-1.amazonaws.com/sandbox/data/g8kuadg0m353963djk20/d0ihteo0m353963djk2g%2Cp1hRsUnAJpzK_orvVl5_mBqG1PAjL_dvXPCzS4_qfdE?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAYGFEGYUHJCAXBDOH%2F20250514%2Fus-gov-west-1%2Fs3%2Faws4_request&X-Amz-Date=20250514T225243Z&X-Amz-Expires=900&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEEAaDXVzLWdvdi13ZXN0LTEiSDBGAiEA5980a9LwxrmUyAdw5WnRFtKiHjlZuMxav3%2BiqrVG%2FI0CIQDmR4mkdBccLbegamFkmWTZAiwsfGk%2BlHFOF8s78y3aUSrEBAgdEAAaDDU2Mjk4NTA5MjM2NiIM0pI56vTK8JLwJuPaKqEE6twWE9V5yU6RjbQ%2FkAqBOhioofHv2RojsIn64xAlrBuISLeR0QFci0WvuDg%2Fyl%2BECKIzptABycxX9Bi7SCy8mxFP0stWaJQ4iLb15am4DBm9lAcJaIEOI7WI8BJTOtR98M2v7kHD0DapIJJgJZo5CSRCp%2FTBpWfrZ7%2F0OgTBocNRykz4IePXF%2FGEeivcJJNGZ63qkmtLQfTRXw%2FR9zP5z94uITrNKZvSYT6fKYmsUcKYqUxxtFx3Ck%2FA%2Bn3gE6eT87lErQKzvrWmY0oaCyy9VMF6gDGYSxeKsECu3ZPGkti%2BoMExFR3nJ%2B3M7UiVUsrPhCRsi%2BcWde8D%2FrkRYR2%2BaDRwSU1GzAklktp4Am4fRZep2Fko%2F687KxDuelY4%2BeFpQnuJhjFX9ZAasKxdVYFd%2Ba0YNOnKMcAlj6J2ufwnbhePzDj3UqzVdsL4RuVNO89wnO6VJP8NniK%2B1%2B2B7mgqaW8B7L71TPRVcjb4i5vnUGw2EG%2BEQ3d%2B05aOVXBiand4H%2Frq99xEGu3fvhyocpD%2FlBm2hAYvq1I%2Bx3uDv1HUlKfK7nrtfRt5UgNo6jxpvo0YrggxSfyW%2BguGfGGRh0VwIb%2FlZGfDShZMD45F4WHwCqgia2AR5QCMUwwQzSIMbwTLXUXplHNay5nS9MXdkD1EtTIkHsxkmwdNme%2FOu9pUUaaMvZURYdHewjfwHIM6nkvh%2BqAWKswbllPBjpOne8v%2BsyMww7uUwQY6pgEVK4MAHZHjt15HgDWgfJ6O4Mt2YchM7J%2BG8MXfqpBnFZd6xmBWL3FI2rB09AMKDyYoA7C%2FzDbihIKX96AvH%2FT02UsF8814Fgw5HkGfG%2FmR28lU6z20apDo8YEpX4o0HBmrxwy2PbiDXekn2Qdl4Leenr9exN2lO5q3hoUdQLIYZlNmL2iatIBe1jiQlyNj6YacsqBZy0QUI8cr9IXd4hd5BiRVlGHQ&X-Amz-SignedHeaders=host&x-id=PutObject&X-Amz-Signature=f79615a4dbdb671811178e3ba6ecf2bd0bcf1c772e03f81317df57e0daee19ae: 400 Bad Request
Error executing command.
real 0m1.165s
user 0m0.238s
sys 0m0.222s
$ ls -al ~/TwoFiles/
total 9057672
drwxrwxr-x 2 rgorsuch users 6144 May 13 22:30 .
drwx------ 17 rgorsuch users 6144 May 14 21:59 ..
-rw-r--r-- 1 rgorsuch users 9150423448 May 13 22:04 larger.dat
-rw-r--r-- 1 rgorsuch users 124619723 May 13 22:06 smaller.dat
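The version-by-version comparison above could be scripted along these lines. This is a sketch, not the `try-version` script used in the comment: `try_versions` is a hypothetical name, and it assumes each release is unpacked into a `lakefs-<version>/` directory under a common base directory, as the paths in the transcripts suggest.

```shell
# Run the same recursive upload with several lakectl versions and
# report whether each command exited successfully.
# try_versions is an illustrative helper; the directory layout
# <base_dir>/lakefs-<version>/lakectl is an assumption.
try_versions() {
  local base_dir="$1" src="$2" dest="$3"
  shift 3
  local version
  for version in "$@"; do
    if "$base_dir/lakefs-$version/lakectl" fs upload -r -s "$src" "$dest" >/dev/null 2>&1; then
      echo "$version proceeded"
    else
      echo "$version failed"
    fi
  done
}
```

Example: `try_versions . ~/TwoFiles lakefs://sandbox/upload-from-hq4/TwoFiles 1.42.0 1.43.0`. Note that the exit status alone cannot tell a 400 Bad Request apart from a process killed for running out of memory, so the command output should still be inspected when a version is reported as failed.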
A fix for this was merged and will be released as part of the next version (probably next week).
A fix for this was released as part of lakeFS v1.63.0.