lakectl error when uploading files larger than 5 GB recursively

Open shakti-gupta-snr opened this issue 7 months ago • 2 comments

I was trying to recursively upload a large volume of data to my repo (S3 blockstore) using lakectl.

I noticed that if files in the directory are larger than 5 GB, the following command gives an error. It starts the uploads, but after a couple of seconds it fails. The files are around 15 GB each. This happens every time the individual files are larger than 5 GB.

Command:

 lakectl fs upload --source . lakefs://repo/branch/directory --recursive

Output:

upload file_1.txt                        ... fail! [123.37MB in 1.068s]
upload file_3.txt                        ... fail! [154.34MB in 1.068s]
upload file_2.txt                        ... fail! [173.11MB in 1.069s]

It also returns a URL with a 400 Bad Request error, which looks something like this:

upload file_3.txt failed: upload request failed 
https://<bucket>.s3.us-east-1.amazonaws.com/repo/data/<some token>/
<some token>
?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=<xxxx>
&X-Amz-Date=20250505T083730Z
&X-Amz-Expires=900
&X-Amz-Security-Token=<xxxxx>
&X-Amz-SignedHeaders=host
&x-id=PutObject
&X-Amz-Signature=<xxxxxx>: 400 Bad Request

But when I upload the files individually using the following command, the upload succeeds.

for i in 1 2 3; do
  lakectl fs upload --source "file_${i}.txt" lakefs://repo/branch/directory
done

Is there anything I am doing wrong, or is this an issue with lakectl?
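For anyone who needs the per-file workaround on a whole directory, here is a minimal sketch that generalizes the loop above. `upload_each` is a hypothetical helper name, not part of lakectl. It only handles regular files at the top level of the source directory, and it assumes that writing each destination as an explicit object path is the layout you want. Setting `LAKECTL=echo` previews the commands without running them.

```shell
# Hypothetical helper: upload each regular file in SRC_DIR with its own lakectl call.
# LAKECTL can be overridden (e.g. LAKECTL=echo) to preview the commands.
upload_each() {
  src_dir="$1"
  dest="${2%/}"   # strip one trailing slash from the destination prefix
  for f in "$src_dir"/*; do
    # Skip subdirectories and anything that is not a regular file.
    [ -f "$f" ] || continue
    "${LAKECTL:-lakectl}" fs upload --source "$f" "$dest/$(basename "$f")"
  done
}
```

Example: `upload_each . lakefs://repo/branch/directory`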

shakti-gupta-snr • May 05 '25 09:05

I observed the same thing on Amazon Linux 2 with the latest version of lakeFS. It happens when there are at least 2 files in the recursive upload and even one of them is large (in my case 8 GB).

$ lakectl --version
lakectl version: 1.56.0
lakeFS version: 1.56.0
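To check in advance whether a directory contains a file large enough to trigger this, a small sketch follows. `list_large_files` is a hypothetical helper name. The 5 GiB default is an assumption: the failing URL in the original report is a presigned single `PutObject` request, and S3 limits a single PUT to 5 GB, but this thread does not confirm that as the root cause.

```shell
# Hypothetical helper: print files under DIR whose size exceeds LIMIT bytes.
# The default of 5 GiB is an assumption based on the S3 single-PUT size limit.
list_large_files() {
  dir="${1:-.}"
  limit="${2:-5368709120}"
  # "-size +Nc" matches files strictly larger than N bytes.
  find "$dir" -type f -size +"${limit}c" | sort
}
```

Example: `list_large_files ~/TwoFiles` prints one path per line, or nothing if no file exceeds the limit.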

rgorsuch • May 14 '25 19:05

I ran the recursive upload on a handful of versions to find when the issue was introduced. Here are my results; it appears the issue was introduced in v1.43.0. It's also worth noting that the upload proceeded on v1.43.0 with --pre-sign=false.

Version   Result
1.56.1    ❌
1.56.0    ❌
1.55.0    ❌
1.51.0    ❌
1.49.1    ❌
1.45.0    ❌
1.43.0    ❌
1.42.0    ✅ Upload proceeded
1.41.0    ✅ Upload proceeded
1.40.0    ✅ Upload proceeded

I'd like to say the upload worked on 1.42.0, but I'll just say it proceeded beyond where the others failed, and uploaded about half the file before taking a hit from the oom-killer. The larger of the two files being uploaded was 8 GB.
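For readers who want to repeat this kind of version sweep, here is a minimal sketch. It is not the `try-version` script referenced in the transcript below, whose contents are not shown in this thread. `try_versions` is a hypothetical helper, and it assumes each release is unpacked into a `lakefs-<version>/` directory under the current directory, matching the paths in the transcripts.

```shell
# Hypothetical sweep: run the same recursive upload with several lakectl builds.
# Assumes each release is unpacked into ./lakefs-<version>/ (an assumption).
try_versions() {
  src="$1"
  dest="$2"
  shift 2
  for version in "$@"; do
    bin="lakefs-$version/lakectl"
    if "$bin" fs upload -r -s "$src" "$dest"; then
      echo "$version: upload proceeded"
    else
      # In the else branch, $? still holds the exit status of the upload command.
      echo "$version: failed (exit $?)"
    fi
  done
}
```

Example: `try_versions ~/TwoFiles lakefs://sandbox/upload-from-hq4/TwoFiles 1.43.0 1.42.0`. A process killed by the OOM killer is reported as a failure with a non-zero exit status, so the output has to be read together with the upload progress lines.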

+ lakefs-1.42.0/lakectl --version
lakectl version: 1.42.0
lakeFS version: 1.42.0

lakectl out of date! (Available: 1.56.1)
lakeFS out of date! (Available: 1.56.1)
Get the latest release https://github.com/treeverse/lakeFS/releases

+ lakefs-1.42.0/lakectl fs upload -r -s /home/rgorsuch/TwoFiles lakefs://sandbox/upload-from-hq4/TwoFiles

diff 'local:///home/rgorsuch/TwoFiles/' <--> 'lakefs://sandbox/upload-from-hq4/TwoFiles'...
upload larger.dat                        ... 74.7% [#############.....] [6.84GB in 28.101954s]
try-version: line 11: 17389 Killed                  lakefs-$VERSION/lakectl fs upload -r -s ~/TwoFiles lakefs://sandbox/upload-from-hq4/TwoFiles

real	1m24.146s
user	0m23.092s
sys	0m12.699s

May 14 22:46:56 hpc-desktop-staging-06 kernel: Out of memory: Killed process 17389 (lakectl) total-vm:24777284kB, anon-rss:14115504kB, file-rss:0kB, shmem-rss:0kB, UID:1049 pgtables:27844kB oom_score_adj:0
$ lakefs-1.43.0/lakectl --version
lakectl version: 1.43.0
lakeFS version: 1.43.0

lakectl out of date! (Available: 1.56.1)
lakeFS out of date! (Available: 1.56.1)
Get the latest release https://github.com/treeverse/lakeFS/releases

$ time lakefs-1.43.0/lakectl fs upload -r -s /home/rgorsuch/TwoFiles lakefs://sandbox/upload-from-hq4/TwoFiles

diff 'local:///home/rgorsuch/TwoFiles/' <--> 'lakefs://sandbox/upload-from-hq4/TwoFiles'...
upload larger.dat                        ... fail! [462.55MB in 1.085s]
upload larger.dat failed: upload request failed https://storagestaging-datalakee54831b2-tvr7kjurz8by.s3.us-gov-west-1.amazonaws.com/sandbox/data/g8kuadg0m353963djk20/d0ihteo0m353963djk2g%2Cp1hRsUnAJpzK_orvVl5_mBqG1PAjL_dvXPCzS4_qfdE?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAYGFEGYUHJCAXBDOH%2F20250514%2Fus-gov-west-1%2Fs3%2Faws4_request&X-Amz-Date=20250514T225243Z&X-Amz-Expires=900&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEEAaDXVzLWdvdi13ZXN0LTEiSDBGAiEA5980a9LwxrmUyAdw5WnRFtKiHjlZuMxav3%2BiqrVG%2FI0CIQDmR4mkdBccLbegamFkmWTZAiwsfGk%2BlHFOF8s78y3aUSrEBAgdEAAaDDU2Mjk4NTA5MjM2NiIM0pI56vTK8JLwJuPaKqEE6twWE9V5yU6RjbQ%2FkAqBOhioofHv2RojsIn64xAlrBuISLeR0QFci0WvuDg%2Fyl%2BECKIzptABycxX9Bi7SCy8mxFP0stWaJQ4iLb15am4DBm9lAcJaIEOI7WI8BJTOtR98M2v7kHD0DapIJJgJZo5CSRCp%2FTBpWfrZ7%2F0OgTBocNRykz4IePXF%2FGEeivcJJNGZ63qkmtLQfTRXw%2FR9zP5z94uITrNKZvSYT6fKYmsUcKYqUxxtFx3Ck%2FA%2Bn3gE6eT87lErQKzvrWmY0oaCyy9VMF6gDGYSxeKsECu3ZPGkti%2BoMExFR3nJ%2B3M7UiVUsrPhCRsi%2BcWde8D%2FrkRYR2%2BaDRwSU1GzAklktp4Am4fRZep2Fko%2F687KxDuelY4%2BeFpQnuJhjFX9ZAasKxdVYFd%2Ba0YNOnKMcAlj6J2ufwnbhePzDj3UqzVdsL4RuVNO89wnO6VJP8NniK%2B1%2B2B7mgqaW8B7L71TPRVcjb4i5vnUGw2EG%2BEQ3d%2B05aOVXBiand4H%2Frq99xEGu3fvhyocpD%2FlBm2hAYvq1I%2Bx3uDv1HUlKfK7nrtfRt5UgNo6jxpvo0YrggxSfyW%2BguGfGGRh0VwIb%2FlZGfDShZMD45F4WHwCqgia2AR5QCMUwwQzSIMbwTLXUXplHNay5nS9MXdkD1EtTIkHsxkmwdNme%2FOu9pUUaaMvZURYdHewjfwHIM6nkvh%2BqAWKswbllPBjpOne8v%2BsyMww7uUwQY6pgEVK4MAHZHjt15HgDWgfJ6O4Mt2YchM7J%2BG8MXfqpBnFZd6xmBWL3FI2rB09AMKDyYoA7C%2FzDbihIKX96AvH%2FT02UsF8814Fgw5HkGfG%2FmR28lU6z20apDo8YEpX4o0HBmrxwy2PbiDXekn2Qdl4Leenr9exN2lO5q3hoUdQLIYZlNmL2iatIBe1jiQlyNj6YacsqBZy0QUI8cr9IXd4hd5BiRVlGHQ&X-Amz-SignedHeaders=host&x-id=PutObject&X-Amz-Signature=f79615a4dbdb671811178e3ba6ecf2bd0bcf1c772e03f81317df57e0daee19ae: 400 Bad Request
Error executing command.

real	0m1.165s
user	0m0.238s
sys	0m0.222s
$ ls -al ~/TwoFiles/
total 9057672
drwxrwxr-x  2 rgorsuch users       6144 May 13 22:30 .
drwx------ 17 rgorsuch users       6144 May 14 21:59 ..
-rw-r--r--  1 rgorsuch users 9150423448 May 13 22:04 larger.dat
-rw-r--r--  1 rgorsuch users  124619723 May 13 22:06 smaller.dat

rgorsuch • May 14 '25 23:05

A fix for this was merged, and will be released as part of the next version (probably next week).

itaigilo • Jul 11 '25 19:07

A fix for this was released as part of lakeFS v1.63.0.
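A quick way to check whether an installed lakectl is at least v1.63.0 is sketched below. Both function names are hypothetical. The sketch assumes the `lakectl --version` output format shown earlier in this thread (`lakectl version: X.Y.Z`) and a `sort` that supports `-V` (version sort), as in GNU coreutils.

```shell
# Hypothetical helpers for checking whether a lakectl build is new enough.
# Extract "X.Y.Z" from a line such as "lakectl version: 1.56.0" read on stdin.
parse_lakectl_version() {
  sed -n 's/^lakectl version: *//p' | head -n 1
}

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on "sort -V" placing the lower version first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Example: `version_ge "$(lakectl --version | parse_lakectl_version)" 1.63.0 && echo "fix included"`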

itaigilo • Jul 14 '25 07:07