lakeFS
Create table/schema fails for location on root of an empty Branch
Create table or schema on lakeFS fails when:
- Using Hive metastore
- The location is the root of the repository
- The repository used in the location is empty
With error:
Query 20211121_154828_00014_utwsi failed: Got exception: java.io.FileNotFoundException PUT 0-byte object on main/: com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: null; S3 Extended Request ID: null), S3 Extended Request ID: null:404 Not Found
Steps to reproduce
Using an environment with lakeFS, Trino and Hive metastore (e.g. https://github.com/treeverse/lakeFS/tree/master/deployments/compose):
- Create a new repository (`s3://example`)
- Create a new schema located in root (in Trino: `CREATE SCHEMA test WITH (location = 's3://example/main');`)
Possibly missing a slash? Can try `CREATE SCHEMA test WITH (location = 's3://example/main/');` (note trailing slash in location).
If that works, we might need to revise docs and error message. Note that this is how our paths work, maybe.
@arielshaqed with the slash that also didn't work when I tried this yesterday (Unfortunately :))
Yup, thanks -- you're right!
The code at issue is that `Catalog.CreateEntry` runs `ValidatePath`, which requires a nonempty `Path`. So creating a file with an empty name will never work.
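To make the failure concrete, here is a minimal sketch of how the location from the reproduction ends up with an empty path. It assumes the bucket part of the `s3://` location is the repository and the first key segment is the branch, which matches the `PUT 0-byte object on main/` in the error. The function names `split_location`, `validate_path` and `marker_path` are hypothetical and only illustrate the idea; this is not the lakeFS implementation.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_location(location: str) -> Tuple[str, str, str]:
    """Split 's3://<repo>/<branch>/<path>' into (repo, branch, path)."""
    parsed = urlparse(location)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// location: {location!r}")
    # Assumption: the first key segment names the branch, the rest is the path.
    branch, _, path = parsed.path.lstrip("/").partition("/")
    return parsed.netloc, branch, path


def validate_path(path: str) -> None:
    """Hypothetical stand-in for a check that rejects an empty path."""
    if path == "":
        raise ValueError("path must not be empty")


def marker_path(location: str) -> str:
    """Path (inside the branch) of the 0-byte directory marker for a location."""
    _, _, path = split_location(location)
    # A directory marker is the location key with a trailing slash.
    if path and not path.endswith("/"):
        path += "/"
    return path


if __name__ == "__main__":
    for loc in ("s3://example/main", "s3://example/main/", "s3://example/main/dir"):
        try:
            validate_path(marker_path(loc))
            print(loc, "-> marker path is valid")
        except ValueError as err:
            print(loc, "-> rejected:", err)
```

With or without the trailing slash, a location at the branch root leaves nothing after the branch name, so the marker path is empty and the check rejects it. Only a location below the root produces a nonempty path.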
But note that S3 is half-similar! While I can upload a file to an empty name, I cannot download it:
$ echo empty path | aws s3 cp - s3://treeverse-ariels-test/
$ aws s3 cp s3://treeverse-ariels-test/ -
download failed: s3://treeverse-ariels-test/ to - Parameter validation failed:
Invalid length for parameter Key, value: 0, valid min length: 1
@guy-har can you give more details of the use-case, specifically the intended equivalent S3 behaviour?
Sure. The expected behavior in both cases (with and without a slash) is to get an OK result, and the schema should be created. It won't be exactly like S3: S3 returns an error in the case without a slash.
Currently:

| Case | S3 | lakeFS | lakeFS (with no data) |
|---|---|---|---|
| root | err1 | works | err2 |
| root with slash | works | works | err2 |
| root with path | works | works | works |

- err1: Can not create a Path from an empty string
- err2: PUT 0-byte object on main/: com.amazonaws.services.s3.model.AmazonS3Exception: Not Found
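As a rough aid for reading the table, the sketch below encodes the reported results as plain data and lists the cases where lakeFS behaves differently depending on whether the repository has data. The dictionary only restates the table; the names `CURRENT_BEHAVIOUR` and `inconsistent_cases` are made up for illustration.

```python
# Reported outcomes per case, copied from the table above.
CURRENT_BEHAVIOUR = {
    "root": {"S3": "err1", "lakeFS": "works", "lakeFS (with no data)": "err2"},
    "root with slash": {"S3": "works", "lakeFS": "works", "lakeFS (with no data)": "err2"},
    "root with path": {"S3": "works", "lakeFS": "works", "lakeFS (with no data)": "works"},
}


def inconsistent_cases(table):
    """Return the cases where lakeFS with and without data give different results."""
    return [
        case
        for case, row in table.items()
        if row["lakeFS"] != row["lakeFS (with no data)"]
    ]


if __name__ == "__main__":
    print(inconsistent_cases(CURRENT_BEHAVIOUR))
```

Both root cases show up as inconsistent, which is the gap this issue is about: the outcome should not depend on whether the repository already holds data.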
We could decide we want to return err1, but then it should happen also in lakeFS when we have data. Anyway, I don't think it should be part of this issue. @arielshaqed WDYT?
Thanks for the precise analysis of all the cases!
I agree with you, I think: lakeFS should behave exactly the same for anything starting with `repo/branch` as S3 does for the same thing starting with `bucket`.
Treating a path to a branch as an S3 bucket can be a problem from the S3 perspective, because the branch name is part of the path and should act as a path. So when we pass an external location to the metastore, it will try to create the path, as I understand it, without knowing how we treat it.
This issue is now marked as stale after 90 days of inactivity, and will be closed soon. To keep it, mark it with the "no stale" label.
Closing this issue because it has been stale for 7 days with no activity.