Create table/schema fails for location on root of an empty Branch

Open guy-har opened this issue 3 years ago • 6 comments

Create table or schema on lakeFS fails when:

  • Using Hive metastore
  • The location is the root of the repository
  • The repository used in the location is empty

It fails with this error:

Query 20211121_154828_00014_utwsi failed: Got exception: java.io.FileNotFoundException PUT 0-byte object  on main/: com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: null; S3 Extended Request ID: null), S3 Extended Request ID: null:404 Not Found

Steps to reproduce

Use an environment with lakeFS, Trino, and Hive metastore (e.g. https://github.com/treeverse/lakeFS/tree/master/deployments/compose):

  1. Create a new repository (s3://example)
  2. Create a new schema located at the branch root (in Trino: CREATE SCHEMA test WITH (location = 's3://example/main');)
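
To make the "root" case concrete, here is a minimal Python sketch of how such a location breaks down when the bucket is read as the repository and the first path segment as the branch. `split_location` and `Location` are hypothetical names used only for illustration; they are not part of lakeFS, Trino, or Hive.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class Location(NamedTuple):
    repository: str
    branch: str
    path: str  # object path inside the branch; empty for the branch root


def split_location(location: str) -> Location:
    """Split s3://<repository>/<branch>/<path> into its parts (hypothetical helper)."""
    parsed = urlparse(location)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// location: {location!r}")
    # Drop the leading slash, then separate the branch from the rest of the path.
    branch, _, path = parsed.path.lstrip("/").partition("/")
    return Location(parsed.netloc, branch, path)


for loc in ("s3://example/main", "s3://example/main/", "s3://example/main/test"):
    print(loc, "->", split_location(loc))
```

Under this reading, both s3://example/main and s3://example/main/ leave an empty object path inside the branch, while s3://example/main/test does not.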

guy-har avatar Nov 21 '21 16:11 guy-har

Possibly a missing slash? You can try

CREATE SCHEMA test WITH (location = 's3://example/main/');

(note the trailing slash in the location).

If that works, we might need to revise the docs and the error message. This may simply be how our paths work.

arielshaqed avatar Nov 21 '21 19:11 arielshaqed

@arielshaqed that also didn't work with the trailing slash when I tried it yesterday (unfortunately :))

talSofer avatar Nov 22 '21 07:11 talSofer

Yup, thanks -- you're right!

The code at issue is that Catalog.CreateEntry runs ValidatePath, which requires a nonempty Path. So creating a file with an empty name will never work.
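
As a rough illustration of that rule (a Python stand-in, not the actual lakeFS Go code), the sketch below rejects an entry whose path is empty. The names `validate_path`, `create_entry`, and `EmptyPathError` are hypothetical, and the real ValidatePath may check more than this.

```python
class EmptyPathError(ValueError):
    """Raised when an entry is created with an empty path (hypothetical name)."""


def validate_path(path: str) -> None:
    # Models only the rule described above: the path must be non-empty.
    if not path:
        raise EmptyPathError("path must not be empty")


def create_entry(store: dict, branch: str, path: str, data: bytes) -> None:
    # Validate first, so an invalid path never reaches the store.
    validate_path(path)
    store[(branch, path)] = data


store = {}
create_entry(store, "main", "test/", b"")  # 0-byte marker under a prefix: accepted
try:
    create_entry(store, "main", "", b"")   # 0-byte marker at the branch root: rejected
except EmptyPathError as err:
    print("rejected:", err)
```

In this model, a 0-byte object written at the branch root has nothing left after the branch name, so it can never pass validation, regardless of whether the location had a trailing slash.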

But note that S3 is half-similar! While I can upload a file to an empty name, I cannot download it:

$ echo empty path | aws s3  cp -  s3://treeverse-ariels-test/ 
$ aws s3  cp   s3://treeverse-ariels-test/ -
download failed: s3://treeverse-ariels-test/ to - Parameter validation failed:
Invalid length for parameter Key, value: 0, valid min length: 1

@guy-har can you give more details of the use-case, specifically the intended equivalent S3 behaviour?

arielshaqed avatar Nov 22 '21 08:11 arielshaqed

Sure. The expected behavior in both cases (with and without a slash) is an OK result, with the schema created. That won't match S3 exactly: S3 returns an error in the case without a slash.

Currently:

Case             S3     lakeFS  lakeFS (with no data)
root             err1   works   err2
root with slash  works  works   err2
root with path   works  works   works

err1 - Can not create a Path from an empty string
err2 - PUT 0-byte object on main/: com.amazonaws.services.s3.model.AmazonS3Exception: Not Found

We could decide that we want to return err1, but then lakeFS should also return it when the repository has data. Anyway, I don't think that should be part of this issue. @arielshaqed WDYT?

guy-har avatar Nov 22 '21 08:11 guy-har

Thanks for the precise analysis of all the cases!

@arielshaqed WDYT?

I agree with you, I think: lakeFS should behave exactly the same for anything starting with repo/branch as S3 does for the same thing starting with bucket.

arielshaqed avatar Nov 22 '21 09:11 arielshaqed

Treating a path to a branch as an S3 bucket can be a problem from the S3 perspective, because the branch name is part of the path and should behave as a path. So when we pass an external location to the metastore, it will, as I understand it, try to create the path without knowing how we treat it.

nopcoder avatar Nov 30 '21 21:11 nopcoder

This issue is now marked as stale after 90 days of inactivity, and will be closed soon. To keep it, mark it with the "no stale" label.

github-actions[bot] avatar Nov 01 '23 14:11 github-actions[bot]

Closing this issue because it has been stale for 7 days with no activity.

github-actions[bot] avatar Nov 12 '23 01:11 github-actions[bot]