Enhancement request - move or copy datasets inside a repo

Open iddoavn opened this issue 1 year ago • 5 comments

Scenario: One member of the team changes a few tables on their own branch. Then, that member wants to expose one table and that one table only (that may not be committed) to a different team member working on a different branch.

It would be good if we could use a lakectl cp to copy a dataset from one branch to another, or maybe even to a different repo. This should, of course, be a zero-copy operation.

Another use case is a lakectl mv that essentially renames a dataset. This would also be an added capability on top of an object store, where achieving something like this means going through a long and potentially expensive exercise of downloading and uploading data.

iddoavn avatar Jun 02 '23 19:06 iddoavn

Here are a few options that might fulfill the requirement:

  1. Commit the table in the source branch and then use cherry-pick of that commit to the destination branch.
  2. Use aws s3 cp s3://repo/branch-source/table s3://repo/branch-dest/table. This isn't a zero-copy operation, but it doesn't download and upload the data either. It performs an object-store-side copy, i.e. the copied objects never go through the client or lakeFS itself. You can use aws s3 mv ... to get the MV functionality, which starts with a similar copy followed by a deletion.
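
Option 2 can also be scripted. The following is a minimal sketch, not an official lakeFS tool. It assumes the repository is reachable through the lakeFS S3 gateway, where the repository acts as the bucket and the branch is the first component of the key, and that boto3 is installed. The function names, the endpoint and the credentials are all hypothetical placeholders.

```python
"""Sketch: server-side copy of one table prefix between lakeFS branches."""


def copy_prefix(s3, repo, src_branch, dst_branch, prefix, delete_source=False):
    """Copy every object under `prefix` from src_branch to dst_branch.

    `s3` is any object exposing the boto3 S3 client methods used below.
    With delete_source=True this behaves like a move: copy first, then delete.
    Returns the list of destination keys that were written.
    """
    pairs = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=repo, Prefix=f"{src_branch}/{prefix}"):
        for obj in page.get("Contents", []):
            src_key = obj["Key"]
            # Swap the leading branch component, keep the path inside the branch.
            dst_key = f"{dst_branch}/{src_key[len(src_branch) + 1:]}"
            # CopyObject is executed by the server; no data passes through here.
            s3.copy_object(
                Bucket=repo,
                Key=dst_key,
                CopySource={"Bucket": repo, "Key": src_key},
            )
            pairs.append((src_key, dst_key))

    if delete_source:
        # Delete only after every copy succeeded, mirroring "copy, then delete".
        for src_key, _ in pairs:
            s3.delete_object(Bucket=repo, Key=src_key)

    return [dst_key for _, dst_key in pairs]


def make_client(endpoint_url, access_key, secret_key):
    """Build a boto3 S3 client pointed at a lakeFS endpoint (placeholder values)."""
    import boto3  # imported lazily so copy_prefix can be used without boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
```

A call such as copy_prefix(make_client(...), "repo", "branch-source", "branch-dest", "table/") would mirror the aws s3 cp command above. Like that command, it writes uncommitted objects on the destination branch, and it is not zero-copy.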

The option to zero-copy uncommitted objects through lakeFS (i.e. without using a merge, commit, cherry-pick, etc.) was removed not long ago. The reasoning was to let garbage collection (GC) clean up safely without risking data loss.

itaiad200 avatar Jun 03 '23 13:06 itaiad200

I think that makes a lot of sense for uncommitted data. But for committed data, it would be good to have a copy, because a commit may include more changes than the ones you want to copy over.

Nevertheless, I agree that cherry-pick is helpful in many cases, especially if you have good commit hygiene.

iddoavn avatar Jun 03 '23 13:06 iddoavn

@ozkatz please prioritize

idanovo avatar Aug 31 '23 08:08 idanovo

This issue is now marked as stale after 90 days of inactivity, and will be closed soon. To keep it, mark it with the "no stale" label.

github-actions[bot] avatar Nov 30 '23 01:11 github-actions[bot]

Closing this issue because it has been stale for 7 days with no activity.

github-actions[bot] avatar Dec 08 '23 01:12 github-actions[bot]