lakeFS - Data version control for your data lake | Git for data
Add Hadoop metrics to lakeFSFS, and find an easy way to view them in Spark and/or on Databricks. Measure at least these operations: * Total time spent on each lakeFS...
Much of the data written by lakeFS to the underlying storage is immutable, including physical paths of actual data, and Graveler ranges and metaranges. Consider the option to add the...
Instead of hand-crafted parsing routines, use Hadoop [Configuration.getTimeDuration](https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/conf/Configuration.html#getTimeDuration(java.lang.String,%20long,%20java.util.concurrent.TimeUnit)) for our timeouts.
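As a sketch of what replacing the hand-crafted parsing could look like: `getTimeDuration` accepts values with the suffixes `ns`, `us`, `ms`, `s`, `m`, `h`, `d`, and the `TimeUnit` argument fixes both the unit of the returned value and the unit assumed for suffix-less values. The property name `fs.lakefs.http.timeout` below is hypothetical, for illustration only.

```java
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class TimeDurationExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hypothetical lakeFSFS property name, for illustration only.
        conf.set("fs.lakefs.http.timeout", "30s");
        // "30s" is parsed as 30 seconds and returned in milliseconds;
        // if the key were unset, the default (10_000 ms) would be returned.
        long timeoutMs = conf.getTimeDuration(
                "fs.lakefs.http.timeout", 10_000L, TimeUnit.MILLISECONDS);
        System.out.println(timeoutMs); // 30000
    }
}
```

This removes the need to validate suffixes or do unit conversions ourselves; Hadoop handles both.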
The current merge strategies let the user choose either the source or the destination branch for all conflicted objects. It would be better to have a per-object option, since not all objects always require...
**LakeFS v.0.65.0** I triggered a post-commit hook which activates an airflow DAG that lasts 2 minutes. The hook "run row" is added to the "Actions" table only after the DAG...
Currently, using Athena with lakeFS works by registering symlinks into Glue; for Delta tables, this won't work (or worse: will cause deleted parquet files to also be queried). For delta...
DoD: Specify which actions must pass on the source ref before merging it to the destination.
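lakeFS already runs actions on `pre-merge`; a "required actions" feature could build on that existing hook spec. A minimal sketch of an actions file that gates merges into `main` (the hook id and webhook URL are placeholders):

```yaml
name: pre merge checks on main
on:
  pre-merge:
    branches:
      - main
hooks:
  - id: data_quality_check            # placeholder hook id
    type: webhook
    properties:
      url: http://example.com/check   # placeholder endpoint
```

The missing piece this issue asks for is declaring, on the destination side, which of these actions must have passed on the source ref.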
### What happened?

When I call the `CompleteMultiPartUpload` API incorrectly, it always responds with a vague error message that doesn't help me, such as:

```
operation error S3: CompleteMultipartUpload, exceeded...
```
There's no equivalent diff to perform, so it's hard for the user to figure out which files conflict.
It's useful to be able to edit the message, metadata, commit timestamp and basically everything that is passed when creating a commit.