lakeFS
lakeFS - Data version control for your data lake | Git for data
Running the tests uses a local DynamoDB container. 1. The container binds to a specific port. This prevents running tests in parallel and can collide with services already running on the host....
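One common way to avoid a fixed-port collision like the one described above is to let the operating system pick a free port and pass it to the container. The sketch below only illustrates that general idea; `find_free_port` is a hypothetical helper, not part of the lakeFS test suite, and the fix actually chosen for this issue is not stated in the text.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 tells the OS to assign any free ephemeral port.
        s.bind((host, 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    port = find_free_port()
    # The port could then be used as the host side of the container's port mapping.
    print(f"free port: {port}")
```

Note that the port is released when the socket closes, so another process could in principle claim it before the container starts; letting the container runtime assign the host port avoids that race.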
We would like to assess how many installations have stats disabled. This will help us get an idea.
Resolves: #4157. This PR demonstrates how we can use the lakeFS metadata client to create a Parquet table of a repository's ranges. Hopefully, running GC over Parquet files will gain...
Our architecture doc https://docs.lakefs.io/understand/architecture.html still mentions PostgreSQL as our DB. We need to update it to reflect the recent changes introduced with KV.
We should add a description to the [lakeFS Python package](https://pypi.org/project/lakefs-client/) that references the [lakeFS docs](https://docs.lakefs.io/integrations/python.html) and the [lakeFS pydocs](https://pydocs.lakefs.io/).
Garbage collection in lakeFS is essentially an anti-join between the lists of "expired" and "active" addresses. Check whether performing this operation on Parquet files improves performance, and map the risk...
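To make the anti-join concrete, here is a minimal in-memory sketch: it keeps every "expired" address that does not also appear in the "active" list. The function name and the sample addresses are illustrative assumptions; the real garbage collection job operates on large distributed datasets, not Python lists.

```python
def anti_join(expired, active):
    """Return the expired addresses that are not present in active."""
    # A set gives O(1) membership checks for the active addresses.
    active_set = set(active)
    return [addr for addr in expired if addr not in active_set]


if __name__ == "__main__":
    # Made-up sample addresses, for illustration only.
    expired = ["data/a", "data/b", "data/c"]
    active = ["data/b", "data/d"]
    print(anti_join(expired, active))  # ['data/a', 'data/c']
```

Only addresses that are expired and not active are candidates for deletion, which is why an address such as `data/b` above is retained.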
We should warn the user about the lack of persistence when KV is configured to "mem". Something similar to the warning we display when the blockstore is set to "mem"...
Fixes #4064.
Test with a large dataset to find problems. DoD: - GC finishes successfully on a large-scale dataset. - Issues are opened for any problems found.