
[Bug]: new version of lakectl is very slow

Open oliverdain opened this issue 10 months ago • 3 comments

What happened?

What actually happened, including error codes if applies.

Steps to Reproduce:

  1. Download the latest version of lakectl
  2. Clone a repo with a decent amount of data (about 100GB in my case)
  3. Make one small modification and run lakectl local commit. This takes about 15 minutes on a fairly fast machine for me. Before upgrading this took about 20 seconds.

Expected behavior

It should be much faster. 15 minutes for a local commit or a local status is an awfully long time.

I suspect it's related to https://github.com/treeverse/lakeFS/pull/7563. Specifically, I was using an older version of lakectl and lakectl local was reporting diffs on a freshly cloned repo. So I upgraded lakectl and now it's correct, but much, much slower.

lakeFS version

1.16.0 for lakectl

How lakeFS is installed

Cloud hosted

Affected clients

lakectl

Relevant log output

No response

Contact details

No response

oliverdain, Apr 08 '24 22:04

I have an idea that may help here. Will try to write a PoC that performs an alphabetical scan of a directory tree efficiently in both time and space.

arielshaqed, Apr 11 '24 20:04

> I have an idea that may help here. Will try to write a PoC that performs an alphabetical scan of a directory tree efficiently in both time and space.

I'm sure it will take a long time. You can consider storing the state of the current local folder and remote in a specific format, then compare them for differences.

thungrac, Apr 12 '24 09:04

> > I have an idea that may help here. Will try to write a PoC that performs an alphabetical scan of a directory tree efficiently in both time and space.
>
> I'm sure it will take a long time. You can consider storing the state of the current local folder and remote in a specific format, then compare them for differences.

Hi @thungrac ,

You are of course correct that listing everything takes a long time. Unfortunately that is literally the feature: download a repo to a local directory, work on it locally without involving lakeFS in any way, and then re-upload the changes. By definition, this requires scanning all files. 🤷

The issue that I wish to fix is in how we scan. A further complication is that lakeFS is an object store, while the local filesystem is a filesystem. There's some more info about the difference here. But the important difference for our purposes is that object stores are just a flat list of objects sorted by pathname, and filesystems arrange files in directories. So when we scan the local subtree we get files in a different order than when we scan lakeFS. The current implementation re-sorts, which is actually slow and uses lots of memory. I want us to try to remove or reduce the size of that sorting. That will reduce both runtime and memory consumption dramatically.
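To make the ordering mismatch concrete, here is a minimal Python sketch. It is not lakectl's code, and the helper names (`make_tree`, `walk`, `flat_key`, `demo`) are made up for illustration. It walks a tiny tree twice: once sorting each directory's entries by plain name, and once sorting them as if every directory name ended in `/`. The second ordering is one possible way to get the flat, object-store-style order without a global sort; whether the eventual fix works this way is an assumption.

```python
import os
import tempfile


def make_tree(root, rel_paths):
    """Create empty files at the given '/'-separated relative paths."""
    for rel in rel_paths:
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()


def walk(root, prefix="", key=lambda e: e.name):
    """Depth-first walk; each directory's entries are sorted with `key`."""
    with os.scandir(os.path.join(root, prefix)) as it:
        entries = sorted(it, key=key)
    for entry in entries:
        rel = prefix + entry.name
        if entry.is_dir(follow_symlinks=False):
            yield from walk(root, rel + "/", key)
        else:
            yield rel


def flat_key(entry):
    """Sort directories as 'name/' so the walk matches flat path order."""
    if entry.is_dir(follow_symlinks=False):
        return entry.name + "/"
    return entry.name


def demo():
    # '-' and '.' sort before '/', so "a-b" and "a.txt" precede "a/x"
    # in a flat listing, but the directory "a" sorts first by plain name.
    paths = ["a/x", "a-b", "a.txt"]
    with tempfile.TemporaryDirectory() as root:
        make_tree(root, paths)
        by_name = list(walk(root))
        flat = list(walk(root, key=flat_key))
    return by_name, flat, sorted(paths)


if __name__ == "__main__":
    by_name, flat, expected = demo()
    print("per-directory name order:", by_name)
    print("directory-as-'name/' order:", flat)
    print("flat sorted paths:", expected)
```

With the `flat_key` ordering, only one directory's entries are sorted and held in memory at a time, and the walk's output can be compared against a sorted remote listing in a single streaming pass.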

If we manage to do it, I promise a tech blog :-)

arielshaqed, Apr 15 '24 12:04