Large Git repos aren't able to be indexed

Open realhackcraft opened this issue 11 months ago • 11 comments

Describe the bug

I'm not sure if this is an issue with my device, because it isn't very powerful. When I use tabby to index repos, it always logs warnings like these:

2025-01-16T14:10:36.837159Z  WARN tabby_index::indexer: crates/tabby-index/src/indexer.rs:90: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/example-showcase/src/main.rs","language":"rust","git_hash":"49afb15c2048c316cdc37d27390618f7a9f90055"}': Failed to embed chunk text: error decoding response body
2025-01-16T14:10:36.878975Z  WARN tabby_index::indexer: crates/tabby-index/src/indexer.rs:90: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/example-showcase/src/main.rs","language":"rust","git_hash":"49afb15c2048c316cdc37d27390618f7a9f90055"}': Failed to embed chunk text: error decoding response body
2025-01-16T14:10:36.889699Z  WARN tabby_index::indexer: crates/tabby-index/src/indexer.rs:90: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/example-showcase/src/main.rs","language":"rust","git_hash":"49afb15c2048c316cdc37d27390618f7a9f90055"}': Failed to embed chunk text: error decoding response body

This is especially apparent on https://github.com/bevyengine/bevy: I left it indexing overnight and it still didn't finish. As stated, I'm not sure if my computer is just too slow to index quickly, but even overnight it didn't finish.

Information about your version: tabby 0.22.0

Information about your GPU: Apple M2 (4+4) @ 3.50 GHz

Additional context: Any repo will output those warning logs, but they still finish. For the bevy repo, it gets stuck on the example showcase main file.

realhackcraft avatar Jan 16 '25 14:01 realhackcraft

I'm using this to run: tabby serve --model Qwen2.5-Coder-3B --parallelism 16 --device metal

realhackcraft avatar Jan 16 '25 14:01 realhackcraft

We have set a timeout for each embedding model request to keep the indexing time manageable. As a result, some requests may fail and trigger warnings.

Please note that these warnings can be safely ignored. Tabby's background indexing job will incrementally re-index any failed chunks.

wsxiaoys avatar Jan 17 '25 04:01 wsxiaoys
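
To make the behaviour described above concrete, here is a minimal sketch of "warn on timeout, pick the chunk up again on a later pass". It is not Tabby's actual implementation: `index_pass`, the `embed` callable and the in-memory `index` dict are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def index_pass(
    chunks: Iterable[Tuple[str, str]],
    embed: Callable[[str], List[float]],
    index: Dict[str, List[float]],
) -> List[str]:
    """Embed every chunk not yet in `index`; return the ids that timed out.

    `embed` is a hypothetical stand-in for a request to the embedding
    backend that raises TimeoutError when the per-request timeout expires.
    """
    failed: List[str] = []
    for chunk_id, text in chunks:
        if chunk_id in index:
            continue  # already indexed by an earlier pass
        try:
            index[chunk_id] = embed(text)
        except TimeoutError as err:
            # Warn and move on instead of aborting the whole job.
            print(f"WARN failed to embed chunk {chunk_id}: {err}")
            failed.append(chunk_id)
    return failed
```

Calling `index_pass` again with the same chunks only retries the ids that are still missing from `index`, which is why a warning on one pass does not have to mean the chunk is lost for good.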

> some requests may fail and trigger warnings

That's what I thought, but even after 8 hours, it's still not finished. Do you reckon that has something to do with the base M2 GPU or not?

realhackcraft avatar Jan 17 '25 21:01 realhackcraft

If you have a really large repo, that's somewhat expected. The only way to speed it up is to use a more powerful model serving backend (and more powerful hardware) to reduce the time of a cold start.

The good news is that once the first run has finished, future incremental indexing should be much faster.

wsxiaoys avatar Jan 17 '25 22:01 wsxiaoys

I've been running the indexing on bevy again, for 24 hours this time, and it's still not done. I have noticed that while it is running, it's not using much GPU or CPU.

Indexing my repo, ~1k lines:

Image

Indexing bevy:

Image

Bevy's indexing spike is a bit shorter than my repo's, even though bevy definitely has more lines than mine. The indexing also doesn't finish after the spike drops.

I also noticed that the warning logs stopped at the same time as the GPU activity, which led me to believe that the indexing stopped partway through but wasn't reported as such.

realhackcraft avatar Jan 19 '25 15:01 realhackcraft

Thanks for offering to help debug! I ran a quick test by indexing 'bevy' in Tabby's demo instance, and it worked as expected, so I was unable to reproduce the issue. Here is the job log for reference: https://demo.tabbyml.com/jobs/detail?id=Gp3WX1

Could you try turning on debug logging (e.g. RUST_LOG=debug) to see if there's anything suspicious?

wsxiaoys avatar Jan 20 '25 01:01 wsxiaoys

In the log, I found something like this:

2025-01-20T20:19:35.631776Z WARN tabby_index::indexer:crates/tabby-index/src/indexer.rs: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/ci/src/ci.rs","language":"rust","git_hash":"043a887e37ec704dbe97981a7bdfb6ad534d6d5b"}': Failed to embed chunk text: error sending request for url (http://127.0.0.1:30888/embedding)

Also in the system tab, I see

Image

I think it has something to do with the embedding, although I'm not sure how to set it up. Maybe that's the problem?

realhackcraft avatar Jan 20 '25 20:01 realhackcraft

https://github.com/TabbyML/tabby/issues/3715#issuecomment-2597414985.

The error can be safely disregarded if it occurs infrequently, especially when considered in the context of the total number of chunks processed.

Can you share the command you used to start tabby?

wsxiaoys avatar Jan 20 '25 20:01 wsxiaoys

I'm using tabby serve --model Qwen2.5-Coder-3B --parallelism 32 --device metal to start.

> The error can be safely disregarded if it occurs infrequently, especially when considered in the context of the total number of chunks processed.

I'm not sure if it occurs frequently or not, but there are 42 warnings like that for a repo of ~500 lines.

realhackcraft avatar Jan 20 '25 21:01 realhackcraft

I did the same with bevy, and up until the point of failure, there are 255 "Failed to build chunk for document" warnings.

realhackcraft avatar Jan 20 '25 21:01 realhackcraft
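
Counts like the 42 and 255 above are easier to judge when broken down by file and by error reason. The following is a rough sketch that tallies the "Failed to build chunk for document" warnings, assuming the log lines look like the ones quoted in this thread; `tally_failures` is a hypothetical helper, not part of Tabby.

```python
import json
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches the warning format quoted earlier in this thread.
WARNING_RE = re.compile(
    r"Failed to build chunk for document '(?P<doc>.*?)': (?P<reason>.*)$"
)


def tally_failures(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count chunk-build warnings per source path and per error reason."""
    by_path: Counter = Counter()
    by_reason: Counter = Counter()
    for line in lines:
        match = WARNING_RE.search(line.rstrip())
        if match is None:
            continue  # not a chunk-build warning
        doc = match.group("doc")
        # The document id in the quoted logs is "<source id>:::<JSON>".
        _, sep, payload = doc.partition(":::")
        try:
            path = json.loads(payload)["path"] if sep else doc
        except (ValueError, KeyError, TypeError):
            path = doc  # fall back to the raw id if the JSON is unexpected
        by_path[path] += 1
        by_reason[match.group("reason")] += 1
    return by_path, by_reason
```

Passing an open log file (any iterable of lines works) returns two counters; `by_reason.most_common()` then shows whether the failures are mostly "error decoding response body" or "error sending request for url (...)". The warnings do not include the total number of chunks processed, so this gives absolute counts only, not a failure rate.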

Fix merged in https://github.com/TabbyML/tabby/pull/3805

wsxiaoys avatar Feb 10 '25 10:02 wsxiaoys

Thank you for reporting this issue. It should be resolved in https://github.com/TabbyML/tabby/releases/tag/v0.25.0. Please verify, and feel free to reopen the issue if it persists.

zwpaper avatar Feb 20 '25 15:02 zwpaper