Large Git repos aren't able to be indexed
Describe the bug
I'm not sure if this is an issue about my device, because it isn't very powerful. When using tabby to index repos, it always shows some amount of logs like these:
2025-01-16T14:10:36.837159Z WARN tabby_index::indexer: crates/tabby-index/src/indexer.rs:90: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/example-showcase/src/main.rs","language":"rust","git_hash":"49afb15c2048c316cdc37d27390618f7a9f90055"}': Failed to embed chunk text: error decoding response body
2025-01-16T14:10:36.878975Z WARN tabby_index::indexer: crates/tabby-index/src/indexer.rs:90: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/example-showcase/src/main.rs","language":"rust","git_hash":"49afb15c2048c316cdc37d27390618f7a9f90055"}': Failed to embed chunk text: error decoding response body
2025-01-16T14:10:36.889699Z WARN tabby_index::indexer: crates/tabby-index/src/indexer.rs:90: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/example-showcase/src/main.rs","language":"rust","git_hash":"49afb15c2048c316cdc37d27390618f7a9f90055"}': Failed to embed chunk text: error decoding response body
This is especially apparent on https://github.com/bevyengine/bevy: I left it running overnight and it still didn't finish indexing. As stated, I'm not sure whether my computer is simply too slow to index it.
Information about your version
tabby 0.22.0
Information about your GPU
Apple M2 (4+4) @ 3.50 GHz
Additional context
Any repo will output those warning logs, but they still finish. For the bevy repo, it gets stuck on the example showcase main file.
I'm using this to run: tabby serve --model Qwen2.5-Coder-3B --parallelism 16 --device metal
We have set a timeout for each embedding model request to keep the indexing time manageable. As a result, some requests may fail and trigger warnings.
Please note that these warnings can be safely ignored. Tabby's background indexing job will incrementally re-index any failed chunks.
"some requests may fail and trigger warnings"
That's what I thought, but even after 8 hours, it's still not finished. Do you reckon it has something to do with the base M2 GPU?
If you have a really large repo, that's somewhat expected. The only way to speed it up is to use a more powerful model serving backend (and more powerful hardware) to reduce the time of a cold start.
The good news is that once the first run has finished, future incremental indexing should be much faster.
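For readers following along, the behavior described above (a per-request timeout, failed chunks skipped with a warning, and later runs picking up only what is missing) can be pictured with a toy model. This is not Tabby's implementation: `fake_embed`, `index_pass`, the delays, and the timeout value are all hypothetical names and numbers chosen for illustration.

```python
import concurrent.futures
import time


def fake_embed(chunk, delay):
    """Hypothetical stand-in for an embedding request; `delay` simulates backend latency."""
    time.sleep(delay)
    return [float(len(chunk))]


def index_pass(chunks, delays, timeout, index):
    """Embed every chunk missing from `index`; return the chunks that timed out."""
    failed = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        pending = {
            chunk: pool.submit(fake_embed, chunk, delays.get(chunk, 0.0))
            for chunk in chunks
            if chunk not in index  # incremental: skip chunks indexed in an earlier pass
        }
        for chunk, future in pending.items():
            try:
                index[chunk] = future.result(timeout=timeout)
            except concurrent.futures.TimeoutError:
                # Comparable to a "Failed to build chunk" warning: skip it and move on.
                failed.append(chunk)
    return failed


chunks = ["fn a() {}", "fn b() {}", "fn c() {}"]
index = {}

# First (cold) pass: one simulated request is slower than the timeout.
failed = index_pass(chunks, {"fn b() {}": 1.0}, timeout=0.5, index=index)
print("failed on first pass:", failed)

# Later pass: only the chunk that is still missing is sent again.
failed = index_pass(chunks, {}, timeout=0.5, index=index)
print("failed on second pass:", failed, "indexed:", len(index))
```

In this model a warning only costs one retry on a later pass, which is why occasional warnings are harmless. A run that never completes, as reported here, is a different symptom.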
I've been running the indexing on bevy again, for 24 hours this time, and it's still not done. I have noticed that while it is running, it's not using much GPU or CPU.
Indexing my repo, ~1k lines:
Indexing bevy:
bevy's indexing spike is a bit shorter than my repo's, even though bevy definitely has more lines than mine. The indexing also doesn't finish after the spike drops.
I also noticed that the warning logs stopped at the same time as the GPU activity, which leads me to believe that the indexing stopped partway through but didn't report it.
Thanks for offering to help debug! I ran a quick test by indexing bevy in Tabby's demo instance, and it worked as expected, so I was unable to reproduce the issue. Here is the job log for reference: https://demo.tabbyml.com/jobs/detail?id=Gp3WX1
Could you try turning on debug logging (e.g. RUST_LOG=debug) to see if there's anything suspicious?
In the log, I found something like this:
2025-01-20T20:19:35.631776Z WARN tabby_index::indexer:crates/tabby-index/src/indexer.rs: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/ci/src/ci.rs","language":"rust","git_hash":"043a887e37ec704dbe97981a7bdfb6ad534d6d5b"}': Failed to embed chunk text: error sending request for url (http://127.0.0.1:30888/embedding)
Also in the system tab, I see
I think it has something to do with the embedding, although I'm not sure how to set it up. Maybe that's the problem?
https://github.com/TabbyML/tabby/issues/3715#issuecomment-2597414985.
The error can be safely disregarded if it occurs infrequently, especially when considered in the context of the total number of chunks processed.
Can you share the command you used to start tabby?
I'm using tabby serve --model Qwen2.5-Coder-3B --parallelism 32 --device metal to start.
"The error can be safely disregarded if it occurs infrequently, especially when considered in the context of the total number of chunks processed."
I'm not sure if it occurs frequently or not, but there are 42 warnings like that for a repo of ~500 lines.
I did the same with bevy, and up until the point of failure, there are 255 Failed to build chunk for document warnings.
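To judge whether the warnings are "infrequent" and whether they cluster on one file, the log can be tallied by file path and by error reason. The sketch below assumes the warning format quoted earlier in this thread (other Tabby versions may log differently); `summarize_warnings` is a hypothetical helper, and the sample lines are copied from the logs above.

```python
import json
import re
from collections import Counter

# Matches the warning format quoted in this thread (an assumption about other versions).
WARNING_RE = re.compile(
    r"Failed to build chunk for document 'git:[^:]*:::(?P<meta>\{.*?\})': (?P<reason>.*)$"
)


def summarize_warnings(lines):
    """Count chunk-build warnings per file path and per error reason."""
    by_path = Counter()
    by_reason = Counter()
    for line in lines:
        match = WARNING_RE.search(line.rstrip("\n"))
        if match is None:
            continue  # not a chunk-build warning
        meta = json.loads(match.group("meta"))
        by_path[meta["path"]] += 1
        by_reason[match.group("reason").strip()] += 1
    return by_path, by_reason


SAMPLE_LOG = """\
2025-01-16T14:10:36.837159Z WARN tabby_index::indexer: crates/tabby-index/src/indexer.rs:90: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/example-showcase/src/main.rs","language":"rust","git_hash":"49afb15c2048c316cdc37d27390618f7a9f90055"}': Failed to embed chunk text: error decoding response body
2025-01-20T20:19:35.631776Z WARN tabby_index::indexer:crates/tabby-index/src/indexer.rs: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/ci/src/ci.rs","language":"rust","git_hash":"043a887e37ec704dbe97981a7bdfb6ad534d6d5b"}': Failed to embed chunk text: error sending request for url (http://127.0.0.1:30888/embedding)
"""
SAMPLE_LINES = SAMPLE_LOG.splitlines()

by_path, by_reason = summarize_warnings(SAMPLE_LINES)
for path, count in by_path.most_common():
    print(count, path)
for reason, count in by_reason.most_common():
    print(count, reason)
```

To run it on a real log, pass an open file instead of `SAMPLE_LINES`. Many warnings concentrated on a single path, or a switch from "error decoding response body" to "error sending request", may point at the moment the embedding endpoint stopped responding.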
Fix merged in https://github.com/TabbyML/tabby/pull/3805
Thank you for reporting this issue. It should be resolved in https://github.com/TabbyML/tabby/releases/tag/v0.25.0. Please verify, and feel free to reopen the issue if it persists.