
Any plans to introduce code indexing?

RastislavKish opened this issue 1 year ago · 4 comments

Hello,

first of all, a cool project!

Larger codebases often significantly exceed even the largest context windows available these days, and offline LLMs are even more constrained in this regard.

It could be useful to implement an indexing feature that, instead of generating a single prompt from the codebase, outputs multiple smaller prompts of at most N tokens each, with the goal of building some kind of code abstraction. This abstraction could then be used together with just the single file where the modifications should be made.

I don't use LLMs for coding very frequently, but this seems like the only plausible approach for fitting large codebases into LLMs. Have you made any considerations or experiments with this approach and its possible implementation in code2prompt?
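For illustration, here is a minimal sketch of the chunking half of this suggestion: greedily pack source files into prompts of at most N tokens. All names (`chunk_codebase`, `estimate_tokens`), the restriction to `*.py` files, and the ~4-characters-per-token heuristic are hypothetical, not part of code2prompt.

```python
from pathlib import Path

def estimate_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token for English text and code.
    return len(text) // 4

def chunk_codebase(root: str, max_tokens: int = 2000) -> list[str]:
    """Greedily pack source files into prompts of at most max_tokens each.

    A single file larger than the budget still becomes its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for path in sorted(Path(root).rglob("*.py")):
        block = f"# File: {path}\n{path.read_text(encoding='utf-8')}\n"
        cost = estimate_tokens(block)
        if current and used + cost > max_tokens:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(block)
        used += cost
    if current:
        chunks.append("".join(current))
    return chunks
```

A real implementation would use the model's actual tokenizer (e.g. a BPE vocabulary) instead of a character heuristic, and would split oversized files rather than emitting them whole.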

RastislavKish avatar Apr 25 '24 00:04 RastislavKish

Hi @RastislavKish, this is an interesting feature request. Dividing the codebase into multiple prompt chunks would lose important context when working with the entire codebase. Also, the context window applies to the whole conversation with an LLM: it acts as a sliding window, so earlier context is lost as more tokens are consumed.

Could you please describe how you'd be using such a feature? I'll see if I can come up with something tailored to your needs.

mufeedvh avatar May 30 '24 03:05 mufeedvh

Aider gives some clues on how you could compress the context by looking only at the symbols in the code: https://aider.chat/2023/10/22/repomap.html ... Perhaps you could consider something similar.
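As a rough sketch of the repo-map idea (this is not Aider's actual implementation, which uses tree-sitter and a ranking step): keep only the top-level symbol signatures per file and discard the bodies, so the whole repo compresses into a few lines per file. The names `file_signatures` and `repo_map` are made up for this example.

```python
import ast
from pathlib import Path

def file_signatures(path: Path) -> list[str]:
    """Return top-level def/class signatures from one Python file, bodies omitted."""
    tree = ast.parse(path.read_text(encoding="utf-8"))
    sigs = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            sigs.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            sigs.append(f"class {node.name}")
    return sigs

def repo_map(root: str) -> str:
    """A compact map of the repo: file paths plus symbol signatures only."""
    lines = []
    for path in sorted(Path(root).rglob("*.py")):
        lines.append(str(path))
        lines.extend(f"  {sig}" for sig in file_signatures(path))
    return "\n".join(lines)
```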

swiftugandan avatar Jun 11 '24 10:06 swiftugandan

I once built a really simple Python script that traversed the code files in a project, pulled out each function (and maybe each symbol?), and created a single Markdown file with the parameters, return types, and comments, all grouped by file.

It was probably a hacky solution to this problem (context windows were smaller then), but it does help the LLM get broad overall context if you also feed it the full details of the sections of code most relevant to the specific problem you want help with.
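A sketch of what such a script might look like (the original isn't shown, so the name `summarize_file` and the exact output shape are guesses): extract each function's signature, return annotation, and first docstring line into a Markdown section per file.

```python
import ast
from pathlib import Path

def summarize_file(path: Path) -> str:
    """Markdown section listing each function's signature, return type, and docstring."""
    tree = ast.parse(path.read_text(encoding="utf-8"))
    lines = [f"## {path.name}", ""]
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            ret = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            lines.append(f"- `{node.name}({args}){ret}`")
            doc = ast.get_docstring(node)
            if doc:
                # Keep only the first line of the docstring to stay compact.
                lines.append(f"  - {doc.splitlines()[0]}")
    return "\n".join(lines)
```

Concatenating these sections for every file yields the single Markdown overview described above, which can then be paired with the full text of just the files relevant to the task.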

dbenn8 avatar Jul 19 '24 01:07 dbenn8