RepoAgent icon indicating copy to clipboard operation
RepoAgent copied to clipboard

Questions: tree sitter, git, ollama

Open magaton opened this issue 1 year ago • 12 comments

Hello, interesting project and architecture. I see that the support for other programming languages is left for future. Have you considered using tree-sitter for code parsing?

Also, why did you decide to use pre-commit hooks instead of pullling git repository with a scheduler. Llama index github reader could be leveraged in that case.

Do you plan to support Ollama and if so, which of the open source models you reckon would be the best fit?

Thanks

magaton avatar Mar 04 '24 19:03 magaton

Anyone?

magaton avatar Mar 06 '24 15:03 magaton

I see that the support for other programming languages is left for future. Have you considered using tree-sitter for code parsing?

Hi there! Thank you for your suggestion. I will look into it. It would be great if you can share any details with me.

Umpire2018 avatar Mar 08 '24 03:03 Umpire2018

image

Reference: here

We applied

  1. Abstract Syntax Tree (AST) to extract all Classes and Functions within the file, including their type, name, code snippets, etc which is similar with

Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited).

image image

  1. Jedi to find_all_referencer of single function in repo_agent/doc_meta_info.py Line 270 .

Seems like tree-sitter is better than ast because it provides multiple programming language support. image

Correct me if i am wrong. @LOGIC-10

Umpire2018 avatar Mar 08 '24 06:03 Umpire2018

Thanks for the response guys. Again, excellent work. I am on the same boat and the support for the multiple programming languages is a stopper from using your project.

As I can see you do Python AST + Jedi for the function calls. Replacing python AST with tree-sitter could bring you closer to multi-lnaguage support , but Jedi is usable only for python.

AST is only one layer and here with Jedi you want to add function calls into the picture.

But, there is a standard notion for extracting codebase semantics. It is called CPG (code property graph) and a reference implementation called Joern:

Have you maybe considered that?

magaton avatar Mar 08 '24 11:03 magaton

AST is only one layer and here with Jedi you want to add function calls into the picture.

But, there is a standard notion for extracting codebase semantics. It is called CPG (code property graph) and a reference implementation called Joern:

I wonder if CPG have a python implementation? https://github.com/markgacoka/codepropertygraph may not be a good choice.

And the goal is to replace AST + Jedi via one or multiple library in order to acheieve multi-language support.

Umpire2018 avatar Mar 09 '24 03:03 Umpire2018

I am using Joern for CPG -> Neo4j, but that is scala There is also https://pypi.org/project/cpggen/ in python

magaton avatar Mar 11 '24 19:03 magaton

AppThreat/cpggen: This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

It seems that now is not a good time to introduce CPG but we will definitely consider tree sitter.

Umpire2018 avatar Mar 13 '24 05:03 Umpire2018

Understood, but when you use tree-sitter, maybe you can only take its CST output and use a code chunker from llama index https://docs.sweep.dev/blogs/chunking-improvements

magaton avatar Mar 13 '24 09:03 magaton

Do you plan to support Ollama and if so, which of the open source models you reckon would be the best fit?

Seems like Ollama have provided openai-compatibility so i think support Ollama or others open source llm is not high priority.

Right now we only used Chat completions ablility.

Similar projects for reference are as follows:

  1. vllm
  2. llama-cpp-python
  3. Ollama

Umpire2018 avatar Mar 16 '24 01:03 Umpire2018

Hello, I too wanted support for languages other then python. Does anybody know the approach or neccsessary changes to be done to the existing code repository?

Major-wagh avatar Mar 29 '24 09:03 Major-wagh

openai很多地方无法使用,我也期待支持ollama

biandan avatar Jul 14 '24 17:07 biandan

Is there any method/approach for supporting multiple programming language to find_all_referencer of single function ?

sandeshchand avatar Aug 07 '24 09:08 sandeshchand