Ryan H. Tran comments

Results: 87 comments

Support MINT benchmark (MATH, GSM8K subset)

The evaluation for the MATH subset can now be run using `bash ./evaluation/mint/run_infer.sh`. The result looks like this: ``` { "id": 0, "instance": { "task_name": "reasoning", "task_id": 0, "prompt": "What is...

Support MINT benchmark (MATH, GSM8K subset)

I tested locally with the first 2 examples, and both pass now. Can you try again? @yufansong

Support Logic Reasoning Benchmark

I'm not sure if I'm understanding correctly, but this implementation seems to be a bit different from the original `ProntoQA`. Here we feed the program in advance to the agent...

Support Logic Reasoning Benchmark

> I have not read the original paper. Could you tell me the difference between `original ProntoQA` and this implementation?

From my understanding, the original implementation gives the model the...

Support Logic Reasoning Benchmark

> You are definitely right. The raw ProntoQA dataset does not provide any symbolic language expression or corresponding programs. The logic of a neuro-symbolic method is to 1) parse logic...

Add AgentBench.

Not sure if I'm missing anything, but when I ran the evaluation, there were a bunch of `permission denied` logs and the agent seemed to fail since the log file...

Add AgentBench.

> > Not sure if I'm missing anything, but when I ran the evaluation, there were a bunch of `permission denied` logs and the agent seemed to fail since the...

Use Language Server Protocol (LSP) to re-implement all code editing

Hi @LakshyAAAgrawal, thanks for offering your help! I tried to integrate `multilspy` into [openhands-aci](https://github.com/All-Hands-AI/openhands-aci/) before (on [this branch](https://github.com/All-Hands-AI/openhands-aci/tree/ht/integrate-lsp)) but didn't finish it. It would be great if you could have...

Enhance code agent's search ability with ACR's context search API

> This may or may not have some licensing restrictions. Since ACR is under GPL-3.0

Oh yeah, I didn't notice this. Maybe it's not desirable to use the original code...

Enhance code agent's search ability with ACR's context search API

> Instead of ACR, how about implementing 'mentat auto context'.

This issue aims to provide the agent with tools more tailored for coding tasks, improving upon the available search tools...