Ryan H. Tran comments

Results: 87 comments of Ryan H. Tran

Upgrade `openhands-aci` to v0.1.2

Yep sounds good, I can do that!

[Experiment] Add symbol navigation commands into the editor

Eval results for the PR on a subset of swe-bench-lite:

| Model | PR resolved | Baseline |
| ------ | ------------ | -------- |
| `claude-3-5-sonnet-20241022` | *35/59* -...

[Experiment] Add symbol navigation commands into the editor

> but the usual differences in what the LLM "decides" to do are much higher than this. Yeah I agree. Although it's not desirable, sometimes just a small change in...

[Experiment] Add symbol navigation commands into the editor

Yeah, seems like my PR didn't include that change, unfortunately. Also thanks @enyst for the comment, that makes sense. We may be able to tell more confidently with more instances run,...

[Experiment] Add symbol navigation commands into the editor

Took the chance to run a full eval on `swe-bench-lite` for claude -- fortunately we got performance comparable to baseline v2.2 (130/300) and to v2.1 on the leaderboard (125/300). At...

[Experiment] Add symbol navigation commands into the editor

Took a look at the results; I can't find anything significant or interesting for now, possibly due to the small difference in the results. Some plots: - Comparing v2.1, v2.2 and...

[Experiment] Add symbol navigation commands into the editor

Yes, I'm working on a refactor and will circle back to this PR soon!

[Experiment] Add symbol navigation commands into the editor

I ran the eval and the results are not improving much. Given we have some other work with higher priority (e.g. model routing), I'll close this PR for now and circle back...

[Bug]: Failing to run OpenRouter AND Ollama

The issue in trajectory (1) is from a bug in the `aci` -- I made a fix for it [here](https://github.com/All-Hands-AI/openhands-aci/pull/15/commits/3a5655d0eede026ed9c8299b71dbe0264fa8ac4f). I'm not too sure what happened with the other 2 trajectories.

[Bug]: Failing to run OpenRouter AND Ollama

Can you check the logs in the terminal to see which errors caused the state to change? You can also run `export DEBUG=1` to make more details visible.
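
A minimal sketch of that debug step, assuming a POSIX-like shell and that the app is restarted from the same shell session afterwards (the launch command is left out because it depends on your setup):

```shell
# Turn on more detailed logging for anything started from this shell session.
export DEBUG=1

# Sanity check: confirm the variable is set before relaunching.
echo "DEBUG=$DEBUG"
```

After relaunching, look at the terminal output around the point where the state changes.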