
Upgrade openhands-aci to 0.1.7

Open ryanhoangt opened this issue 11 months ago • 11 comments

End-user friendly description of the problem this fixes or functionality that this introduces

  • [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

This PR is to:

  • Upgrade openhands-aci to 0.1.7.

CC: @xingyaoww


Link of any specific issues this addresses

ryanhoangt avatar Jan 07 '25 16:01 ryanhoangt

@mamoodi seems the eval job failed again :( any idea?

xingyaoww avatar Jan 07 '25 17:01 xingyaoww

Aha, I was just asking about eval https://github.com/All-Hands-AI/openhands-aci/pull/45/files#r1905796005

I'm curious if it shows anything.

enyst avatar Jan 07 '25 17:01 enyst

Hey team. Let me get back to this. You can't run on a fork unfortunately (permission access to secrets). Have a few things going on but can run one soon.

mamoodi avatar Jan 07 '25 17:01 mamoodi

Triggered eval on 30 instances.

mamoodi avatar Jan 07 '25 17:01 mamoodi

When running via the UI, I can see the hidden count is shown. But looking at the output of view in the evaluation output for instance astropy__astropy-14995, I'm not sure why it's not updated 🤔

openai__claude-3-5-sonnet-20241022-1736272164.5043015.json

Here's the repo at the base commit, which includes hidden dirs e.g. .github, .circleci, etc: https://github.com/swe-bench/astropy__astropy/tree/b16c7d12ccbc7b2d20364b89fb44285bcbfede54

ryanhoangt avatar Jan 08 '25 16:01 ryanhoangt

@mamoodi did we run a poetry install in the evaluation job? When running eval locally, I see the output is updated, but it doesn't seem to be in the zip file from your run.
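
One quick way to check whether an environment actually picked up the upgrade is to ask Python which version of the package is installed. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration, and it assumes the distribution is published under the name `openhands-aci`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # 0.1.7 is the version this PR upgrades to; anything else suggests
    # the environment was not reinstalled after the bump.
    found = installed_version("openhands-aci")
    print(f"openhands-aci installed: {found!r} (this PR expects '0.1.7')")
```

Running this inside the evaluation environment and locally would show whether the two are on the same version.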

ryanhoangt avatar Jan 08 '25 16:01 ryanhoangt

@ryanhoangt Just a quick thought: I see here that the hidden message is added as a second element in the list, not as a continuation of the first string. Is that necessary? Maybe it should be part of a single string; otherwise it seems we lose it somewhere along the way, where the code assumes there can be only one element (maybe in the agent?)
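
To make the guess concrete, here is a purely illustrative sketch. `first_element_only` is a hypothetical consumer that keeps only the first element of a list; it is not code from OpenHands or openhands-aci, and the strings are shortened stand-ins for the real output.

```python
from typing import List

# Shortened stand-ins for the two pieces of the view output.
listing = "Here's the files and directories up to 2 levels deep in /workspace/repo, excluding hidden items:\n/workspace/repo"
hidden_note = "13 hidden files/directories in this directory are excluded."


def first_element_only(parts: List[str]) -> str:
    """Hypothetical consumer that assumes the list has a single element."""
    return parts[0]


def join_parts(parts: List[str]) -> str:
    """Merge all parts into one string so no consumer can drop any of them."""
    return "\n\n".join(parts)


# With two elements, the hypothetical consumer silently drops the note.
dropped = first_element_only([listing, hidden_note])

# With a single joined string, the note survives.
kept = first_element_only([join_parts([listing, hidden_note])])
```

If some code path did behave like `first_element_only`, joining the parts up front would be the safer shape.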

enyst avatar Jan 08 '25 16:01 enyst

@enyst Can you elaborate on it a bit, maybe with an example? I'm not sure I'm understanding your concern 😅 Here's what the output looks like, which makes sense to me actually:

Here's the files and directories up to 2 levels deep in /workspace/astropy__astropy__5.2, excluding hidden items:
/workspace/astropy__astropy__5.2
/workspace/astropy__astropy__5.2/GOVERNANCE.md
/workspace/astropy__astropy__5.2/setup.py
/workspace/astropy__astropy__5.2/tox.ini
/workspace/astropy__astropy__5.2/CODE_OF_CONDUCT.md
/workspace/astropy__astropy__5.2/setup.cfg
/workspace/astropy__astropy__5.2/licenses
/workspace/astropy__astropy__5.2/licenses/PYTHON.rst
/workspace/astropy__astropy__5.2/licenses/PYFITS.rst
/workspace/astropy__astropy__5.2/licenses/NUMPY_LICENSE.rst
/workspace/astropy__astropy__5.2/licenses/CONFIGOBJ_LICENSE.rst
/workspace/astropy__astropy__5.2/licenses/JQUERY_LICENSE.rst
/workspace/astropy__astropy__5.2/licenses/PLY_LICENSE.rst
/workspace/astropy__astropy__5.2/licenses/EXPAT_LICENSE.rst
/workspace/astropy__astropy__5.2/licenses/README.rst
/workspace/astropy__astropy__5.2/licenses/AURA_LICENSE.rst
/workspace/astropy__astropy__5.2/licenses/ERFA.rst
/workspace/astropy__astropy__5.2/licenses/DATATABLES_LICENSE.rst
/workspace/astropy__astropy__5.2/licenses/GATSPY_LICENSE.rst
/workspace/astropy__astropy__5.2/licenses/WCSLIB_LICENSE.rst
/workspace/astropy__astropy__5.2/CHANGES.rst
/workspace/astropy__astropy__5.2/CITATION
/workspace/astropy__astropy__5.2/README.rst
/workspace/astropy__astropy__5.2/conftest.py
/workspace/astropy__astropy__5.2/cextern
/workspace/astropy__astropy__5.2/cextern/wcslib
/workspace/astropy__astropy__5.2/cextern/trim_expat.sh
/workspace/astropy__astropy__5.2/cextern/trim_cfitsio.sh
/workspace/astropy__astropy__5.2/cextern/cfitsio
/workspace/astropy__astropy__5.2/cextern/expat
/workspace/astropy__astropy__5.2/cextern/trim_wcslib.sh
/workspace/astropy__astropy__5.2/cextern/README.rst
/workspace/astropy__astropy__5.2/examples
/workspace/astropy__astropy__5.2/examples/README.rst
/workspace/astropy__astropy__5.2/examples/template
/workspace/astropy__astropy__5.2/examples/coordinates
/workspace/astropy__astropy__5.2/examples/io
/workspace/astropy__astropy__5.2/docs
/workspace/astropy__astropy__5.2/docs/lts_policy.rst
/workspace/astropy__astropy__5.2/docs/_static
/workspace/astropy__astropy__5.2/docs/cosmology
/workspace/astropy__astropy__5.2/docs/glossary.rst
/workspace/astropy__astropy__5.2/docs/convolution
/workspace/astropy__astropy__5.2/docs/importing_astropy.rst
/workspace/astropy__astropy__5.2/docs/_templates
/workspace/astropy__astropy__5.2/docs/units
/workspace/astropy__astropy__5.2/docs/nitpick-exceptions
/workspace/astropy__astropy__5.2/docs/uncertainty
/workspace/astropy__astropy__5.2/docs/robots.txt
/workspace/astropy__astropy__5.2/docs/credits.rst
/workspace/astropy__astropy__5.2/docs/wcs
/workspace/astropy__astropy__5.2/docs/logging.rst
/workspace/astropy__astropy__5.2/docs/time
/workspace/astropy__astropy__5.2/docs/install.rst
/workspace/astropy__astropy__5.2/docs/constants
/workspace/astropy__astropy__5.2/docs/whatsnew
/workspace/astropy__astropy__5.2/docs/rtd_environment.yaml
/workspace/astropy__astropy__5.2/docs/_pkgtemplate.rst
/workspace/astropy__astropy__5.2/docs/index.rst
/workspace/astropy__astropy__5.2/docs/modeling
/workspace/astropy__astropy__5.2/docs/stats
/workspace/astropy__astropy__5.2/docs/visualization
/workspace/astropy__astropy__5.2/docs/conftest.py
/workspace/astropy__astropy__5.2/docs/config
/workspace/astropy__astropy__5.2/docs/warnings.rst
/workspace/astropy__astropy__5.2/docs/table
/workspace/astropy__astropy__5.2/docs/known_issues.rst
/workspace/astropy__astropy__5.2/docs/changes
/workspace/astropy__astropy__5.2/docs/nddata
/workspace/astropy__astropy__5.2/docs/timeseries
/workspace/astropy__astropy__5.2/docs/development
/workspace/astropy__astropy__5.2/docs/samp
/workspace/astropy__astropy__5.2/docs/coordinates
/workspace/astropy__astropy__5.2/docs/changelog.rst
/workspace/astropy__astropy__5.2/docs/Makefile
/workspace/astropy__astropy__5.2/docs/make.bat
/workspace/astropy__astropy__5.2/docs/common_links.txt
/workspace/astropy__astropy__5.2/docs/license.rst
/workspace/astropy__astropy__5.2/docs/utils
/workspace/astropy__astropy__5.2/docs/conf.py
/workspace/astropy__astropy__5.2/docs/io
/workspace/astropy__astropy__5.2/CONTRIBUTING.md
/workspace/astropy__astropy__5.2/astropy
/workspace/astropy__astropy__5.2/astropy/cosmology
/workspace/astropy__astropy__5.2/astropy/__init__.py
/workspace/astropy__astropy__5.2/astropy/convolution
/workspace/astropy__astropy__5.2/astropy/compiler_version.cpython-39-x86_64-linux-gnu.so
/workspace/astropy__astropy__5.2/astropy/extern
/workspace/astropy__astropy__5.2/astropy/units
/workspace/astropy__astropy__5.2/astropy/uncertainty
/workspace/astropy__astropy__5.2/astropy/_version.py
/workspace/astropy__astropy__5.2/astropy/wcs
/workspace/astropy__astropy__5.2/astropy/time
/workspace/astropy__astropy__5.2/astropy/tests
/workspace/astropy__astropy__5.2/astropy/constants
/workspace/astropy__astropy__5.2/astropy/_compiler.c
/workspace/astropy__astropy__5.2/astropy/modeling
/workspace/astropy__astropy__5.2/astropy/stats
/workspace/astropy__astropy__5.2/astropy/version.py
/workspace/astropy__astropy__5.2/astropy/logger.py
/workspace/astropy__astropy__5.2/astropy/visualization
/workspace/astropy__astropy__5.2/astropy/CITATION
/workspace/astropy__astropy__5.2/astropy/conftest.py
/workspace/astropy__astropy__5.2/astropy/config
/workspace/astropy__astropy__5.2/astropy/table
/workspace/astropy__astropy__5.2/astropy/nddata
/workspace/astropy__astropy__5.2/astropy/timeseries
/workspace/astropy__astropy__5.2/astropy/samp
/workspace/astropy__astropy__5.2/astropy/coordinates
/workspace/astropy__astropy__5.2/astropy/_dev
/workspace/astropy__astropy__5.2/astropy/utils
/workspace/astropy__astropy__5.2/astropy/io
/workspace/astropy__astropy__5.2/codecov.yml
/workspace/astropy__astropy__5.2/astropy.egg-info
/workspace/astropy__astropy__5.2/astropy.egg-info/not-zip-safe
/workspace/astropy__astropy__5.2/astropy.egg-info/entry_points.txt
/workspace/astropy__astropy__5.2/astropy.egg-info/top_level.txt
/workspace/astropy__astropy__5.2/astropy.egg-info/requires.txt
/workspace/astropy__astropy__5.2/astropy.egg-info/PKG-INFO
/workspace/astropy__astropy__5.2/astropy.egg-info/SOURCES.txt
/workspace/astropy__astropy__5.2/astropy.egg-info/dependency_links.txt
/workspace/astropy__astropy__5.2/MANIFEST.in
/workspace/astropy__astropy__5.2/pyproject.toml
/workspace/astropy__astropy__5.2/LICENSE.rst


13 hidden files/directories in this directory are excluded. You can use 'ls -la /workspace/astropy__astropy__5.2' to see them.
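
For readers who want to reproduce this kind of output, the following is a rough sketch of a two-level listing that skips hidden entries and reports how many were skipped at the top level. It is not the openhands-aci implementation: the function names `list_visible` and `format_view` are made up here, and sorting entries alphabetically is an assumption added only to make the result deterministic. The message wording is copied from the output above.

```python
import os
from typing import List, Tuple


def list_visible(root: str, max_depth: int = 2) -> Tuple[List[str], int]:
    """List non-hidden paths up to `max_depth` levels below `root`.

    Returns the visible paths (root first) and the number of hidden
    entries found directly inside `root`.
    """
    visible = [root]
    hidden_at_root = 0

    def walk(directory: str, depth: int) -> None:
        nonlocal hidden_at_root
        if depth > max_depth:
            return
        # Sorted for a stable order (an assumption of this sketch).
        for name in sorted(os.listdir(directory)):
            if name.startswith("."):
                # Only hidden entries directly under root are counted.
                if depth == 1:
                    hidden_at_root += 1
                continue
            path = os.path.join(directory, name)
            visible.append(path)
            if os.path.isdir(path):
                walk(path, depth + 1)

    walk(root, 1)
    return visible, hidden_at_root


def format_view(root: str, max_depth: int = 2) -> str:
    """Render the listing as one string, with the hidden-count note appended."""
    paths, hidden = list_visible(root, max_depth)
    message = (
        f"Here's the files and directories up to {max_depth} levels deep "
        f"in {root}, excluding hidden items:\n" + "\n".join(paths) + "\n"
    )
    if hidden:
        message += (
            f"\n{hidden} hidden files/directories in this directory are excluded. "
            f"You can use 'ls -la {root}' to see them."
        )
    return message
```

Calling `print(format_view("/workspace/astropy__astropy__5.2"))` on a checkout would produce output in the same shape as the listing above, though the order of entries may differ.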

ryanhoangt avatar Jan 08 '25 16:01 ryanhoangt

No worries, it's not a concern, it was just a guess as to why it may "lose" the second string along the way. Looks like it was a bad guess. It seems you found the actual issue!

enyst avatar Jan 08 '25 16:01 enyst

Ran an eval on the 30 instances above locally, the result looks reasonable (baseline got 13/30). CC @xingyaoww

Screenshot 2025-01-09 at 16 02 19

ryanhoangt avatar Jan 09 '25 09:01 ryanhoangt

@ryanhoangt is this the result AFTER we fixed the ordering issue?

xingyaoww avatar Jan 09 '25 14:01 xingyaoww

No, the ordering fix doesn't go into this release. This only contains your fix

ryanhoangt avatar Jan 09 '25 14:01 ryanhoangt

Can we bring in the ordering fix too? We can directly bump this to 0.1.8

xingyaoww avatar Jan 09 '25 14:01 xingyaoww

@xingyaoww After adding the sorting fix and this pending PR: https://github.com/All-Hands-AI/openhands-aci/pull/51, eval now gets 12/30 compared to 13/30:

Screenshot 2025-01-10 at 17 45 07

ryanhoangt avatar Jan 10 '25 10:01 ryanhoangt

Do we want it in 0.20 or after the pending release?

enyst avatar Jan 11 '25 00:01 enyst

I'm OK with merging it now, unless y'all have a different opinion :)

xingyaoww avatar Jan 12 '25 05:01 xingyaoww

OK, it's a view change and a bug fix, eval is OK, and I think we need it elsewhere. @mamoodi I don't think this comes with surprises, other than the agent working a bit better!

enyst avatar Jan 12 '25 06:01 enyst