SWE-agent
Speed up evaluation by caching task environments as docker images
What does this implement?
This PR introduces the ability to cache the environment created for each SWE-bench task as a Docker image. It saves the filesystem and environment variables (using a file) with `docker commit`, which produces a new Docker image with a tag unique to the given task. The tag contains the dataset name, split, and task number. The feature can be enabled with the `--cache_task_images` flag.
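As a rough illustration of the tagging scheme described above (not the code in this PR), the sketch below builds a per-task image reference from the dataset name, split, and task number, and assembles the matching `docker commit` call. The repository name `swe-agent-task-env` and the helper names are hypothetical; only `docker commit` and `docker image inspect` are real Docker CLI commands.

```python
import re
import subprocess

# Hypothetical repository name for cached images; not taken from the PR.
CACHE_REPOSITORY = "swe-agent-task-env"


def cached_image_name(dataset: str, split: str, task_index: int) -> str:
    """Build a `repository:tag` reference that is unique per task."""
    raw_tag = f"{dataset}-{split}-{task_index}"
    # Docker tags may only contain letters, digits, '_', '.' and '-',
    # so every other character (such as '/') is replaced with '_'.
    tag = re.sub(r"[^A-Za-z0-9_.-]", "_", raw_tag)
    return f"{CACHE_REPOSITORY}:{tag}"


def commit_command(container: str, image: str) -> list[str]:
    """Return the `docker commit` invocation that snapshots a container."""
    return ["docker", "commit", container, image]


def image_exists(image: str) -> bool:
    """Check for a cached image; `docker image inspect` exits non-zero if absent."""
    result = subprocess.run(
        ["docker", "image", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

With these helpers, a run would first call `image_exists(...)` and fall back to the normal environment setup followed by `subprocess.run(commit_command(...), check=True)` only when the cached image is missing.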
This change addresses the issue of spending a large share of evaluation time on setting up the task environments. Timing test on the dev split of `princeton-nlp/SWE-bench_Lite` (23 tasks), on a 2-core VM:
- Avg. time to prepare 1 task environment: 78.3 sec
- Avg. time to load cached environment from the image: 10.1 sec
The repo states an average task run time of 1.5 minutes, so this PR speeds up consecutive evaluations by up to 40% (for some HW setups).
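One way to arrive at the ~40% figure from the timings above is sketched below. It assumes the 1.5 minutes is the agent run time excluding environment setup; that reading is an assumption, not something stated in the repo.

```python
# Measured averages from the timing test above (seconds per task).
setup_uncached_s = 78.3
setup_cached_s = 10.1

# Assumption: the 1.5-minute figure covers only the agent run, not setup.
agent_run_s = 1.5 * 60

total_uncached_s = setup_uncached_s + agent_run_s
total_cached_s = setup_cached_s + agent_run_s

# Fraction of per-task wall time saved by loading the cached image.
time_saved_fraction = 1 - total_cached_s / total_uncached_s
print(f"{time_saved_fraction:.1%}")  # prints 40.5%
```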
Any other comments?
I expected the change to use a small amount of disk space, since all task environments share the same base image and Docker uses OverlayFS to avoid storing duplicate image parts. However, each image ends up using ~1.5GB of disk space per task. The dev split of SWE-bench_Lite requires ~40GB of disk space, while the test split would consume ~500GB. Although this issue should be addressed later, this could still be a reasonable trade-off when running a few consecutive evaluations to test some changes.
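A back-of-the-envelope check of these numbers, assuming a flat ~1.5GB per image, 23 dev tasks, and 300 test tasks in SWE-bench_Lite, lands slightly below the observed totals, which is consistent with the figures above being rounded measurements:

```python
# Approximate per-image footprint reported above (GB).
GB_PER_IMAGE = 1.5

# Task counts for SWE-bench_Lite: 23 in dev (from the timing test), 300 in test.
DEV_TASKS = 23
TEST_TASKS = 300

dev_estimate_gb = DEV_TASKS * GB_PER_IMAGE
test_estimate_gb = TEST_TASKS * GB_PER_IMAGE

print(f"dev:  ~{dev_estimate_gb:.1f} GB")
print(f"test: ~{test_estimate_gb:.1f} GB")
```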
Very cool stuff! I'll take a closer look at that on Friday!
Codecov Report
Attention: Patch coverage is 37.50000% with 35 lines in your changes missing coverage. Please review.
:exclamation: No coverage uploaded for pull request base (main@088aabd). Report is 1 commit behind head on main.
:exclamation: Current head fef3d32 differs from pull request most recent head 3d28971. Consider uploading reports for the commit 3d28971 to get more accurate results
Files | Patch % | Lines |
---|---|---|
sweagent/environment/swe_env.py | 29.54% | 31 Missing :warning: |
sweagent/environment/utils.py | 66.66% | 4 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #317 +/- ##
=======================================
Coverage ? 75.72%
=======================================
Files ? 18
Lines ? 2892
Branches ? 0
=======================================
Hits ? 2190
Misses ? 702
Partials ? 0
I think this looks great! The only thing we'd have to fix is the naming issue depending on the nature of `data_path`/rejecting the flag if `data_path` is something unsuitable
Let me push this on top of your branch :)
Sure. How can I do that?
Probably I already can :) (else it should be this setting).
Realistically, I'll probably only get to it this Wednesday though, so no reason to wait for me until then haha
Aha, I see. I've enabled that option, thanks.
Hmm, somehow pushing on this PR doesn't work, not sure why. Let me merge your PR and then apply my changes on top :)
Thanks again for the very nice addition! ❤️
I've highlighted your contribution in our changelog :)