Speed up evaluation by caching task environments as docker images

Open ollmer opened this issue 9 months ago • 7 comments

What does this implement?

This PR adds the ability to cache the environment created for each SWE-bench task as a Docker image. It saves the filesystem and the environment variables (via a file) with docker commit, which produces a new Docker image whose tag is unique to the given task. The tag contains the dataset name, split, and task number. The feature can be enabled with the --cache_task_images flag.
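As a rough illustration of the idea (not the code in this PR), the caching step could look like the sketch below. The naming scheme `swe-task-cache/...`, and the helper names `cached_image_name`, `image_exists`, and `commit_container` are hypothetical; only the `docker commit` and `docker image inspect` CLI commands are real.

```python
import re
import subprocess


def cached_image_name(dataset: str, split: str, task_index: int) -> str:
    """Build an image reference unique to one task (hypothetical scheme)."""
    # Docker repository names must be lowercase, so normalise the dataset
    # name and collapse every other character run into a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", dataset.lower()).strip("-")
    return f"swe-task-cache/{slug}:{split}-{task_index}"


def image_exists(name: str) -> bool:
    """Return True if a local image with this reference already exists."""
    result = subprocess.run(
        ["docker", "image", "inspect", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def commit_container(container_id: str, name: str) -> None:
    """Snapshot the container filesystem as a new image via docker commit."""
    subprocess.run(["docker", "commit", container_id, name], check=True)
```

On a later run, a caller would check `image_exists(...)` first and start the container from the cached image instead of rebuilding the environment. Variables exported inside a running shell session are not part of the committed image, which is presumably why the PR writes them to a file inside the container before committing.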

This change addresses the large share of evaluation time spent on setting up the task environments. Timing test on the dev split of princeton-nlp/SWE-bench_Lite (23 tasks), on a 2-core VM:

  • Avg. time to prepare 1 task environment: 78.3 sec
  • Avg. time to load cached environment from the image: 10.1 sec

Since the repo states an average task run time of 1.5 minutes, this PR speeds up consecutive evaluations by up to 40% (for some HW setups).
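The 40% figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes that the 1.5-minute run time excludes environment setup, so an uncached task costs setup plus run time; that interpretation is an assumption, not something stated in the thread.

```python
SETUP_SEC = 78.3        # avg. time to prepare one task environment
CACHED_LOAD_SEC = 10.1  # avg. time to load the cached image
RUN_SEC = 90.0          # avg. task run time (1.5 minutes), assumed to exclude setup


def time_saved_fraction(setup: float, cached: float, run: float) -> float:
    """Fraction of total per-task time saved by loading from the cache."""
    uncached_total = setup + run
    return (setup - cached) / uncached_total


saving = time_saved_fraction(SETUP_SEC, CACHED_LOAD_SEC, RUN_SEC)
print(f"Per-task time saved: {saving:.1%}")
```

Under that assumption the saving comes out at roughly 40%, matching the estimate above; on faster hardware, where setup is a smaller share of the total, the gain would be smaller.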

Any other comments?

I expected the change to use little disk space, since all task environments share the same base image and Docker uses OverlayFS to avoid storing duplicate image layers. However, each image ends up using ~1.5 GB of disk space per task. The dev split of SWE-bench_Lite requires ~40 GB of disk space, while the test split would consume ~500 GB. Although this issue should be addressed later, the feature can still be a reasonable trade-off when running a few consecutive evaluations to test some changes.

ollmer avatar May 06 '24 20:05 ollmer

Very cool stuff! I'll take a closer look at that on Friday!

klieret avatar May 08 '24 01:05 klieret

Codecov Report

Attention: Patch coverage is 37.50000% with 35 lines in your changes are missing coverage. Please review.

:exclamation: No coverage uploaded for pull request base (main@088aabd). Report is 1 commits behind head on main.

:exclamation: Current head fef3d32 differs from pull request most recent head 3d28971. Consider uploading reports for the commit 3d28971 to get more accurate results

Files                              Patch %   Lines
sweagent/environment/swe_env.py    29.54%    31 Missing :warning:
sweagent/environment/utils.py      66.66%    4 Missing :warning:
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #317   +/-   ##
=======================================
  Coverage        ?   75.72%           
=======================================
  Files           ?       18           
  Lines           ?     2892           
  Branches        ?        0           
=======================================
  Hits            ?     2190           
  Misses          ?      702           
  Partials        ?        0           

:umbrella: View full report in Codecov by Sentry.

codecov[bot] avatar May 08 '24 01:05 codecov[bot]

I think this looks great! The only thing we'd have to fix is the naming issue depending on the nature of data_path/rejecting the flag if data_path is something unsuitable

Let me push this on top of your branch :)

klieret avatar May 13 '24 20:05 klieret

> I think this looks great! The only thing we'd have to fix is the naming issue depending on the nature of data_path/rejecting the flag if data_path is something unsuitable
>
> Let me push this on top of your branch :)

Sure. How can I do that?

ollmer avatar May 13 '24 21:05 ollmer

> Sure. How can I do that?

Probably I already can :) (else it should be this setting).

Realistically, I'll probably only get to it this Wednesday though, so no reason to wait for me until then haha

klieret avatar May 13 '24 21:05 klieret

> Probably I already can :) (else it should be this setting).

Aha, I see. I've enabled that option, thanks.

ollmer avatar May 13 '24 21:05 ollmer

Hmm, somehow pushing on this PR doesn't work, not sure why. Let me merge your PR and then apply my changes on top :)

klieret avatar May 27 '24 21:05 klieret

Thanks again for the very nice addition! ❤️

klieret avatar May 27 '24 21:05 klieret

I've highlighted your contribution in our changelog :)

klieret avatar May 28 '24 18:05 klieret