carlos

Results: 17 comments by carlos

Another possible solution would be to have an `install_script` attribute in the config that takes a filename or raw string that is executed instead of `install_env` in the environment setup....

Hi! We've just uploaded them. You can download them [here](https://drive.google.com/drive/folders/1EnrKzGAnsb_NmZKyECGmA2DrAc8ZuJ80?usp=sharing). Let us know if you need anything else!

Ah, only 472 were generated because of length constraints. We sample 25% uniformly from the 2294 instances, but some of those are longer than gpt-4-32k-0613's context window. (See the gpt4-32k-0613__SWE-Bench_bm25_27K for all...
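To illustrate why a 25% sample can end up smaller than expected, here is a minimal sketch of "sample uniformly, then drop anything that does not fit the context window". It is not the actual SWE-bench generation code: the function name `sample_then_filter`, the `num_tokens` field, the synthetic token counts, and the seed are all hypothetical, and the 32,768-token limit is used as-is without reserving room for the model's output.

```python
import random


def sample_then_filter(instances, fraction, max_tokens, seed=0):
    """Uniformly sample a fraction of instances, then drop over-long prompts.

    `instances` is a list of dicts with a hypothetical `num_tokens` field.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = int(len(instances) * fraction)  # number of instances to draw
    sampled = rng.sample(instances, k)  # uniform sampling without replacement
    # Keep only the instances whose prompt fits in the context window.
    return [inst for inst in sampled if inst["num_tokens"] <= max_tokens]


# Synthetic stand-in data: 2294 instances with made-up prompt lengths.
instances = [{"id": i, "num_tokens": 20_000 + 10 * i} for i in range(2294)]

subset = sample_then_filter(instances, fraction=0.25, max_tokens=32_768)
print(f"sampled {int(2294 * 0.25)} instances, kept {len(subset)} after the length filter")
```

With real data the number kept depends on the actual prompt lengths, so the count printed here says nothing about the 472 figure beyond showing the mechanism.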

We're still reviewing the process for evaluating submissions. For now, we'd prefer results with a public or soon-to-be-public paper or technical report and the generated patches we can use to...

Hi @Hk669. Just to be clear, I noticed that you used a "test-repo" in your examples. I'm not sure if that's just a placeholder, but generally the evaluation process will...

Hi @rawwerks, we're aware of this issue. We'll be updating this repository and future evaluations shortly with a solution that I think will be satisfying for everyone. In the mean...

As mentioned by @Domiii, evaluation on SWE-bench Verified should resolve these concerns, since the potential human upper bound there should be near 100%. Closing this issue for now.

We don't currently run _all_ test cases for every repo. For many repositories, this is a necessary convenience since running all tests would be exorbitant and overkill. This assumption trades...

We haven't been experiencing these issues with users' submissions recently. I'm going to close this issue for now, but please open a new issue if you continue experiencing problems.

Okay, there's an issue with generation for the `train` split at the moment. Are you trying to generate instances for the `train` or the `test` split? I'm not sure when we'll...