Sandbox: adjust whitespace processing
Turns out lstrip is different from removeprefix. Current implementation could trigger some weird bugs.
Unfortunately this also breaks the current tests. Multiline commands don't really work anymore :/
@xingyaoww I think I've hit all your concerns here. I didn't find a bash parser that made this easy unfortunately, but agree that would be a better solution
Looks great! I was actually wondering if we can get SWE-Bench merged first this week (for reproducibility of our swebench score), since it also did tweak this part of logic a bit (https://github.com/OpenDevin/OpenDevin/pull/1468/files#diff-2f8dea96fe3d767df8d6f113d9b6587294247146160f57fbc8688a0ac99959ce); and then we can test this PR once swe-bench got merged to see if it produces any sort of weird error - if not, we can get this in :)
Hmm I do get the reproducibility concern. Is your PR close? Seems like it's still in draft mode
@rbren We still need some time to work on https://github.com/OpenDevin/OpenDevin/pull/1468 to improve the documentation & visualizer & figure out where to store all the intermediate experiment results. I'm finally done with the conference and hopefully we can get that merged early next week.