evals more or less fixing sampling calls hanging

This should mostly fix the issue of certain samples hanging on the API call and blocking the eval from finishing. The solution is to have each thread time out and make a new API call if the original call takes too long. This was surprisingly hard (for me) to implement correctly, and potentially this is just a hard problem to solve in Python, since killing threads is neither supported nor regarded as good practice. So it's a "partial" fix because the eval still hangs -- it just does so at the very end now, which is not blocking: (AFAICT) it's safe to interrupt at that point and you still get all your eval results. I added some docs to note this behavior as well as the environment variable that controls the timeout duration. Would be good to get another set of eyes on this.

Can I have GPT-4 access now?

Mar 17 '23 03:03 zhangmarvin

Wouldn't the eval always still hang at the end if your num_threads is >= the number of tasks in the eval? I'm also interested in how to make this easier to use with oaievalset, as you would need to interrupt and then restart the whole set if any single eval hangs.

Mar 17 '23 17:03 andrew-openai

Is there any good reason to set the # threads > # tasks? If not, I could just add something that throws an error if someone tries to do that.

For oaievalset, the good thing is that there's a cache, so if you interrupt and restart, you pick back up where you left off, right? (I haven't tested this, just making an educated guess). If so, I'd say this isn't blocking and I can add this to the docs. But curious what you think @andrew-openai

Mar 17 '23 18:03 zhangmarvin

I think it guarantees we don't have any deadlock-type situation, where each thread is hopefully going to execute one task.

Mar 17 '23 18:03 andrew-openai

Actually I'm doing some light testing now of # threads > # tasks and it's not hanging for me. I think I still don't understand why we would want this case, but I don't think it's broken (any more broken than the other case).

Mar 17 '23 18:03 zhangmarvin
