Edward Beeching

55 comments by Edward Beeching

Hey @Ivan-267 are any items left to do on this?

Thanks for raising this issue. I am not an artist; what do you imagine the banner looking like? Perhaps I can use Stable Diffusion to generate one.

Hi @GeorgeS2019, thank you for your enthusiasm for this project. Please could you refrain from making so many comments, as every comment sends a message to my inbox, I...

Amazing example @Ivan-267 , I will have time to look at the code later this week.

Thanks for your offer but we do not accept unsolicited requests for work. Closing

Cool. It seems their MARL support is quite limited at the moment. Perhaps we can focus on other things until they have better support?

Hi @clefourrier , the first sample of 4 is computed using sampling with a temperature, rather than greedy?

Thanks. I think the vllm backend also samples with t=1.0, even when `num_samples==1`; in that case it should default to greedy to match the transformers backend (I...
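
The suggested fallback can be sketched as a small helper. This is only an illustration: `resolve_temperature` is a hypothetical name and not part of either backend, and it assumes that a temperature of 0.0 is interpreted as greedy decoding (which is how vLLM's `SamplingParams` treats it).

```python
def resolve_temperature(num_samples: int, temperature: float = 1.0) -> float:
    """Hypothetical helper: choose the decoding temperature for a request."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    # A single sample falls back to greedy decoding (temperature 0.0),
    # so the result does not depend on the sampling temperature.
    if num_samples == 1:
        return 0.0
    # Multiple samples keep the requested sampling temperature.
    return temperature


print(resolve_temperature(1))  # single sample: greedy
print(resolve_temperature(4))  # several samples: sampling temperature kept
```

With this rule, a `num_samples==1` request would produce the same kind of output on both backends, assuming the other backend is already greedy in that case.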

Great to see your roadmap. As a contributor to Sample-Factory, clean-rl and StableBaselines3, and the creator of the open source Godot RL Agents library, I can attest to the value...

> what's a bit unclear to me is how zephyr learns to output EOS tokens, despite all the labels of the EOS token being marked with -100 and being masked...
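
For readers unfamiliar with the masking mentioned in the quote, here is a minimal pure-Python sketch of a cross-entropy loss that skips labels equal to -100 (the default `ignore_index` of PyTorch's `CrossEntropyLoss`). The function name and the toy logits are assumptions for illustration; this is not the training code under discussion, and it does not by itself answer the quoted question.

```python
import math

IGNORE_INDEX = -100  # label value that is excluded from the loss


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose label is not ignore_index."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position: contributes no loss and no gradient
        # Numerically stable log-sum-exp of the logits at this position.
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[label]
        count += 1
    # Choice made for this sketch: return 0.0 when every position is masked.
    return total / count if count else 0.0


# Two positions over a two-token vocabulary; the second label is masked.
loss = masked_cross_entropy([[2.0, 0.0], [0.0, 5.0]], [0, IGNORE_INDEX])
print(loss)
```

Changing the logits at a masked position leaves the loss unchanged, which is why a token whose labels are all -100 receives no direct training signal from this loss term.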