Aleksei Petrenko

128 comments by Aleksei Petrenko

Hey @anirjoshi ! RNN policies are first-class citizens in Sample Factory. In fact, with the default configuration you will train an RNN (GRU) policy. See these parameter descriptions in cfg.py...

Hi @anirjoshi, literally any example would work since, again, this is the default configuration. You can start by reading these tutorials: https://www.samplefactory.dev/03-customization/custom-environments/ https://samplefactory.dev/03-customization/custom-models/

Sorry for the radio silence! Yes, the code is here: https://github.com/alex-petrenko/faster-fifo/blob/master/cpp_faster_fifo/tests/comparison_tests.py Depending on the OS/Python version, the results may vary greatly!

Did you check the link I provided? It's here: https://github.com/alex-petrenko/faster-fifo/blob/18c46864817c09277bab8aef74bc1b981197937b/cpp_faster_fifo/tests/comparison_tests.py#L95

Hi Tristan! Great question! Your intuition is pretty much on point! I suppose the most straightforward way to implement the evaluator would be to add an "AlgoObserver". There's an example...

Alternatively, if `getpass` is easily available on all platforms, maybe we can just add it to the list of requirements in setup.py?

Hi @gauravkuppa ! Integrating a Gym environment into Sample Factory is pretty straightforward. Take a look at this documentation page to get started: https://www.samplefactory.dev/03-customization/custom-environments/#custom-environment-template We also provide numerous example environment...

I am facing the same issue with Qwen/Qwen2.5-32B-Instruct, token `151977`, on vllm version 0.7.2. This does not reproduce in the earlier vllm 0.6.2, which seems stable for us but is much slower.

Since a lot of people are commenting about this, here's a simple explanation for why this happens: Qwen and some other models come with a few hundred extra out-of-vocab tokens...
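To make the idea concrete, here is a minimal sketch, assuming the model's output layer has more rows than the tokenizer has entries, so sampling can land on an ID the tokenizer cannot decode. The sizes and helper names (`find_out_of_vocab_ids`, `mask_out_of_vocab`) are made up for illustration; they are not taken from Qwen or vLLM.

```python
# Hypothetical sizes for illustration only; real models use far larger values.
TOKENIZER_VOCAB_SIZE = 10   # IDs 0..9 have a tokenizer entry (assumption)
MODEL_OUTPUT_ROWS = 16      # the model can emit IDs 0..15 (assumption)


def find_out_of_vocab_ids(token_ids, tokenizer_vocab_size):
    """Return the sampled IDs that the tokenizer has no entry for."""
    return [t for t in token_ids if t >= tokenizer_vocab_size]


def mask_out_of_vocab(logits, tokenizer_vocab_size):
    """Set logits of out-of-vocab IDs to -inf so they can never be picked."""
    return [
        x if i < tokenizer_vocab_size else float("-inf")
        for i, x in enumerate(logits)
    ]


def greedy_pick(logits):
    """Return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# A made-up logits vector where an out-of-vocab ID (12) scores highest.
logits = [0.1] * MODEL_OUTPUT_ROWS
logits[4] = 2.0
logits[12] = 5.0

unmasked_choice = greedy_pick(logits)
masked_choice = greedy_pick(mask_out_of_vocab(logits, TOKENIZER_VOCAB_SIZE))
```

Without the mask, the greedy pick lands on an ID outside the tokenizer's range; with it, the pick falls back to the best in-vocab ID. Whether masking, filtering after the fact, or something else is the right fix depends on the serving stack.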