Joe Cummings

278 comments by Joe Cummings

Thanks for the report! Based on your config and the setup you have, I don't see immediately why this would hit your specified memory limit of 768G. Let me get...

Hey @fabiogeraci, just updating you on this. I'm waiting on a request for a multi-node server (PyTorch has limited quantity). If I don't hear back today, I'll just rent one out...

> [@joecummings](https://github.com/joecummings) any progress?

Thanks for following up on this! I managed to set up a SLURM cluster to test and have a PR up for review with a mini...

@fabiogeraci The multi node PR has been merged and should be available in nightlies :)

You're right! I'll put up a PR right away to fix this.

Ah I see. In that case, @RomDeffayet can I ask if you are running into a bug somewhere WRT this behavior or just noticed that it was a little strange?...

Alright, so the quick fix would be to rename the parameter `custom_generate_next_token` to `compiled_decode_next_token`. However, this intuition was built off the design of the gpt-fast code from over a year...

Can we just expose the compile-mode and pass that into the compile method?
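A minimal sketch of that idea, with a stand-in `compile_fn` in place of `torch.compile` (the names `setup_generation`, `decode_next_token`, and `compile_mode` here are hypothetical illustrations, not torchtune's actual API):

```python
from typing import Callable, Optional

def compile_fn(fn: Callable, mode: Optional[str] = None) -> Callable:
    # Stand-in for torch.compile(fn, mode=mode); here it just records
    # which mode was requested so the pass-through is visible.
    def wrapped(*args, **kwargs):
        return fn(*args, **kwargs)
    wrapped.compile_mode = mode
    return wrapped

def decode_next_token(logits: list) -> int:
    # Toy decode step: greedy argmax over a list of logits.
    return max(range(len(logits)), key=lambda i: logits[i])

def setup_generation(compile_mode: Optional[str] = None) -> Callable:
    # Expose compile_mode to the caller and pass it straight through
    # to the compile call, instead of hard-coding it internally.
    return compile_fn(decode_next_token, mode=compile_mode)

decode = setup_generation(compile_mode="reduce-overhead")
print(decode.compile_mode)      # reduce-overhead
print(decode([0.1, 2.0, 0.5]))  # 1
```

The caller then controls the compilation behavior without the generation utility needing a specially named `compiled_*` parameter at all.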

> > Not just for the custom template for Llama Guard, would the extra `.strip()` cause issues with some predefined templates?

I have been skeptical of this for a...
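To make the concern above concrete, here is a small illustration (using a made-up template string, not an actual predefined template) of how an extra `.strip()` can silently change a rendered prompt that relies on trailing whitespace:

```python
# A toy template whose final newline is part of the expected prompt format.
template = "[INST] {user} [/INST]\n"

rendered = template.format(user="Is this safe?")
stripped = rendered.strip()

print(repr(rendered))  # '[INST] Is this safe? [/INST]\n'
print(repr(stripped))  # '[INST] Is this safe? [/INST]'

# The trailing newline is gone, so the tokenized prompt differs from
# what the template author intended.
assert rendered != stripped
```

Whether that difference matters depends on the template: for templates that deliberately end with whitespace or a newline, stripping changes the token sequence the model sees.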