Joe Cummings

278 comments by Joe Cummings

Thanks for the report! Based on your config and the setup you have, I don't see immediately why this would hit your specified memory limit of 768G. Let me get...

Hey @fabiogeraci, just updating you on this. I'm waiting on a request for a multi-node server (PyTorch has limited quantity). If I don't hear back today, I'll just rent one out...

> [@joecummings](https://github.com/joecummings) any progress?

Thanks for following up on this! I managed to set up a SLURM cluster to test and have a PR up for review with a mini...

@fabiogeraci The multi node PR has been merged and should be available in nightlies :)

You're right! I'll put up a PR right away to fix this.

Ah I see. In that case, @RomDeffayet can I ask if you are running into a bug somewhere WRT this behavior or just noticed that it was a little strange?...

Alright, so the quick fix would be to rename the parameter `custom_generate_next_token` to `compiled_decode_next_token`. However, this intuition was built off the design of the gpt-fast code from over a year...

Can we just expose the compile-mode and pass that into the compile method?
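A minimal sketch of that idea, with a stand-in `compile_fn` in place of `torch.compile` (the names `setup_generation`, `decode_next_token`, and `compile_mode` here are hypothetical illustrations, not torchtune's actual API):

```python
from typing import Callable, Optional

def compile_fn(fn: Callable, mode: Optional[str] = None) -> Callable:
    # Stand-in for torch.compile(fn, mode=mode); here it just records
    # which mode was requested so the pass-through is visible.
    def wrapped(*args, **kwargs):
        return fn(*args, **kwargs)
    wrapped.compile_mode = mode
    return wrapped

def decode_next_token(logits: list) -> int:
    # Toy decode step: greedy argmax over a list of logits.
    return max(range(len(logits)), key=lambda i: logits[i])

def setup_generation(compile_mode: Optional[str] = None) -> Callable:
    # Expose compile_mode to the caller and pass it straight through
    # to the compile call, instead of hard-coding it internally.
    return compile_fn(decode_next_token, mode=compile_mode)

decode = setup_generation(compile_mode="reduce-overhead")
print(decode.compile_mode)      # reduce-overhead
print(decode([0.1, 2.0, 0.5]))  # 1
```

The caller then controls the compilation behavior without the generation utility needing a specially named `compiled_*` parameter at all.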

> > Not just for the custom template for Llama Guard, would the extra `.strip()` cause issues with some predefined templates?

I have been skeptical of this for a...
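To make the concern above concrete, here is a small illustration (using a made-up template string, not an actual predefined template) of how an extra `.strip()` can silently change a rendered prompt that relies on trailing whitespace:

```python
# A toy template whose final newline is part of the expected prompt format.
template = "[INST] {user} [/INST]\n"

rendered = template.format(user="Is this safe?")
stripped = rendered.strip()

print(repr(rendered))  # '[INST] Is this safe? [/INST]\n'
print(repr(stripped))  # '[INST] Is this safe? [/INST]'

# The trailing newline is gone, so the tokenized prompt differs from
# what the template author intended.
assert rendered != stripped
```

Whether that difference matters depends on the template: for templates that deliberately end with whitespace or a newline, stripping changes the token sequence the model sees.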