Davis Wertheimer

9 comments by Davis Wertheimer

The plan is to move the `include_embeds=True` versions of Llama/GPTBigCode/generate() into `fms-extras`. Once that is done, I'll update the relevant imports here and then we can push this in.

I've pulled all the `include_embeds` stuff out of `fms` into here. We now have `EmbedLLaMA` and `EmbedGPTBigCode` subclasses that override the corresponding forward function, and an altered version of `generate`...
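A toy sketch of the subclass pattern described in that comment, for readers unfamiliar with it. It does not use the real `fms` classes: `ToyModel`, `EmbedToyModel`, `embed`, and `head` are hypothetical stand-ins, and the actual signature of the overridden forward function is an assumption.

```python
class ToyModel:
    """Hypothetical stand-in for a base model; not the real fms LLaMA class."""

    def __init__(self, scale=2.0):
        self.scale = scale

    def embed(self, tokens):
        # Pretend "embedding": scale each token id.
        return [t * self.scale for t in tokens]

    def head(self, embeds):
        # Pretend "output head": shift each embedding value.
        return [e + 1.0 for e in embeds]

    def forward(self, tokens):
        # The base class returns only the final outputs.
        return self.head(self.embed(tokens))


class EmbedToyModel(ToyModel):
    """Subclass that overrides forward to optionally return embeddings too."""

    def forward(self, tokens, include_embeds=True):
        embeds = self.embed(tokens)
        logits = self.head(embeds)
        # Assumed behaviour: return both outputs when include_embeds is set.
        return (logits, embeds) if include_embeds else logits


logits, embeds = EmbedToyModel().forward([1, 2])
```

Keeping the override in a subclass leaves the base model's forward untouched, so code that does not need the embeddings is unaffected.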

Code is ready for review - mypy errors are import errors, ~~it doesn't have `fms-extras` and~~ it doesn't like the local import of `train_speculator_utils`. Should I move the speculator subfolder...

Thanks @sahilsuneja1 for putting this together! I don't think we need the caller script - it's ultimately just a simple python call with a bunch of arguments, right? I'd just...

Further CI complaints. I'll try to figure out how to get the checks running automatically for you, so you don't have to wait for my explicit go-ahead.

Looks like a bunch of that is because this relies on the paged attention branch, which hasn't fully landed in `fms-extras/main` yet.

Actually it looks like we don't have any dependency on `fms-extras` listed in `requirements` - I'm now hitting the same issue with #35. If we can add that then this...

### Stateless Implementation

Although the LCG provides the desired random permutation, this approach introduces extra state to be tracked (our position in the recursively-generated permutation sequence, and/or our position in...
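For context, here is a minimal sketch of the kind of stateful LCG permutation the paragraph above refers to. It is an illustration under stated assumptions, not the implementation discussed in the thread: the function name `lcg_permutation`, the constants `a=5` and `c=3`, and the power-of-two modulus with rejection of out-of-range values are all choices made for this example.

```python
def lcg_permutation(n, a=5, c=3, seed=0):
    """Yield every integer in range(n) exactly once, in LCG order.

    Assumed parameters: a % 4 == 1 and c odd, which gives a full-period
    LCG for any power-of-two modulus (Hull-Dobell theorem).
    """
    if n <= 0:
        return
    # Smallest power of two that covers range(n).
    m = 1
    while m < n:
        m *= 2
    # x is the extra state: our position in the recursive sequence.
    x = seed % m
    for _ in range(m):
        if x < n:
            # Values in [n, m) are skipped so the output stays in range(n).
            yield x
        x = (a * x + c) % m


order = list(lcg_permutation(10, seed=7))
```

Resuming this generator mid-sequence requires saving the current `x` (and how many values have been emitted), which is the bookkeeping cost the paragraph points to.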

Added the requested status reports. I figure we'll clean up the checkpointer utility once we have this tested and working to our satisfaction.