
Results 13 comments of smeyerhot

Ok, fair enough. Is the AMI for the Buildkite agent?

What is the status on this? Can we merge this…

@jordiclive would this PR make it possible to load a PEFT model for inference in the chat?

What is trl/trlx? I am very interested in this use case. Why must the 10B-parameter model be used for RLHF?

I am actively working on this task and would be very interested in further development coordination.

> ask people to donate compute power in a decentralized manner, and reward would just be prioritized in queue for chat

https://github.com/bigscience-workshop/petals

This is a revised PR, same fix but cleared up some issues with the Dockerfile.

Does this feature request also apply to training? I was looking at the description and hoping to get some clarification on the idea of the next_server waiting for the data...

I am seeing the same problem on every `flash_attn` version. I am using CUDA 12.1 on the new G2 VM instance from GCP: https://cloud.google.com/compute/docs/accelerator-optimized-machines#g2-vms. The underlying GPU is the Nvidia...

> Workaround: install the previous version `pip install flash_attn==1.0.5`

This might work in some scenarios but not all.