Woosuk Kwon
Hi @ImranL, thanks for trying it out. Currently this PR is not ready and may have some bugs. We are working on this (and other MQA models). That being said,...
Closed as #592 implements Falcon in a nicer way.
@AndreSlavescu Awesome! Thanks for your contribution. Is this PR ready for review? Otherwise, please ping me when you are ready. Thanks again!
@AndreSlavescu What's the status of this PR? If you are not able to continue it, no worries, I can take it over. Please let us know if you have any questions.
@ri938 This PR is not ready yet. I'll take this over and finish the PR soon.
~~The PR is currently blocked because GPT-J's rotary embedding requires a new kernel (IIUC, it's different from GPT-NeoX's rotary embedding). I will address it this weekend.~~ Turns out that this...
@zhuohan123 This PR is ready for review. Please take a look at it.
@silvacarl2 @ri938 We've just merged this PR. Please [install vLLM from source](https://vllm.readthedocs.io/en/latest/getting_started/installation.html#build-from-source) and try it out!
@silvacarl2 Could you check again if you installed the latest vLLM [from source](https://vllm.readthedocs.io/en/latest/getting_started/installation.html#build-from-source)? BTW, GPTNeo is not supported yet.
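One quick way to confirm which build is actually being picked up is to query the installed package metadata. This is only a minimal sketch using the Python standard library: the helper name `installed_version` is hypothetical, and it assumes the distribution is registered under the name `vllm` in the active environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import the package itself.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution name is "vllm".
    found = installed_version("vllm")
    if found is None:
        print("vllm is not installed in this environment")
    else:
        print(f"vllm version: {found}")
```

If the reported version predates the merge, the environment is probably still using an older wheel rather than the source build.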
Closed as this PR is too old.