Nick Hill
@raywanb could you resolve the conflicts one more time :pray:
Thanks @raywanb! Could you please also run the linter on the final changes to fix those errors? (`./format.sh`)
@raywanb the lora CI test failures also look suspicious; these aren't currently failing on the main branch. cc @Yard1
@raywanb thanks for the updates! It looks like the lora tests are still [failing](https://buildkite.com/vllm/ci/builds/7742#018f99f5-0763-4d42-8e3f-a5a69842ea23) though. There was an issue in the main branch but I don't think that's the cause....
@ScrapCodes apologies for missing your message here originally. Yes I think that would be good, though I'm actually not sure whether KServe owns the `kserve.io` domain. We shouldn't change the...
> My previous comment is about truncation side, as for various reasons/formats we'd either want to trim from the left or right as well and since it already a parameter...
Thanks @diego898. TBH I don't think it would make sense to use this truncate option at all in conjunction with a system prompt. Some other form of truncation would need...
Thanks @OlivierDehaene, I've now rebased it.
@OlivierDehaene @Narsil continuing discussion from #246, I've pushed a new [commit](https://github.com/huggingface/text-generation-inference/pull/210/commits/08aee68f79a8206257b11c3fda50779db8dc2597) here to abstract the batch "weight" calculations to cover the non-flash attention case too. We use this for example...
Thanks @OlivierDehaene, and sorry for the PR being quite large. For context - I have been making many changes/additions on an internal fork for some time now with the intention...