Nick Hill
@youkaichao another reason the above approach might be better - IIUC the `get_example_metadata_list` approach won't work if the size varies much at runtime (not sure whether that might be the...
@youkaichao I've opened #4844 to show the idea, PTAL!
@aurickq curious how this relates to https://github.com/vllm-project/vllm/pull/3729?
@MLHafizur this might indicate that the MM and/or adapter containers restarted, could you check whether that's the case?
@lizzzcai though it is now the default, cluster scope operation should be considered relatively alpha and still needs a bit more work. In particular w.r.t. how the secrets are handled...
> I see, the controller keeps watching the etcd by [design](https://github.com/kserve/modelmesh-serving/tree/main/docs/architecture#architecture-overview).

Yes, though this is read-only; it's just used to trigger a predictor reconciliation when things change. Otherwise, the etcd data...
Also encountered this when upgrading from 0.7.6 to 0.7.7, with BLOOM 176B.
@RezaYazdaniAminabadi I can confirm that version 0.8.0 fixed the issue for me.
@RezaYazdaniAminabadi apologies, I spoke too soon... it's now working for BLOOM 176B with the pre-sharded fp16 weights, but not with the original `.bin` checkpoint shards (which do work with 0.7.6). We...
I'd also been thinking about this recently. I think it would be nice to have some kind of `skip_detokenization` or `include_text=False` option in the sampling params.
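Roughly what I have in mind, as a minimal self-contained sketch. `SketchSamplingParams`, `skip_detokenization`, `CompletionResult` and `build_output` are all hypothetical names for illustration, not existing vLLM APIs. The idea is just that when the flag is set, the detokenizer is never invoked and only the token ids are returned:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SketchSamplingParams:
    # Hypothetical option: when True, skip converting token ids back to text.
    skip_detokenization: bool = False


@dataclass
class CompletionResult:
    token_ids: List[int]
    text: Optional[str]  # None when detokenization was skipped


def build_output(
    token_ids: List[int],
    params: SketchSamplingParams,
    detokenize: Callable[[List[int]], str],
) -> CompletionResult:
    # Only pay the detokenization cost if the caller actually wants text.
    text = None if params.skip_detokenization else detokenize(token_ids)
    return CompletionResult(token_ids=list(token_ids), text=text)


# Toy detokenizer standing in for a real tokenizer (hypothetical vocab).
calls = []
VOCAB = {1: "Hello", 2: " world"}


def toy_detokenize(ids: List[int]) -> str:
    calls.append(ids)  # record invocations so we can check it was skipped
    return "".join(VOCAB[i] for i in ids)


with_text = build_output([1, 2], SketchSamplingParams(), toy_detokenize)
ids_only = build_output([1, 2], SketchSamplingParams(skip_detokenization=True), toy_detokenize)
```

An `include_text=False` variant would be the same thing with the boolean inverted; the main point is that callers who only consume token ids (e.g. for their own downstream decoding) don't pay for incremental detokenization.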