Vadim Kantorov
> but this functionality can be expanded in future to allow other types of OCI artifacts (which may be file formats that are not the regular "tar"-based image layers). @thaJeztah...
It would also be important to be able to force text-only (non-multimodal) mode when running in CLI mode, to explicitly work around problems like: - https://github.com/gradio-app/gradio/issues/11331
Yeah, I found that in the WP ecosystem there are many markdown plugins taking various approaches to rendering and editing (either rendering markdown to HTML at post-creation time or keeping the original...
I think, given how widespread the `<think>` / `</think>` micro-format is, it would be very nice to extend the default `load_chat` to provide this option (collapsing the contents of think blocks) directly...
I.e. I propose extending the OpenAI-compatible client in `external.py`'s `load_chat` to optionally use this technique and perform this thinking-message extraction as in https://github.com/gradio-app/gradio/blob/477730ef51697a355a09020b235f6cc4a6fbb9dc/demo/chatbot_thoughts/run.py#L30-L39
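As a rough sketch of the extraction step being proposed (the `split_think` helper and its tag handling are my illustration, not existing Gradio API):

```python
import re

# Hypothetical helper: split a model response into its <think>...</think>
# reasoning (to be shown collapsed, e.g. in a gr.ChatMessage with a
# metadata title) and the visible answer text.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_think(text):
    """Return (thought, answer); thought is None if no <think> block."""
    match = THINK_RE.search(text)
    if not match:
        return None, text
    thought = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thought, answer
```

In `load_chat`, each assistant message could then be emitted as two entries: one `gr.ChatMessage` with `metadata={"title": "Thinking"}` holding the thought, followed by the answer text, which is the pattern the linked `chatbot_thoughts` demo uses.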
Does this new in-place weight loader support online bf16->fp8 quantization? This is needed for the GRPO flow, where we need to frequently reload new weights and do online conversion of...
> Do you have an example? I assume you still have to load the original weights before performing quantization? I guess it depends on the quantization method. If it's possible...
> Why would the first way load all the weights? I think this refers to the fact that all the parameters need to be loaded into the `vllm.LLM` model first
Some additional considerations: - it might be good to have an option to also include attachments/uploads as blobs in the exported SQLite DB (for it to be a complete backup of...
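A minimal sketch of what storing attachments as blobs in the exported SQLite DB could look like (the table name and schema here are hypothetical, just to show the shape of the idea):

```python
import sqlite3

def export_attachment(db_path, name, file_path):
    """Store one attachment file as a BLOB row (hypothetical schema)."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS attachments (name TEXT PRIMARY KEY, data BLOB)"
    )
    with open(file_path, "rb") as f:
        con.execute(
            "INSERT OR REPLACE INTO attachments (name, data) VALUES (?, ?)",
            (name, sqlite3.Binary(f.read())),
        )
    con.commit()
    con.close()
```

Keeping the raw bytes in the same DB file as the metadata is what would make the export self-contained as a backup.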
With torch `2.6.0+cu126` (on a CUDA 12.8 machine), same problem...