Results 633 comments of Vadim Kantorov

> but this functionality can be expanded in future to allow other types of OCI artifacts (which may be file formats that are not the regular "tar"-based image layers). @thaJeztah...

It would also be important to be able to force a text-only (non-multimodal) mode in the CLI to explicitly work around problems like: - https://github.com/gradio-app/gradio/issues/11331

Yeah, I found that in the WP ecosystem there are very many markdown plugins taking various approaches to rendering and editing (either rendering markdown to HTML at post-creation time or keeping the original...

I think, given how widespread the `<think>`/`</think>` micro-format is, it would be very nice to extend the default `load_chat` to provide this option (collapsing the contents of think blocks) directly...

I.e. I propose to extend the OpenAI-compatible client in `external.py`/`load_chat` to optionally also use this technique and do this thinking-message extraction as in https://github.com/gradio-app/gradio/blob/477730ef51697a355a09020b235f6cc4a6fbb9dc/demo/chatbot_thoughts/run.py#L30-L39
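The extraction technique in the linked demo can be sketched roughly as follows (the regex-based splitting and the `split_thoughts` helper here are illustrative assumptions, not Gradio's actual API):

```python
import re

# Illustrative sketch: split a raw assistant message into "thinking" and
# visible-answer parts by pulling out <think>...</think> blocks, as many
# reasoning models emit them.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thoughts(content: str) -> tuple[str, str]:
    """Return (thoughts, visible_answer) for a raw assistant message."""
    thoughts = "\n".join(m.strip() for m in THINK_RE.findall(content))
    answer = THINK_RE.sub("", content).strip()
    return thoughts, answer
```

A `load_chat`-style client could then render `thoughts` as a collapsed/metadata message and `answer` as the normal chat bubble, e.g. `split_thoughts("<think>plan steps</think>The answer is 42.")` yields the thinking text separately from the answer.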

Does this new in-place weight loader support online bf16->fp8 quantization? This is needed for the GRPO flow, where we frequently need to re-load new weights and do online conversion of...
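For concreteness, the online conversion being asked about can be sketched as per-tensor FP8 (e4m3) scaling; plain Python floats stand in here for the bf16 tensor, and real code would cast to a float8 dtype rather than just clamping (the function name and shape of the API are made up for illustration):

```python
# Hedged sketch of online per-tensor FP8 (e4m3) quantization: pick a scale so
# the tensor's max magnitude maps to the e4m3 max representable value (448),
# then divide and clamp. Dequantization is value * scale.
FP8_E4M3_MAX = 448.0

def quantize_per_tensor(weights: list[float]) -> tuple[list[float], float]:
    """Return (quantized_values, scale); dequantize with value * scale."""
    amax = max(abs(w) for w in weights) or 1.0  # avoid divide-by-zero
    scale = amax / FP8_E4M3_MAX
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, w / scale)) for w in weights]
    return q, scale
```

In a GRPO loop this would run each time fresh bf16 weights arrive, so the question is whether the loader can fuse this conversion into the in-place load instead of requiring a full bf16 copy first.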

> Do you have an example? I assume you still have to load the original weights before performing quantization? I guess it depends on the quantization method. If it's possible...

> Why would the first way load all the weights? I think this refers to the fact that all the parameters first need to be loaded into the `vllm.LLM` model

Some additional considerations: - it might be good to have an option to also include attachments/uploads as blobs in the exported sqlite db (so that it is a complete backup of...
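Storing attachments as blobs is straightforward with the stdlib; a minimal sketch, assuming a hypothetical `attachments` table (the table and column names are made up, not the tool's actual schema):

```python
import sqlite3
from pathlib import Path

# Hedged sketch: store attachment files as BLOBs alongside the exported rows,
# so the sqlite file is a self-contained backup.
def export_attachment(db_path: str, name: str, file_path: str) -> None:
    with sqlite3.connect(db_path) as conn:  # commits on successful exit
        conn.execute(
            "CREATE TABLE IF NOT EXISTS attachments (name TEXT PRIMARY KEY, data BLOB)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO attachments (name, data) VALUES (?, ?)",
            (name, Path(file_path).read_bytes()),
        )
```

Restoring is the inverse: `SELECT data FROM attachments WHERE name = ?` and write the bytes back to disk.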