Philipp Moritz

Results 85 comments of Philipp Moritz

I'm +1 to supporting activation scales in the FP16 checkpoint and not in JSON. This way fewer configurations need to be supported and everything is uniform :)

> I think I've tried several ways but didn't luck at this moment (maybe I also miss something). I suppose the main reason is kv quantization happens during kv-cache write...

Ah I see, in that case, `kAuto` is a good name since it is the same as "auto" in python. I didn't realize it required a special code path :)

Hey jingpengwu, thanks for your message! I'm mostly using Python these days, so I probably won't implement it myself at the moment, but if you are interested in helping, I'm...

I'm glad to hear the project could be useful for your work! First you will need to update the version of Caffe that is included in Strada. It needs to...

Can the files `hip_float8.h` and `hip_float8_impl.h` be part of some AMD SDK going forward? They shouldn't be part of vLLM :)

Thanks for doing this ❤️ Just a small nit: At the moment we have an unholy mix of sometimes `1` being true and sometimes `"true"` being true for environment variables...
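To illustrate the nit above, here is a minimal sketch of one way to parse boolean environment variables uniformly. The helper name `env_flag`, the variable name `MY_FEATURE_ENABLED`, and the accepted spellings are assumptions made up for this example, not an existing vLLM API:

```python
import os

# Accepted spellings are an assumption for this sketch; pick one set and
# use it everywhere so `1` and `"true"` mean the same thing.
_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off"}


def env_flag(name, default=False, environ=None):
    """Read a boolean flag from the environment in one consistent way.

    `environ` is injectable so the helper can be exercised without
    touching the real process environment.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        # Unset or empty: fall back to the caller's default.
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    # Fail loudly on anything else instead of silently guessing.
    raise ValueError(f"Invalid boolean value for {name}: {raw!r}")


if __name__ == "__main__":
    # MY_FEATURE_ENABLED is a hypothetical variable name.
    print(env_flag("MY_FEATURE_ENABLED", environ={"MY_FEATURE_ENABLED": "TRUE"}))
```

Routing every flag through a single helper like this means the accepted spellings are defined in exactly one place, and a typo such as `ture` raises an error instead of being silently read as false.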

Richard pointed out to me that there is `python setup.py develop` which does what we want for development (i.e. you don't need to re-run `python setup.py` if you edit the...

It is a bummer that GitHub doesn't render the diff between the old and new NVIDIA quant_utils.cuh -- for ease of reviewing, here is the diff: ```diff (base) pcmoritz@pcmoritz-DQ44HV60WX /tmp...

Did you investigate the performance impact of passing `__nv_fp8_interpretation_t` around at runtime? Have you considered making the format a template parameter of the `vec_conversion` and related functions (e.g. by reusing...