Awni Hannun
> The tests seem fast enough for the normal CI as well, don't they?

I think you're right. I'll add them here instead.
> I could just cast the fp8 tensors to f32, but am I right in thinking that if I do that followed by an 8-bit quant I would lose a...
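For illustration, a minimal sketch of the cast-then-requantize path the quote describes. MLX has no fp8 dtype yet, so a float32 array stands in for the upcast fp8 tensor; `mx.quantize`/`mx.dequantize` then show the round-trip error an 8-bit affine quant introduces (the fp8 grid generally won't land exactly on the per-group affine grid, so some rounding error is expected):

```python
import mlx.core as mx

# Stand-in for an fp8 tensor already cast up to float32
# (MLX has no fp8 dtype, so this is only a simulation).
w = mx.random.normal(shape=(256, 256))

# Affine 8-bit quantization with per-group scales/biases,
# then dequantize to inspect the round-trip error.
w_q, scales, biases = mx.quantize(w, group_size=64, bits=8)
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=8)

print(mx.abs(w - w_hat).max())  # small but nonzero: the quant is lossy
```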
> Do you think q8 activations could be interesting for performance - maybe even integration into the sdpa functions?

With KV cache quantization, absolutely! We already have that in mlx-lm...
> I was wondering what you thought about keeping the activations quantized, though. For instance, a quantized rms norm, gelu, and so on. Not sure if these ops using quantized...
> Isn't this useful for the CUDA backend?

Yes, very much so. We will likely add fp8... just waiting for the right time. For Apple silicon it's still not that...
Typically it should be possible to treat the logprobs as if they were logits, unless you are doing something that relies on the normalization term (which is not so common)...
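To make that concrete, a small sketch (assuming the MLX Python API) of why this works: logprobs differ from logits only by a per-row additive constant, so anything invariant to such a shift, like softmax or argmax, gives identical results for both:

```python
import mlx.core as mx

logits = mx.random.normal((4, 32))
# Logprobs = logits minus the log normalization term,
# a constant along the vocabulary axis.
logprobs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)

# Shift-invariant operations can't tell the two apart.
assert mx.allclose(mx.softmax(logits, axis=-1), mx.softmax(logprobs, axis=-1))
assert mx.array_equal(mx.argmax(logits, axis=-1), mx.argmax(logprobs, axis=-1))
```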
I'm not certain about including this as `WeightNorm`, as I thought `WeightNorm` is not used so much anymore... thoughts? Either way, we should not make free functions in C++ and...
> regarding layer of course you're right, if you agree regarding usefulness I will happily refactor it fully into mlx.nn as it should have been from the start

That would...
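For reference, a minimal sketch of what a weight-norm layer in mlx.nn might look like, reparameterizing the weight as w = g * v / ||v|| per output row. The class name and initialization are hypothetical, not an existing MLX API:

```python
import mlx.core as mx
import mlx.nn as nn

class WeightNormLinear(nn.Module):
    """Hypothetical linear layer with weight norm: w = g * v / ||v||."""

    def __init__(self, in_dims: int, out_dims: int):
        super().__init__()
        scale = 1.0 / in_dims**0.5
        self.v = mx.random.uniform(-scale, scale, (out_dims, in_dims))
        # Initialize g = ||v|| so that the initial w equals v.
        self.g = mx.linalg.norm(self.v, axis=-1, keepdims=True)
        self.bias = mx.zeros((out_dims,))

    def __call__(self, x):
        # Recompute the normalized weight on every call; g and v are
        # the trainable parameters.
        w = self.g * self.v / mx.linalg.norm(self.v, axis=-1, keepdims=True)
        return x @ w.T + self.bias
```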
@cavit99 are you planning on coming back to this PR?
I'm going to close this as inactive. We're open to revisiting the addition of weight norm in the future.