BFloat16s.jl Support new Float8 and NormedFloat4 formats?

I have been poking around at how to read GGML formats into Julia and went down the rabbit hole of looking at the implementation of QLoRA, which is behind a lot of the current fine-tuning work on consumer devices (laptops instead of heavy GPUs). QLoRA makes heavy use of 4-bit quantized "floating point" numbers (NormedQFloat4 in Julia-like nomenclature). It also turns out there is recent development on 8-bit floating point numbers on NVIDIA hardware.

I have some toy code implementing these that I can upstream if there is interest.

Jul 10 '23 15:07 jiahao
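
For readers unfamiliar with 4-bit quantized formats, here is a minimal Julia sketch of the general idea: each block of values is scaled by its absolute maximum and every value is replaced by the index of the nearest entry in a 16-level codebook. This is not the toy code mentioned above and not the QLoRA implementation. The names CODEBOOK, quantize4 and dequantize4 are hypothetical, and the evenly spaced codebook is a placeholder (the actual format uses different, non-uniform levels).

```julia
# Placeholder codebook (assumption): 16 evenly spaced levels in [-1, 1].
# The real 4-bit format uses non-uniform levels; these are for illustration only.
const CODEBOOK = collect(range(-1.0f0, 1.0f0; length=16))

# Quantize one block: scale by the absolute maximum, then store for each
# value the index (0..15) of the nearest codebook entry.
function quantize4(block::AbstractVector{Float32})
    scale = maximum(abs, block)
    if scale == 0
        # All-zero block: any code works once multiplied by a zero scale.
        return (zeros(UInt8, length(block)), scale)
    end
    codes = [UInt8(argmin(abs.(CODEBOOK .- x / scale)) - 1) for x in block]
    return (codes, scale)
end

# Dequantize: look up each code and undo the scaling.
dequantize4(codes, scale) = [CODEBOOK[c + 1] * scale for c in codes]

x = Float32[0.5, -1.0, 0.25, 2.0]
codes, scale = quantize4(x)
xhat = dequantize4(codes, scale)
```

Each code fits in 4 bits, so two codes could be packed per byte. The sketch keeps one code per UInt8 for readability.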

Also check out Float8s.jl. I haven't checked what the NVIDIA standard for Float8 is, but that seems to be a package that should cover it ;)

Jul 12 '23 13:07 milankl

@maleadt How hard is it to get these in? Of course the package name would have to change. Also, I wonder if all these types should really just make their way into Base.

Oct 31 '23 20:10 ViralBShah

Btw, I'm also happy to move Float8s.jl over here and give people access in case there is interest to manage it closer to BFloat16s.jl due to their similar scope

Oct 31 '23 21:10 milankl

Should we pick a different package name? I think it would be good to combine all of these. Would be good to see what @maleadt thinks about that.

Oct 31 '23 21:10 ViralBShah

I see that unifying them in a common package is appealing, but I believe people still expect using BFloat16s, Float8s to just work. Or do you want to change that to using LowPrecisionFloats: BFloat16, Float8 ?

Oct 31 '23 21:10 milankl

Those packages are registered, so they have to stay in the registry. Going forward, though, they can include warnings about being discontinued and point users to the new package. Thus LowPrecisionFloats should be a new package if we do go forward with unification.

I've invited you as owner of this org in case you want to move Float8s.jl to also live in here, and to create the new package in this org as well.

Nov 01 '23 01:11 ViralBShah

I think it would be better to start a new package with a focus on sharing functionality between the different floating point types (e.g. an IEEEFloat{width, exponent} type, or something like it) instead of copying code from other packages here. Maybe at some point that could move into Base (as it could be used for the existing floating-point types), but right now I don't think these types are widely used enough to warrant inclusion.

Nov 01 '23 07:11 maleadt
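
To make the idea of a shared parametric type more concrete, here is a rough Julia sketch of a float type parameterized by total width and exponent width, with a single decoding routine that serves every instantiation. It is not code from any of the packages discussed. The name MiniFloat and its helper functions are hypothetical (Base already uses the name IEEEFloat for a union of its built-in types), and the sketch assumes IEEE 754 conventions for subnormals, Inf and NaN, so formats that treat the all-ones exponent differently would need their own method.

```julia
# Hypothetical type: W total bits, E exponent bits, stored in the low bits of a UInt16.
struct MiniFloat{W,E}
    bits::UInt16
end

mantissa_bits(::Type{MiniFloat{W,E}}) where {W,E} = W - E - 1
bias(::Type{MiniFloat{W,E}}) where {W,E} = 2^(E - 1) - 1

# One generic decoder for every (W, E) pair, assuming IEEE 754 conventions.
function Base.Float64(x::MiniFloat{W,E}) where {W,E}
    T = typeof(x)
    m = mantissa_bits(T)
    expmask = (UInt16(1) << E) - UInt16(1)
    fracmask = (UInt16(1) << m) - UInt16(1)
    s = (x.bits >> (W - 1)) & 0x1
    e = Int((x.bits >> m) & expmask)
    f = Int(x.bits & fracmask)
    sgn = s == 0 ? 1.0 : -1.0
    if e == expmask
        # All-ones exponent: infinity if the fraction is zero, NaN otherwise.
        return f == 0 ? sgn * Inf : NaN
    elseif e == 0
        # Zero or subnormal: no implicit leading one.
        return sgn * ldexp(Float64(f), 1 - bias(T) - m)
    else
        # Normal number: add the implicit leading one to the fraction.
        return sgn * ldexp(Float64(f + (1 << m)), e - bias(T) - m)
    end
end

# The same definition covers several layouts (alias names are illustrative).
const Half = MiniFloat{16,5}      # same bit layout as Float16
const Brain16 = MiniFloat{16,8}   # same bit layout as bfloat16
const Quarter = MiniFloat{8,5}    # an 8-bit layout with 5 exponent bits

one_half = Float64(Half(0x3c00))
one_brain = Float64(Brain16(0x3f80))
one_quarter = Float64(Quarter(0x3c))
```

With this layout the 16-bit, 5-exponent-bit instantiation can be checked against the built-in Float16 via reinterpret(Float16, bits), which is one way to test such a generic decoder.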