axon
float16 support for synaptic variables on the GPU (and CPU)
It would probably be useful to use float16 for the synaptic variables. There is a nice existing Go package: https://github.com/x448/float16
Vulkan 1.2 now includes it as a fully supported option: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_shader_float16_int8.html
In HLSL the type is float16_t, fully supported as of shader model 6.2, and MoltenVK appears to support it as of 2018: https://github.com/KhronosGroup/MoltenVK/issues/368
Some older GPU hardware does not have native 16-bit support, so it will run much slower there, but the advantages on current hardware likely outweigh that.
Actual Lvis model tests show that the A100 does not allow addressing more than 31 bits of memory (about 2 GB), even when the buffer is broken up using a SynMemBlock:
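// Synapse values are stored in blocks of 64 floats.
struct SynMemBlock {
	float vals[64];
};

// SynV returns variable svar for synapse syni: the flat 64-bit index
// selects a block (ix / 64) and a slot within that block (ix % 64).
float SynV(in Context ctx, uint syni, SynapseVars svar) {
	uint64 ix = ctx.SynapseVars.Idx(syni, svar);
	return Synapses[uint(ix / 64)].vals[uint(ix % 64)];
}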
For the bench_lvis net, ndata=2 stays under the limit, but ndata=3 pushes SynCa over it, to 2.5 GB:
BenchLvisNet: Neurons: 47204 NeurMem: 30.6 MB Syns: 32448512 SynIdxs: 371.3 MB SynWts: 618.9 MB SynCa: 1.7 GB
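The sizes above can be checked with a little arithmetic. The sketch below assumes 7 SynCa variables per synapse per data item and 4 bytes per float32; the count of 7 is an inference from the reported 1.7 GB, not something stated here, and `SynCaBytes` is a hypothetical helper. It also assumes the limit is 2^31 bytes per buffer.

```go
package main

import "fmt"

const (
	NSyns       = 32448512 // synapse count from the BenchLvisNet report
	SynCaVars   = 7        // ASSUMPTION: inferred from the reported SynCa size
	MaxBufBytes = 1 << 31  // ASSUMPTION: 31 bits of byte addressing per buffer
)

// SynCaBytes returns the total SynCa storage in bytes (hypothetical helper).
func SynCaBytes(nSyns, nVars, nData, bytesPerVal int64) int64 {
	return nSyns * nVars * nData * bytesPerVal
}

func main() {
	for nData := int64(2); nData <= 3; nData++ {
		for _, bpv := range []int64{4, 2} { // 4 = float32, 2 = float16
			b := SynCaBytes(NSyns, SynCaVars, nData, bpv)
			fmt.Printf("ndata=%d bytes/val=%d SynCa=%.2f GiB fits=%v\n",
				nData, bpv, float64(b)/(1<<30), b <= MaxBufBytes)
		}
	}
}
```

Under these assumptions float32 gives about 1.69 GiB for ndata=2 and 2.54 GiB for ndata=3, matching the reported 1.7 GB and 2.5 GB, while float16 would put ndata=3 at about 1.27 GiB, back under the limit.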
So the next strategy is to use pages of memory instead of blocks for SynCa, which is the main point of failure.
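One way the paging idea might look, sketched in Go for the host-side index math. This assumes "pages" means separate buffers that each stay within the per-buffer limit, which is an interpretation rather than a settled design; `PageLayout`, `NewPageLayout`, and `Locate` are hypothetical names, and the 2^31-byte limit is an assumption.

```go
package main

import "fmt"

// PageLayout describes how a flat array of values is split across
// separately bound buffers ("pages"). Hypothetical type for illustration.
type PageLayout struct {
	ValsPerPage uint64 // number of values stored in each page
	NPages      uint64 // number of pages needed for all values
}

// NewPageLayout sizes pages so that each one holds at most maxBytes.
func NewPageLayout(totalVals, bytesPerVal, maxBytes uint64) PageLayout {
	perPage := maxBytes / bytesPerVal
	n := (totalVals + perPage - 1) / perPage // ceiling division
	return PageLayout{ValsPerPage: perPage, NPages: n}
}

// Locate maps a flat 64-bit index to a page number and an offset within it.
// Both results fit in 32 bits as long as ValsPerPage and NPages do.
func (pl PageLayout) Locate(ix uint64) (page, off uint32) {
	return uint32(ix / pl.ValsPerPage), uint32(ix % pl.ValsPerPage)
}

func main() {
	const maxBytes = uint64(1) << 31 // ASSUMPTION: per-buffer limit
	// ASSUMPTION: 7 SynCa variables per synapse per data item, ndata=3.
	total := uint64(32448512) * 7 * 3

	for _, bpv := range []uint64{4, 2} { // float32, then float16
		pl := NewPageLayout(total, bpv, maxBytes)
		page, off := pl.Locate(total - 1)
		fmt.Printf("bytes/val=%d pages=%d last value at page %d offset %d\n",
			bpv, pl.NPages, page, off)
	}
}
```

The difference from SynMemBlock is that the block scheme still indexes into a single large buffer, whereas here each page would be its own buffer, so no single allocation exceeds the limit.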
And float16 will help relieve pressure considerably!
This looks like a great resource: https://therealmjp.github.io/posts/shader-fp16/