
float16 support for synaptic variables on the GPU (and CPU)

Open rcoreilly opened this issue 1 year ago • 2 comments

It would probably be useful to use float16 for the synaptic variables. There is a nice existing Go package: https://github.com/x448/float16

Vulkan 1.2 now includes it as a fully supported option: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_shader_float16_int8.html. In HLSL it is float16_t, fully supported with shader model 6.2, and MoltenVK appears to support it as of 2018: https://github.com/KhronosGroup/MoltenVK/issues/368

Some older GPU hardware does not have native 16-bit support, so it will run much slower there, but the advantages on current hardware likely outweigh that.

rcoreilly avatar May 29 '23 19:05 rcoreilly

Actual Lvis model tests show that the A100 does not allow addressing of more than 31 bits of memory even if broken up using a SynMemBlock:

struct SynMemBlock {
	float vals[64];
};

float SynV(in Context ctx, uint syni, SynapseVars svar) {
	uint64 ix = ctx.SynapseVars.Idx(syni, svar);
	return Synapses[ uint(ix / 64)].vals[uint(ix % 64)];
}

For the bench_lvis net, with ndata=2 we're under the limit, but ndata=3 puts us over, with SynCa = 2.5 GB:

  BenchLvisNet:	 Neurons: 47204	 NeurMem: 30.6 MB 	 Syns: 32448512 	 SynIdxs: 371.3 MB 	 SynWts: 618.9 MB 	 SynCa: 1.7 GB
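The sizes above can be reproduced with some quick arithmetic. This sketch assumes 7 float32 SynCa variables per synapse per data item; that count is not stated in this thread and is only inferred from the reported numbers (it gives 1.7 GB at ndata=2 and 2.5 GB at ndata=3 when GB is read as GiB). It also assumes the limit is 2^31 bytes per buffer. All names are hypothetical.

```go
package main

import "fmt"

const (
	NSyns      = 32448512 // Syns from the BenchLvisNet report
	NSynCaVars = 7        // ASSUMPTION: inferred from the reported SynCa sizes
	MaxBytes   = 1 << 31  // ASSUMPTION: 31-bit addressing limit, in bytes
)

// SynCaBytes returns the total SynCa storage in bytes for the given
// synapse count, variables per synapse, ndata, and bytes per value.
func SynCaBytes(nsyns, nvars, ndata, bytesPerVal uint64) uint64 {
	return nsyns * nvars * ndata * bytesPerVal
}

// Fits reports whether a buffer of nbytes stays within the assumed limit.
func Fits(nbytes uint64) bool {
	return nbytes <= MaxBytes
}

func main() {
	const gib = float64(1 << 30)
	for ndata := uint64(1); ndata <= 4; ndata++ {
		f32 := SynCaBytes(NSyns, NSynCaVars, ndata, 4) // float32
		f16 := SynCaBytes(NSyns, NSynCaVars, ndata, 2) // float16
		fmt.Printf("ndata=%d  float32: %.2f GiB (fits=%v)  float16: %.2f GiB (fits=%v)\n",
			ndata, float64(f32)/gib, Fits(f32), float64(f16)/gib, Fits(f16))
	}
}
```

Under these assumptions, float16 halves every figure, so ndata=3 would come in around 1.27 GiB and fit in a single buffer again.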

So the next strategy is to use pages of memory instead of blocks for SynCa, which is the main point of failure.
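One way to read "pages" here is as separate buffers, each small enough to stay under the addressing limit, with a flat 64-bit index split into a page number and an offset within that page. The sketch below is only an illustration of that index arithmetic on the CPU side; PageIndex, NPages, and the page size of 2^28 float32 values (1 GiB per page) are hypothetical choices, not the layout actually adopted.

```go
package main

import "fmt"

// PageElems is the number of values per page.
// ASSUMPTION: 2^28 float32 values = 1 GiB per page, under a 2^31-byte limit.
const PageElems = 1 << 28

// PageIndex splits a flat element index into the page (buffer) number
// and the offset of the element within that page.
func PageIndex(ix, pageElems uint64) (page, off uint32) {
	return uint32(ix / pageElems), uint32(ix % pageElems)
}

// NPages returns how many pages are needed to hold total elements,
// rounding up so a partial last page is counted.
func NPages(total, pageElems uint64) uint64 {
	return (total + pageElems - 1) / pageElems
}

func main() {
	// Example: 32448512 synapses * 7 variables * ndata=3 (7 is an assumed count).
	total := uint64(32448512) * 7 * 3
	fmt.Println("elements:", total, "pages:", NPages(total, PageElems))

	page, off := PageIndex(total-1, PageElems)
	fmt.Println("last element lives in page", page, "at offset", off)
}
```

Both the page number and the offset fit comfortably in a uint, so the shader side would only ever need 32-bit indexing within any one buffer.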

And float16 will help relieve pressure considerably!

rcoreilly avatar Jun 19 '23 21:06 rcoreilly

this looks like a great resource: https://therealmjp.github.io/posts/shader-fp16/

rcoreilly avatar Jun 23 '23 05:06 rcoreilly