
Use Neural-Fortran

OneAdder opened this issue 9 months ago · 4 comments

I have a suggestion: use https://github.com/modern-fortran/neural-fortran. I have implemented MultiHead Attention, LayerNorm, and Embedding there. I think the most sensible thing would be to combine the effort and use that implementation here. It also includes training code, which would make a training implementation here trivial, and batch inference also becomes a lot easier. I also believe we should combine effort with my WIP Llama with regard to two-byte floats and GPU shaders.
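[Editor's note: a hedged sketch of how those neural-fortran building blocks might compose into a GPT-2-style stack. The constructor names and signatures (embedding, self_attention, layernorm, linear2d) are assumptions inferred from this discussion, not verified neural-fortran API.]

```fortran
! Sketch only: layer constructor names/signatures below are assumptions,
! not verified against the neural-fortran API.
program nf_gpt2_sketch
  use nf, only: network, input, embedding, self_attention, layernorm, linear2d
  implicit none
  type(network) :: net

  ! GPT-2 small dimensions: 50257-token vocab, 768-dim model, 12 heads
  net = network([ &
    input(1024),           & ! context of up to 1024 token ids
    embedding(50257, 768), & ! token embedding: vocab size x model dim
    self_attention(12),    & ! multi-head attention with 12 heads
    layernorm(),           &
    linear2d(768)          & ! position-wise projection back to model dim
  ])
end program nf_gpt2_sketch
```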

OneAdder · Mar 26 '25, 17:03

We should definitely collaborate! Yes, we want Llama for sure. There is also this Fortran Llama code: https://github.com/rbitr/llm.f90.

One thing that I don't like in neural-fortran and in your llm.f is the heavy use of objects, derived types, interfaces, etc. Given how incredibly simple the LLM models are, I much prefer just a bunch of functions, as in https://github.com/certik/fastGPT/blob/7d96ec2e23b2a1a07aea625f72661d3f650c7ee5/gpt2.f90: a lot shorter and simpler.
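[Editor's note: to make the style comparison concrete, here is an illustrative layer norm written in the plain-function style described above. It is a minimal sketch, not code copied from gpt2.f90.]

```fortran
! Illustrative of the "bunch of functions" style; not the actual
! gpt2.f90 code.
pure function layer_norm(x, g, b, eps) result(y)
  real, intent(in) :: x(:)       ! activations
  real, intent(in) :: g(:), b(:) ! learned gain and bias
  real, intent(in) :: eps        ! numerical-stability epsilon
  real :: y(size(x)), mu, var
  mu  = sum(x) / size(x)
  var = sum((x - mu)**2) / size(x)
  y   = g * (x - mu) / sqrt(var + eps) + b
end function layer_norm
```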

certik · Mar 26 '25, 18:03

I think the purpose of fastGPT is to test the performance limits of Fortran in GPT2 inference.

NF is a much more general framework, and using it here would inevitably make fastGPT slower, which I think would defeat its purpose.

I'm much more interested in adding an examples/gpt2.f90 to neural-fortran using the building blocks you already implemented, and then we can use fastGPT as a reference for the computational performance of GPT2 inference in Fortran. Because @certik has already squeezed so much optimization into fastGPT, we can consider it the performance ceiling on CPUs and use it as a reference to improve the performance of NF.

milancurcic · Mar 26 '25, 22:03

@certik In this case I would suggest the following:

  1. Add parallelism with coarrays
  2. Remove the manual sp kind so that the project can be compiled at a desired precision
  3. Compile with NVFortran to get two-byte floats

I think I will be able to help once I have finished implementing the same things for my Llama. BTW, perhaps I worked in the enterprise for too long, but I find the use of objects and interfaces simpler than plain procedures :D
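[Editor's note: two minimal sketches of what items 1 and 2 above might look like. The module and flag names are hypothetical, and kind=2 half precision is an NVFortran extension, not standard Fortran.]

```fortran
! Item 2 (hypothetical sketch): a single working-precision kind wp
! replaces the hard-coded sp, selected at build time via a
! preprocessor flag (requires compiling with -cpp).
module precision_m
  use iso_fortran_env, only: real32, real64
  implicit none
#ifdef FASTGPT_FP64
  integer, parameter :: wp = real64
#else
  integer, parameter :: wp = real32  ! with NVFortran, kind=2 could give
                                     ! two-byte reals as an extension
#endif
end module precision_m

! Item 1 (hypothetical sketch): coarray images split a batch of prompts.
program batch_inference
  implicit none
  integer :: i, me, n
  me = this_image()
  n  = num_images()
  do i = me, 100, n   ! 100 prompts, strided across images
    ! call GPT-2 inference on prompt i here
  end do
  sync all
end program batch_inference
```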

OneAdder · Mar 27 '25, 16:03

@milancurcic @certik @OneAdder I'm glad to see this discussion of combining forces. I'm interested in joining the party via Fiats. :) I have an intern working one day each week with me on implementing transformers in Fiats. For inference purposes, we've started by adding an fpm.toml file to fastGPT and creating a Fiats branch that makes fastGPT a dependency. Once we can make calls to fastGPT for inference, we'd like to move to training, with a longer-term goal of applying transformers to scientific problems such as solving partial differential equations using techniques like those described here. It would be great for each project to leverage the efforts of the others if everyone is interested.
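[Editor's note: a hedged sketch of the kind of fpm.toml dependency entry described above; the exact URL and any branch details used by the Fiats branch are assumptions, not the actual manifest.]

```toml
# Hypothetical fragment, not the actual Fiats manifest:
# pulling in fastGPT as an fpm dependency.
[dependencies]
fastGPT = { git = "https://github.com/certik/fastGPT" }
```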

FYI, the primary goal of Fiats is exploring novel uses of Fortran language features to support deep learning algorithms. Toward that end, our implementations of the inference and training algorithms leverage do concurrent for parallelism, and I'll be presenting a paper at ISC next month showing encouraging results from automatic parallelization of inference using LLVM flang. We're also collaborating with compiler developers on automatically offloading inference and training to GPUs via do concurrent, and we hope to have that working later this year.
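[Editor's note: as an illustration of the do concurrent pattern mentioned above, a loop of this shape, with fully independent iterations, is the kind a compiler such as LLVM flang can parallelize or offload automatically. This is a hypothetical kernel, not Fiats code.]

```fortran
! Hypothetical inference kernel written with do concurrent; the
! iterations are independent, so a compiler may parallelize or
! offload them.
pure subroutine gelu_inplace(x)
  real, intent(inout) :: x(:)
  integer :: i
  do concurrent (i = 1:size(x))
    ! tanh approximation of GELU; 0.7978845608 ~ sqrt(2/pi)
    x(i) = 0.5 * x(i) * (1.0 + tanh(0.7978845608 * (x(i) + 0.044715 * x(i)**3)))
  end do
end subroutine gelu_inplace
```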

rouson · May 08 '25, 00:05