Implementation of `blaze` programs
Thread safety
Currently, programs created with the `#[blaze]` macro hold their kernels inside `Mutex`es (one per kernel).
Whilst this implementation has the benefit of being thread-safe, it also adds avoidable overhead. A thread-unsafe implementation could use `RefCell`s instead, which would remove the mutex locking overhead at the cost of the program no longer being `Sync`.
A syntax for specifying the thread safety of a program should be defined, and a default should be chosen.
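The trade-off between the two designs can be sketched with a stand-in type. This is only an illustration, not the code the macro generates: `Kernel`, `SyncProgram`, `LocalProgram` and the `add` kernel are hypothetical names, and the counter stands in for the real work of setting arguments and enqueueing.

```rust
use std::cell::RefCell;
use std::sync::Mutex;

// Hypothetical stand-in for a kernel handle; not blaze's real type.
struct Kernel {
    launches: u32,
}

impl Kernel {
    // Enqueueing needs exclusive access to the kernel (e.g. to set arguments).
    fn enqueue(&mut self) -> u32 {
        self.launches += 1;
        self.launches
    }
}

// Thread-safe variant: one Mutex per kernel, as in the current implementation.
struct SyncProgram {
    add: Mutex<Kernel>,
}

impl SyncProgram {
    fn new() -> Self {
        SyncProgram { add: Mutex::new(Kernel { launches: 0 }) }
    }

    // Locks the kernel for the duration of the call.
    fn add(&self) -> u32 {
        self.add.lock().unwrap().enqueue()
    }
}

// Thread-unsafe variant: RefCell only tracks borrows at runtime, without locking.
// The resulting struct is not Sync, so it cannot be shared between threads.
struct LocalProgram {
    add: RefCell<Kernel>,
}

impl LocalProgram {
    fn new() -> Self {
        LocalProgram { add: RefCell::new(Kernel { launches: 0 }) }
    }

    fn add(&self) -> u32 {
        self.add.borrow_mut().enqueue()
    }
}

// Only the Mutex-based program can be borrowed by several threads at once.
fn enqueue_from_two_threads(program: &SyncProgram) -> u32 {
    std::thread::scope(|s| {
        s.spawn(|| program.add());
        s.spawn(|| program.add());
    });
    // Both scoped threads have finished here.
    program.add()
}
```

Passing a `&LocalProgram` to the same scoped threads would be rejected at compile time, which is the guarantee a thread-safety syntax would let users opt out of.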
Program objects
Currently, defining a program with the `#[blaze]` macro creates a struct for the program and adds methods to it that enqueue the kernels. An alternative implementation could create a single static or thread-local instance of the program struct, in which case the kernel functions would be free functions rather than methods on the object.
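A rough sketch of the two shapes, again with hypothetical names (`Program`, `PROGRAM`, `add`) and a counter in place of real kernel enqueueing. The thread-local variant is shown here; a single global static would need a synchronised wrapper instead of `RefCell`.

```rust
use std::cell::RefCell;

// Hypothetical stand-in for a generated program struct.
struct Program {
    launches: u32,
}

impl Program {
    fn new() -> Self {
        Program { launches: 0 }
    }

    // Current style: the kernel is enqueued through a method on the program object.
    fn add(&mut self) -> u32 {
        self.launches += 1;
        self.launches
    }
}

// Alternative style: one lazily initialised instance per thread.
thread_local! {
    static PROGRAM: RefCell<Program> = RefCell::new(Program::new());
}

// The kernel becomes a free function that reaches for the hidden instance.
fn add() -> u32 {
    PROGRAM.with(|program| program.borrow_mut().add())
}
```

With the object style the caller decides when the program is built and how long it lives. With the hidden instance the call site is shorter, but construction happens implicitly on first use in each thread.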