Ali Ladjevardi
Results: 3 issues of Ali Ladjevardi
What I did: 1. Define custom layers for affine-quantized models, including integer weights and float16 scales and biases (zero-point correction). 2. Load an MLX-Community quantized model and unpack the weights...
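The two steps above (unpacking integer weights, then applying per-group scales and biases) can be illustrated with a minimal pure-Python sketch. It is not the code from the issue: the function names, the 4-bit width, the lowest-bits-first packing into 32-bit words, and the group size of 8 are all assumptions chosen to keep the example small.

```python
from typing import List


def unpack_uint32(word: int, bits: int = 4) -> List[int]:
    """Split one 32-bit word into 32 // bits unsigned integers.

    Assumption: values are packed lowest bits first.
    """
    mask = (1 << bits) - 1
    return [(word >> (bits * i)) & mask for i in range(32 // bits)]


def dequantize_group(q: List[int], scale: float, bias: float) -> List[float]:
    """Affine dequantization of one group: w = q * scale + bias."""
    return [v * scale + bias for v in q]


def dequantize_row(
    packed: List[int],
    scales: List[float],
    biases: List[float],
    bits: int = 4,
    group_size: int = 8,  # hypothetical; real checkpoints may use larger groups
) -> List[float]:
    """Unpack a row of packed words and dequantize it group by group."""
    values = [v for word in packed for v in unpack_uint32(word, bits)]
    out: List[float] = []
    for g, (scale, bias) in enumerate(zip(scales, biases)):
        group = values[g * group_size:(g + 1) * group_size]
        out.extend(dequantize_group(group, scale, bias))
    return out


if __name__ == "__main__":
    # One packed word holding the 4-bit values 0..7, one scale, one bias.
    print(dequantize_row([0x76543210], scales=[0.5], biases=[-1.0]))
```

The bias plays the role of the zero-point correction: it shifts the unsigned integer range so that the dequantized group can cover negative weights.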
With the new differentiable multi UOp, model parallelism is supported out of the box by sharding optimizer parameters. I will look into 3D parallelism, which lets you shard both model (optimizer)...
bounty locked