Ali Ladjevardi

Results 3 issues of Ali Ladjevardi

What I did: 1. Define custom layers for affine-quantized models, including integer weights and float16 scales and biases (zero-point correction). 2. Load an MLX-Community quantized model and unpack the weights....
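
As a rough illustration of the two steps above, here is a minimal pure-Python sketch of unpacking packed integer weights and applying affine dequantization, `w = scale * q + bias`. The helper names (`unpack_uint32`, `pack_uint32`, `dequantize_row`), the 4-bit width, the lowest-bits-first packing order, and the group size of 8 are all assumptions made for the example; they are not taken from the issue, and the actual layout of a given MLX-Community checkpoint should be checked against its config.

```python
def unpack_uint32(word, bits=4):
    """Split one 32-bit word into 32 // bits unsigned integers, lowest bits first."""
    mask = (1 << bits) - 1
    return [(word >> (bits * i)) & mask for i in range(32 // bits)]


def pack_uint32(values, bits=4):
    """Inverse of unpack_uint32: pack small unsigned integers into one 32-bit word."""
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (bits * i)
    return word


def dequantize_row(packed, scales, biases, bits=4, group_size=8):
    """Affine dequantization with one (scale, bias) pair per group of weights.

    The bias plays the role of the zero-point correction: it shifts the
    unsigned integer range so that the group can represent negative values.
    """
    q = [v for word in packed for v in unpack_uint32(word, bits)]
    return [scales[i // group_size] * qi + biases[i // group_size]
            for i, qi in enumerate(q)]


# Hypothetical data: one group of eight 4-bit values in a single word.
word = pack_uint32([0, 1, 2, 3, 4, 5, 6, 7])
row = dequantize_row([word], scales=[0.5], biases=[-1.0])
print(row)  # [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
```

In a real layer the scales and biases would be stored as float16 and the arithmetic would run on arrays rather than Python lists; the sketch only shows the indexing and the affine map.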

With the new differentiable multi UOp, model parallelism is supported out of the box by sharding optimizer parameters. I will look into 3D parallelism, which lets you shard both model (optimizer)...

bounty locked