Bird.Z
Results: 3 issues of Bird.Z
When running inference with the highway early exit on a batch B: when |B| = 1, the code runs fine; when |B| > 1, the code can corrupt in the...
Avoid NaN loss in SupCon
No GQA implementation was found, so the model cannot scale to 70B for composerLLAMA. Maybe we need to design GQA and introduce head_z for wq and head_z_kv for...
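
For context on the GQA issue: in grouped-query attention, several query heads share one key/value head, which shrinks the K/V projections and cache. The following is only a minimal pure-Python sketch of that head-sharing idea with causal masking. The function name `grouped_query_attention` and the nested-list tensor layout are assumptions made for illustration, not part of the composerLLAMA code, and the `head_z` / `head_z_kv` masks mentioned in the issue are not modeled here.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_query_attention(q, k, v):
    """Causal grouped-query attention on nested lists (hypothetical helper).

    q:    [n_heads][seq][d]     one entry per query head
    k, v: [n_kv_heads][seq][d]  one entry per shared key/value head
    Returns a list shaped [n_heads][seq][d].
    """
    n_heads, n_kv_heads = len(q), len(k)
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    group = n_heads // n_kv_heads  # query heads per key/value head
    d = len(q[0][0])
    scale = 1.0 / math.sqrt(d)

    out = []
    for h in range(n_heads):
        kv = h // group  # index of the K/V head shared by this query head
        head_out = []
        for t, q_t in enumerate(q[h]):
            # Causal mask: position t attends only to positions 0..t.
            scores = [
                scale * sum(a * b for a, b in zip(q_t, k[kv][s]))
                for s in range(t + 1)
            ]
            w = softmax(scores)
            head_out.append(
                [sum(w[s] * v[kv][s][j] for s in range(t + 1)) for j in range(d)]
            )
        out.append(head_out)
    return out
```

With 4 query heads and 2 key/value heads, query heads 0 and 1 read from K/V head 0 and query heads 2 and 3 read from K/V head 1. Setting the number of key/value heads equal to the number of query heads recovers ordinary multi-head attention, and setting it to 1 gives multi-query attention.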