Bird.Z

Results 3 issues of Bird.Z

When running inference with highway early exit on a batch B: when |B| = 1, the code runs fine; when |B| > 1, the code can break in the...

Avoid NaN loss in SupCon
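
The title does not say where the NaN comes from. As a minimal sketch, assuming one frequently seen cause (an anchor with no same-label positive in the batch, which gives a 0/0 mean) plus overflow in the exponentials, a supervised contrastive loss can skip such anchors and use a max-shifted log-sum-exp. The function name `supcon_loss` and its parameters are hypothetical and not taken from the repository.

```python
import math


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss on plain Python lists (illustrative only).

    Assumption: NaN arises from anchors without positives or from exp overflow.
    """
    # L2-normalise each feature vector; a zero vector is left unscaled.
    normed = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f)) or 1.0
        normed.append([x / norm for x in f])

    per_anchor = []
    for i in range(len(normed)):
        # Scaled cosine similarity between anchor i and every other sample.
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(len(normed))
            if j != i
        }
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue  # no positive: skip instead of dividing by zero

        # Max-shifted log-sum-exp keeps exp() from overflowing.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        mean_log_prob = sum(logits[j] - log_denom for j in positives) / len(positives)
        per_anchor.append(-mean_log_prob)

    # If no anchor had a positive, return 0.0 rather than NaN.
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0
```

With this guard, a batch in which every label is unique yields a loss of 0.0 rather than NaN; whether such batches should instead be dropped is a separate design choice.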

No GQA implementation was found, so composerLLAMA cannot scale to the 70B model. Maybe we need to design GQA and introduce head_z for wq and head_z_kv for...
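
For context, grouped-query attention (GQA) lets several query heads share one key/value head. The following is a rough pure-Python sketch of that sharing only, not the composerLLAMA code: `grouped_query_attention` and its list-based layout are hypothetical, the attention is non-causal for brevity, and the pruning masks mentioned in the issue are left out.

```python
import math


def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_query_attention(q, k, v):
    """q: [n_heads][seq][d]; k, v: [n_kv_heads][seq][d] (hypothetical layout)."""
    n_heads, n_kv_heads = len(q), len(k)
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    group = n_heads // n_kv_heads  # query heads per shared key/value head
    d = len(q[0][0])

    out = []
    for h in range(n_heads):
        kv = h // group  # index of the key/value head this query head uses
        head_out = []
        for q_vec in q[h]:
            # Scaled dot-product scores against every key position.
            scores = [
                sum(a * b for a, b in zip(q_vec, k_vec)) / math.sqrt(d)
                for k_vec in k[kv]
            ]
            weights = softmax(scores)
            # Weighted sum of the value vectors of the shared head.
            head_out.append([
                sum(w * v_vec[c] for w, v_vec in zip(weights, v[kv]))
                for c in range(len(v[kv][0]))
            ])
        out.append(head_out)
    return out
```

Setting `n_kv_heads == n_heads` recovers ordinary multi-head attention, and `n_kv_heads == 1` gives multi-query attention, so a per-KV-head mask would naturally have `n_kv_heads` entries while a per-query-head mask has `n_heads`.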