Kevin Canwen Xu

Results 36 comments of Kevin Canwen Xu

The server's at http://deplo-mlflo-1s4xwzhh8tic4-97cf518635d8c72d.elb.us-east-2.amazonaws.com/

Hi @Harry-zzh, thanks for your interest in our work! Just to confirm: "3% lower than reported" is in absolute terms, right? Then this is lower than all baselines in Table 1, even...

@Harry-zzh Thanks for the info. Is it on the test set (i.e., the GLUE server) or the validation set? If it's on the test set, could you please also provide the results on the...

By the way, in our NLP experiments, the students in our implementations of KD and of our approach are initialized with pretrained BERT (the well-read student) rather than with the fine-tuned teacher. That's probably the...

Please check the releases of this repository for the encrypted zip. Use the password you receive when you complete the Google Form to decompress it.

Could you try upgrading Transformers? Could you also print the input? I'm not quite sure about this error.

We'll release the code soon. It's actually very simple: we just ask ChatGPT to pick the best response and use that to fine-tune Baize.
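A minimal sketch of that data-selection step, under stated assumptions: `judge` below is a hypothetical stand-in for a call to ChatGPT (in practice it would send the prompt and candidate responses to the API and return the index of the best one); the function names are illustrative, not from the released code.

```python
from typing import Callable, List, Tuple

def build_self_distill_pairs(
    prompts: List[str],
    candidates: List[List[str]],
    judge: Callable[[str, List[str]], int],
) -> List[Tuple[str, str]]:
    """For each prompt, keep only the judge-picked best response
    as a (prompt, response) fine-tuning pair."""
    pairs = []
    for prompt, cands in zip(prompts, candidates):
        best = judge(prompt, cands)  # hypothetical: ChatGPT picks the best candidate
        pairs.append((prompt, cands[best]))
    return pairs

# Toy judge for illustration only: prefer the longest candidate
# (a real judge would query ChatGPT instead).
longest = lambda prompt, cands: max(range(len(cands)), key=lambda i: len(cands[i]))

data = build_self_distill_pairs(
    ["What is 2+2?"],
    [["4", "The answer is 4."]],
    longest,
)
# data -> [("What is 2+2?", "The answer is 4.")]
```

The resulting pairs would then be used as the fine-tuning set for the model.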

> Is that really self-distillation? It sounds more like synthetic data generation and you're still distilling ChatGPT into the model.
>
> Don't get me wrong, it's a...

> Right, but it's the intelligence of ChatGPT you're distilling into your model.
>
> If a child learning math gives 4 answers to a math question, and...

This seems to be a problem with int8. In our tests, it is indeed slower than fp16. We'll investigate this.