Yuxin Jiang 姜宇心

Results 13 comments of Yuxin Jiang 姜宇心

Hi, I am a little confused about your question. During training, we only use the dev set for evaluation in order to save the best checkpoint. Do you mean that...

**Why 5-fold cross-validation is used:** After the model is well trained, it can produce sentence embedding vectors, which can be used directly to compute the cosine similarity for STS...
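The STS scoring step can be sketched as follows. The embeddings here are toy vectors invented for illustration; in practice they would come from the trained model:

```python
import numpy as np

# Toy sentence embeddings (in practice, produced by the trained encoder).
emb_a = np.array([0.2, 0.8, 0.1])
emb_b = np.array([0.25, 0.7, 0.05])

# Cosine similarity between the two embeddings, used as the STS score.
cos_sim = emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
```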

First, you need to modify "model.py": change **num_labels** to 3. If the number of classes is more than 2, then AUC cannot be computed by the original...
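For reference, one common way to get an AUC when **num_labels** > 2 is a macro-averaged one-vs-rest AUC. This is only an illustrative sketch (not the repo's actual evaluation code), using the rank-statistic form of binary AUC:

```python
import numpy as np

def binary_auc(scores, labels):
    # AUC = probability that a random positive example outranks a random negative.
    order = np.argsort(scores)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def macro_ovr_auc(probs, labels, num_labels=3):
    # One-vs-rest AUC per class, averaged over classes (macro average).
    return np.mean([binary_auc(probs[:, c], (labels == c).astype(int))
                    for c in range(num_labels)])
```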

Hi, sorry for the late reply. I have released the unsupervised BERT-base checkpoint; you can download it from the link https://drive.google.com/drive/folders/1OcgJ-7gU_N7J7x5ezrigFLlTU8h7Uvjx.

Hi, you could try changing 'Roberta' to 'DebertaV2' in models.py, e.g., change `from transformers.models.roberta.modeling_roberta import RobertaPreTrainedModel, RobertaModel, RobertaLMHead` to `from transformers.models.deberta_v2.modeling_deberta_v2 import DebertaV2PreTrainedModel, DebertaV2Model, DebertaV2LMHead`, and create...

Thanks for your interest in our work. In each iteration for the student model, we start with the model trained in the last iteration.

Hi, thanks for your interest in our work. The delta weights are 25 GB because we use **float32** as torch_dtype, while vicuna-7b-delta-v1.1 uses **float16**. We have changed to float16 and...
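The size difference follows directly from the bytes per parameter. A back-of-the-envelope estimate for a 7B-parameter checkpoint (the parameter count is assumed here for illustration):

```python
# Rough checkpoint size estimate for a 7B-parameter model (assumed count).
num_params = 7e9
size_fp32_gb = num_params * 4 / 1e9  # float32: 4 bytes per parameter -> ~28 GB
size_fp16_gb = num_params * 2 / 1e9  # float16: 2 bytes per parameter -> ~14 GB
```

Switching torch_dtype to float16 roughly halves the on-disk size, which is why the released weights shrink so much.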

Hi, in most cases the GPT response contains strings like `Instruction: xxx\nInput: xxx`. So we use `re.search(r"Instruction: (.+)\n", raw_instructions)` to extract the generated instruction. However, in some cases, the response...
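A minimal sketch of that extraction step; the response string below is made up for illustration:

```python
import re

# Hypothetical GPT response following the "Instruction: ...\nInput: ..." template.
raw_instructions = "Instruction: Summarize the text.\nInput: The quick brown fox.\n"

# Grab everything between "Instruction: " and the next newline.
match = re.search(r"Instruction: (.+)\n", raw_instructions)
instruction = match.group(1) if match else None
```

Because `.` does not match newlines by default, the greedy `(.+)` still stops at the first `\n`, so the `Input:` part is not captured.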

Hi, thank you for your interest in our work :) Unfortunately, we are unable to release the dataset for distillation at this moment. Our work is still in progress, and...

Thanks for your interest in our work. Let me illustrate it using an example: In the first iteration, the _train_pool_ and _cache_pool_ are both 52,000 Alpaca instructions and we generate...