Peyton
Hi, sorry for the late reply. We take the higher of the norm and non-norm results. Basically, your results are the same as ours. Due to the difference in CUDA,...
Thank you for pointing that out. I will recheck it in the next few days. Perhaps using downstream task performance as the fitness metric would be a better approach.
Thank you for your interest. Below is the script for LoRA fine-tuning:
```
# CUDA_VISIBLE_DEVICES=3 python finetune_lm.py \
#     --model_name_or_path /path/to/workspace/wanda/saved_model/llama1_7b_2-4 \
#     --config_name "/path/to/llama-7b-hf" \
#     --dataset_name c4 \
...
@Arnav0400 Hi, sorry for the late reply. It takes roughly one GPU day for LLaMA1-7B.
@Arnav0400 Not yet, but this evaluation is easy to run. All you need to do is save the checkpoint of the LoRA fine-tuned pruned LLM. Using LoRA to fine-tune the...
@DaizeDong Thanks for your swift reply. I wonder whether this k-means initialization makes the gate converge faster than randomly initialized weights? Thanks!
@DaizeDong I ran an experiment on this, and here are the results. With the k-means initialization, we got the following results: And here are the results with random initialization: The only...
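For anyone wanting to reproduce this comparison, here is a minimal sketch of the kind of k-means gate initialization being discussed: cluster a batch of hidden states and use the centroids as the gate's initial weight matrix. This is a pure-NumPy illustration, not the repo's actual code; the feature matrix, expert count, and iteration count are all illustrative assumptions.

```python
import numpy as np

def kmeans_centroids(features, num_experts, iters=20, seed=0):
    """Plain Lloyd's k-means over (num_samples, hidden_dim) features.
    Returns (num_experts, hidden_dim) centroids, usable as an initial
    gate weight matrix in place of random initialization."""
    rng = np.random.default_rng(seed)
    # Start from num_experts distinct feature rows (fancy indexing copies).
    centroids = features[rng.choice(len(features), num_experts, replace=False)]
    for _ in range(iters):
        # Assign each feature vector to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(num_experts):
            members = features[labels == k]
            if len(members):  # keep the old centroid if a cluster empties out
                centroids[k] = members.mean(axis=0)
    return centroids

# Hypothetical usage: initialize an 8-expert gate from hidden states
# collected on a small calibration batch.
hidden = np.random.default_rng(1).normal(size=(512, 64)).astype(np.float32)
gate_weight = kmeans_centroids(hidden, num_experts=8)
print(gate_weight.shape)  # (8, 64)
```

The intuition behind the speed-up question above is that centroid-initialized gates already partition the hidden space, so routing starts from a meaningful assignment rather than a random one.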
@crazywoola How do I remove the credential? I have updated the key in both places.
@crazywoola For the 3rd bug, here is the DSL file:
```
app:
  description: Digest a given topic
  icon: grinning
  icon_background: '#D5D9EB'
  mode: workflow
  name: Mixed text-image layout test
  use_icon_as_answer_icon: false
kind: app
version: 0.1.2
workflow:
  conversation_variables:
...
@crazywoola For the 2nd bug, here is the code for testing it locally:
```
import requests
import json

url = "https://api.imgrender.cn/open/v1/pics"

payload = json.dumps(
    {
        "width": 640,
        "height": 850,
        "backgroundColor": "#fff",
...