Jiaxin Shan comments

Results: 742 comments of Jiaxin Shan

is there a way to use it as a function?

@Kolhax I don't think there's a ready-to-use SDK. It shouldn't be hard to build one on top of the REST API yourself.
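A rough sketch of what such a wrapper could look like, assuming the service accepts JSON over HTTP POST. The class name `RestClient`, the `call` method, and any endpoint or parameter names are hypothetical; the thread does not specify the real API, so they would need to be adapted.

```python
import json
from typing import Callable
from urllib import request


def http_post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """Default transport: POST a JSON body and decode the JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


class RestClient:
    """Hypothetical thin wrapper so a REST API can be called like a function."""

    def __init__(
        self,
        base_url: str,
        transport: Callable[[str, dict], dict] = http_post_json,
    ):
        # Strip the trailing slash so joined URLs never contain "//".
        self.base_url = base_url.rstrip("/")
        # The transport is injectable, which makes the client testable offline.
        self.transport = transport

    def call(self, endpoint: str, **params) -> dict:
        """Send keyword arguments as the JSON body of a POST to `endpoint`."""
        url = f"{self.base_url}/{endpoint.lstrip('/')}"
        return self.transport(url, params)
```

With something like this in place, a call would read `RestClient("http://localhost:8000").call("generate", prompt="hi")`, where the host, endpoint, and parameter are placeholders rather than the project's actual interface.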

Using lora to finetune domain specific data?

@AngainorDev Thanks for the feedback. I will give it a try and report back later.

Unable to save the model weights - GPU OOM

Just a comment it eventually times out. @zhisbug A quick question, how did vicuna workaround this issue in the past and successfully save the weights? ``` 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [01:54

Can Gradio serving expose more parameters?

I can pick this issue up and add common parameters.

finetune with lora error

It seems FSDP is still developing its support for parameter-efficient training. The author of the LoRA support suggests using DeepSpeed for now. See https://github.com/lm-sys/FastChat/pull/138#issuecomment-1495289110 for more details.

Could I know how do you evaluate your model performance?

@mahlernim Thanks for the reply. The tricky thing is that they share some samples, but those examples don't seem to be enough to calculate the accuracy (91.25% vs 87.5%). I think...

How to generate training samples by self-Instruct

The self-instruct code is definitely not open-sourced yet. If you are interested, you can email me and we can discuss the details. @cquliujian @FanWan

when to support chatglm2-6b?

If anyone is familiar with the ChatGLM model architecture, feel free to help on #625. I am new to the transformer architecture and not sure whether my changes are correct.

Support for chatglm-6b

If anyone is familiar with the ChatGLM model architecture, feel free to help on #625. I am new to the transformer architecture and not sure whether my changes are correct.

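How is the data for the self-suggestion part constructed?

As part of the prompt, does this amount to two steps? 1. Decide whether the reference can be used. 2. If it is not used, answer directly from the model's own knowledge? BTW, this paper contains too little information.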
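The two-step reading asked about here could be sketched as a single prompt template. This only illustrates the commenter's interpretation and is not the paper's actual construction; the function name `build_prompt` and the wording of the instructions are assumptions.

```python
def build_prompt(question: str, reference: str) -> str:
    """Build one prompt asking the model to judge a reference, then answer.

    Hypothetical template: step 1 asks whether the reference is usable,
    step 2 falls back to the model's own knowledge when it is not.
    """
    return (
        "You are given a question and a candidate reference.\n"
        "Step 1: decide whether the reference is useful for answering.\n"
        "Step 2: if it is useful, answer using it; "
        "otherwise answer from your own knowledge.\n\n"
        f"Reference: {reference}\n"
        f"Question: {question}\n"
        "Answer:"
    )
```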