Haoxuan (Horace) Wang comments

Results: 5 comments of Haoxuan (Horace) Wang

Restore the code portions to their original state

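Also, may I ask whether this was translated with GPT-4/ChatGPT? If so, could you add an instruction to the prompt not to translate the code portions? I also noticed that this is roughly 1/2 of the original data; will the remaining parts be translated as well?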

Restore the code portions to their original state

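> Hi, thank you very much for translating the dataset. I looked through it and found one problem: the code had all been "translated" as well. So I searched with the keywords below and spent an afternoon manually reverting the incorrect code translations. There may still be some I missed.
>
> * 代码 (code)
> * 函数 (function)
> * 程序 (program)
> * 脚本 (script)
> * Python
> * 蟒蛇 (python, the snake)
> * Go
> * C++
> * ...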

Training suggestion...? For reducing LLM to produce like "I am sorry, I'm an AI language model and I don't have abilty to transcribe speech to text"

Hi @ddlBoJack Thanks a lot for replying! One thing I did observe is that our testing dataset contains many no-speech audio clips and very short audio segments. A...

Training suggestion...? For reducing LLM to produce like "I am sorry, I'm an AI language model and I don't have abilty to transcribe speech to text"

Hi @PigeonDan1 1. A prompt plus the MUSAN dataset with such a label seems helpful in controlling the output format for audio without speech, or audio with very little speech. We...

Training suggestion...? For reducing LLM to produce like "I am sorry, I'm an AI language model and I don't have abilty to transcribe speech to text"

@fclearner Thanks for your question. It seems this can happen: whether it produces multiple decoded outputs depends on the LLM's instruction-following capability. But the thing is that we will...