hackiey
Thanks for your answers! Now I have a follow-up question: if you have two samples A of shape (2, 10, 1), then for the LSTM controller the input will become B...
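For concreteness, here is a minimal sketch (my own illustration, not from the thread) of how an RNN unrolls a batch of shape (2, 10, 1) into per-timestep controller inputs:

```python
import numpy as np

# Two samples, 10 timesteps, 1 feature per step.
A = np.random.rand(2, 10, 1)

# A Keras RNN unrolls over the time axis, so the controller's step
# function sees one slice of shape (batch, features) per timestep.
for t in range(A.shape[1]):
    x_t = A[:, t, :]  # shape (2, 1)
    # ... controller(x_t, states) would be called here ...
```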
Thanks again. I understand that the NTM layer will start with a freshly initialized state, but the LSTM controller may not, because the LSTM controller is stateful, so the state will pass...
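If the worry is the controller silently carrying state between batches, one workaround is to thread the states explicitly instead of relying on statefulness. A minimal sketch, assuming a tf.keras LSTMCell as the controller:

```python
import tensorflow as tf

units, batch = 32, 2                 # hypothetical controller size / batch
cell = tf.keras.layers.LSTMCell(units)

# Fresh zero states at the start of every sequence, so nothing leaks
# across batches even though the cell object itself is reused.
h0 = tf.zeros((batch, units))
c0 = tf.zeros((batch, units))

x_t = tf.zeros((batch, 1))           # one timestep of input
out, states = cell(x_t, [h0, c0])    # states are passed back in explicitly
```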
Maybe you should write some LSTM code in the NTM step function 🤕; I found that approach in another NTM Keras implementation. I think this is one drawback of Keras, it is...
Here, https://github.com/SigmaQuan/NTM-Keras/blob/master/ntm.py, you can see they don't use an LSTM layer and instead write it themselves. Using a stateful LSTM with a sequence length of just one is not a good idea ([here is a keras...
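A rough sketch of what "writing the LSTM yourself inside the step function" amounts to (my own illustration of the pattern, not the code from that repo; W, U, b are assumed fused gate weights created elsewhere):

```python
from keras import backend as K  # old-style Keras backend ops, as in that era's code

def lstm_step(x, h_prev, c_prev, W, U, b, units):
    """One hand-rolled LSTM step, callable from the outer RNN's step
    function, so the controller state is threaded explicitly."""
    z = K.dot(x, W) + K.dot(h_prev, U) + b    # fused gates: i, f, g, o
    i = K.sigmoid(z[:, :units])               # input gate
    f = K.sigmoid(z[:, units:2 * units])      # forget gate
    g = K.tanh(z[:, 2 * units:3 * units])     # candidate cell state
    o = K.sigmoid(z[:, 3 * units:])           # output gate
    c = f * c_prev + i * g
    h = o * K.tanh(c)
    return h, c  # the caller returns these via the step function's states
```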
@BangLiu I have implemented QAnet with EMA, also mostly based on the TensorFlow implementation. The performance is **em: 67.317** and **f1: 76.953** (without EMA), and **em: 70.155** and **f1: 79.432** (with EMA)...
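For reference, the usual TF1-style weight-averaging setup looks like this (a sketch of the standard tf.train.ExponentialMovingAverage pattern; the decay value and the toy model are my assumptions):

```python
import tensorflow as tf  # TF1 graph mode, matching QANet-era code

# Toy stand-in for the real model: one weight and a dummy loss.
w = tf.Variable(1.0)
loss = tf.square(w)
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Maintain shadow (averaged) copies of all trainable variables.
ema = tf.train.ExponentialMovingAverage(decay=0.9999)  # decay is an assumption
with tf.control_dependencies([train_op]):
    train_op = tf.group(ema.apply(tf.trainable_variables()))

# At eval time, restore the shadow variables in place of the raw weights;
# swapping them in is what produces the em/f1 gain from EMA.
eval_saver = tf.train.Saver(ema.variables_to_restore())
```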
The point of appending the assistant's tool_calls message into `messages` is to tell GPT which tools it has already called and with what arguments; in theory this avoids repeated calls when a tool call fails. Also, role=function is deprecated now.
In the official examples, the biggest difference between function_call and tool_calls is that tool_calls puts the call information into the context as a message of its own, so compared with function_call there is one extra message carrying the call arguments. If the function call fails, the function_call messages are

```json
[
    {"role": "user", "content": "What is on http://www.example.com?"},
    {"role": "function", "name": "web_crawler", "content": ""},
    {"role": "function", "name": "web_crawler", "content": ""},
    {"role": "function", "name": "web_crawler", "content": ""},
    {"role": "function", "name": "web_crawler", ...
```
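For contrast, a sketch of what the same failing transcript looks like with tool_calls (ids are hypothetical; each round keeps the assistant's call arguments in the context as their own message):

```python
messages = [
    {"role": "user", "content": "What is on http://www.example.com?"},
    # Round 1: the model's call, arguments included, stays in the history...
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_1", "type": "function",
                     "function": {"name": "web_crawler",
                                  "arguments": '{"url": "http://www.example.com"}'}}]},
    # ...followed by the (empty) failed result, keyed by tool_call_id.
    {"role": "tool", "tool_call_id": "call_1", "content": ""},
    # Subsequent failed rounds repeat the same assistant/tool pair.
]
```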
The trigger probability is still fairly high. Alternatively, leaving the function name unchanged and just tweaking the system prompt should also work.
flash_attn==2.4.2, torch==2.0.1+cu117. There is no longer a warning about flash attention not being installed when importing the model; does PyTorch need to be upgraded? I'd also like to know: if all of this is working correctly, how much GPU memory should full finetuning of the 14B model take? The README doesn't say; it only gives the 7B figure at sequence length 1024, which makes it hard to compare.
I'm using the transformers Trainer with the auto from_pretrained classes, and it runs fine now. If I upgrade to PyTorch 2.1.1, can I skip flash attention? Does the code adapt automatically?
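For what it's worth, recent transformers versions let you make the attention backend explicit instead of relying on auto-detection (a sketch; the model id is a placeholder, and attn_implementation requires transformers >= 4.36):

```python
import torch
from transformers import AutoModelForCausalLM

# "flash_attention_2" uses the flash_attn package; "sdpa" falls back to
# PyTorch's built-in scaled-dot-product attention (PyTorch >= 2.0),
# which needs no extra install.
model = AutoModelForCausalLM.from_pretrained(
    "your-model-id",                # placeholder
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",     # or "flash_attention_2" if flash_attn is installed
)
```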