hackiey
Thanks for your answers! Now I have a follow-up question: if you have two samples A of shape (2, 10, 1), then for the LSTM controller the input will become B...
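For concreteness, here is a minimal sketch (my own illustration, not from the thread) of how an RNN unrolls a batch of shape (2, 10, 1) into per-timestep controller inputs:

```python
import numpy as np

# Two samples, 10 timesteps, 1 feature per step.
A = np.random.rand(2, 10, 1)

# A Keras RNN unrolls over the time axis, so the controller's step
# function sees one slice of shape (batch, features) per timestep.
for t in range(A.shape[1]):
    x_t = A[:, t, :]  # shape (2, 1)
    # ... controller(x_t, states) would be called here ...
```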
Thanks again. I understand that the NTM layer will start with a freshly initialized state, but the LSTM controller may not, because the LSTM controller is stateful, so the state will pass...
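If the worry is the controller silently carrying state between batches, one workaround is to thread the states explicitly instead of relying on statefulness. A minimal sketch, assuming a tf.keras LSTMCell as the controller:

```python
import tensorflow as tf

units, batch = 32, 2                 # hypothetical controller size / batch
cell = tf.keras.layers.LSTMCell(units)

# Fresh zero states at the start of every sequence, so nothing leaks
# across batches even though the cell object itself is reused.
h0 = tf.zeros((batch, units))
c0 = tf.zeros((batch, units))

x_t = tf.zeros((batch, 1))           # one timestep of input
out, states = cell(x_t, [h0, c0])    # states are passed back in explicitly
```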
Maybe you should write some LSTM code in the NTM step function 🤕; I found that approach in another NTM Keras implementation. I think this is one drawback of Keras, it is...
Here, https://github.com/SigmaQuan/NTM-Keras/blob/master/ntm.py, you can see they don't use an LSTM layer and instead write it themselves. Using a stateful LSTM with a sequence length of just one is not a good idea ([here is a keras...
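A rough sketch of what "writing the LSTM yourself inside the step function" amounts to (my own illustration of the pattern, not the code from that repo; W, U, b are assumed fused gate weights created elsewhere):

```python
from keras import backend as K  # old-style Keras backend ops, as in that era's code

def lstm_step(x, h_prev, c_prev, W, U, b, units):
    """One hand-rolled LSTM step, callable from the outer RNN's step
    function, so the controller state is threaded explicitly."""
    z = K.dot(x, W) + K.dot(h_prev, U) + b    # fused gates: i, f, g, o
    i = K.sigmoid(z[:, :units])               # input gate
    f = K.sigmoid(z[:, units:2 * units])      # forget gate
    g = K.tanh(z[:, 2 * units:3 * units])     # candidate cell state
    o = K.sigmoid(z[:, 3 * units:])           # output gate
    c = f * c_prev + i * g
    h = o * K.tanh(c)
    return h, c  # the caller returns these via the step function's states
```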
@BangLiu I have implemented QAnet with EMA, also mostly based on the TensorFlow implementation. The performance is **em: 67.317** and **f1: 76.953** (without EMA), and **em: 70.155** and **f1: 79.432** (with EMA)...
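For reference, the usual TF1-style weight-averaging setup looks like this (a sketch of the standard tf.train.ExponentialMovingAverage pattern; the decay value and the toy model are my assumptions):

```python
import tensorflow as tf  # TF1 graph mode, matching QANet-era code

# Toy stand-in for the real model: one weight and a dummy loss.
w = tf.Variable(1.0)
loss = tf.square(w)
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Maintain shadow (averaged) copies of all trainable variables.
ema = tf.train.ExponentialMovingAverage(decay=0.9999)  # decay is an assumption
with tf.control_dependencies([train_op]):
    train_op = tf.group(ema.apply(tf.trainable_variables()))

# At eval time, restore the shadow variables in place of the raw weights;
# swapping them in is what produces the em/f1 gain from EMA.
eval_saver = tf.train.Saver(ema.variables_to_restore())
```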
The point of appending the assistant's tool_calls message into `messages` is to tell GPT which tools it has already called and with what arguments; in theory this avoids repeated calls when a tool call fails. Also, role=function is deprecated now.
In the official examples, the biggest difference between function_call and tool_calls is that tool_calls puts the call information into the context as a message of its own, so compared with function_call there is one extra message carrying the call arguments. If the function call fails, the function_call messages are

```json
[
    {"role": "user", "content": "What is on http://www.example.com?"},
    {"role": "function", "name": "web_crawler", "content": ""},
    {"role": "function", "name": "web_crawler", "content": ""},
    {"role": "function", "name": "web_crawler", "content": ""},
    {"role": "function", "name": "web_crawler", ...
```
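For contrast, a sketch of what the same failing transcript looks like with tool_calls (ids are hypothetical; each round keeps the assistant's call arguments in the context as their own message):

```python
messages = [
    {"role": "user", "content": "What is on http://www.example.com?"},
    # Round 1: the model's call, arguments included, stays in the history...
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_1", "type": "function",
                     "function": {"name": "web_crawler",
                                  "arguments": '{"url": "http://www.example.com"}'}}]},
    # ...followed by the (empty) failed result, keyed by tool_call_id.
    {"role": "tool", "tool_call_id": "call_1", "content": ""},
    # Subsequent failed rounds repeat the same assistant/tool pair.
]
```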
The trigger probability is still fairly high. Alternatively, leaving the function name unchanged and just tweaking the system prompt should also work.
flash_attn==2.4.2, torch==2.0.1+cu117. There is no longer a warning about flash attention not being installed when importing the model; does PyTorch need to be upgraded? I'd also like to know: if all of this is working correctly, how much GPU memory should full finetuning of the 14B model take? The README doesn't say; it only gives the 7B figure at sequence length 1024, which makes it hard to compare.
I'm using the transformers Trainer with the auto from_pretrained classes, and it runs fine now. If I upgrade to PyTorch 2.1.1, can I skip flash attention? Does the code adapt automatically?
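For what it's worth, recent transformers versions let you make the attention backend explicit instead of relying on auto-detection (a sketch; the model id is a placeholder, and attn_implementation requires transformers >= 4.36):

```python
import torch
from transformers import AutoModelForCausalLM

# "flash_attention_2" uses the flash_attn package; "sdpa" falls back to
# PyTorch's built-in scaled-dot-product attention (PyTorch >= 2.0),
# which needs no extra install.
model = AutoModelForCausalLM.from_pretrained(
    "your-model-id",                # placeholder
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",     # or "flash_attention_2" if flash_attn is installed
)
```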