qq31415926
As the title says.
Initialization code for the fully connected layer:

```python
import numpy as np
import torch.nn as nn

bias = np.sqrt(6.0 / (input_linear.weight.size(0) + input_linear.weight.size(1)))
nn.init.uniform_(input_linear.weight, -bias, bias)
if input_linear.bias is not None:
    input_linear.bias.data.zero_()
```

Initialization code for the LSTM layer:

```python
for ind in range(0, input_lstm.num_layers):
    weight = eval('input_lstm.weight_ih_l' + str(ind))
    ...
```
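The fully connected initialization above is the Glorot (Xavier) uniform scheme. `nn.Linear` stores its weight as `(out_features, in_features)`, so `size(0) + size(1)` is `fan_out + fan_in`, and sampling from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)) gives each weight variance a²/3 = 2 / (fan_in + fan_out). The minimal sketch below assumes PyTorch and wraps the snippet in a hypothetical helper `init_linear`, renaming the original `bias` variable to `bound` because it is the sampling range rather than a bias term. It then compares the result against `nn.init.xavier_uniform_`, which with its default `gain=1.0` uses the same bound.

```python
import math

import torch
import torch.nn as nn


def init_linear(input_linear: nn.Linear) -> float:
    """Glorot/Xavier-uniform init of a linear layer; returns the sampling bound."""
    # nn.Linear stores weight as (out_features, in_features), so
    # weight.size(0) + weight.size(1) == fan_out + fan_in.
    fan_out, fan_in = input_linear.weight.shape
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    nn.init.uniform_(input_linear.weight, -bound, bound)
    if input_linear.bias is not None:
        # Same effect as input_linear.bias.data.zero_() in the snippet.
        nn.init.zeros_(input_linear.bias)
    return bound


torch.manual_seed(0)
layer = nn.Linear(in_features=300, out_features=100)
bound = init_linear(layer)

# U(-a, a) has variance a**2 / 3, so the weight variance should be
# close to 2 / (fan_in + fan_out) = 2 / 400 = 0.005.
print(f"bound={bound:.4f}, empirical var={layer.weight.var().item():.5f}")

# With its default gain=1.0, xavier_uniform_ draws from the same U(-bound, bound).
reference = nn.Linear(300, 100)
nn.init.xavier_uniform_(reference.weight)
print(f"reference max |w|={reference.weight.abs().max().item():.4f}")
```

In practice, calling `nn.init.xavier_uniform_(input_linear.weight)` directly should be a shorter equivalent of the hand-written bound.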
How many GPUs, and of which model, were used in the experimental setup?
Could you give more details about the ensemble baselines (i.e. Top1-LLM-Blender and Top1-PPL 162B)? Which large language models did you choose to compose the ensemble?