qq31415926

Results 5 issues of qq31415926

Fully connected layer initialization code: `bias = np.sqrt(6.0 / (input_linear.weight.size(0) + input_linear.weight.size(1))) nn.init.uniform_(input_linear.weight, -bias, bias) if input_linear.bias is not None: input_linear.bias.data.zero_()` LSTM layer initialization code: ` for ind in range(0, input_lstm.num_layers): weight = eval('input_lstm.weight_ih_l' + str(ind))...
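The fully connected snippet quoted above appears to follow the Xavier (Glorot) uniform scheme: weights are drawn from a uniform range of plus or minus sqrt(6 / (rows + columns)) of the weight matrix, and the bias is zeroed. The following is a minimal, dependency-free sketch of that arithmetic only, not the repository's code; the function names `xavier_uniform_bound` and `init_linear`, the `seed` parameter, and the list-of-lists weight layout are assumptions made for illustration. The truncated LSTM part is not reproduced.

```python
import math
import random


def xavier_uniform_bound(fan_out: int, fan_in: int) -> float:
    """Bound used in the quoted snippet: sqrt(6 / (size(0) + size(1)))."""
    return math.sqrt(6.0 / (fan_out + fan_in))


def init_linear(fan_out: int, fan_in: int, seed: int = 0):
    """Hypothetical helper: build a (fan_out x fan_in) weight matrix and a bias.

    Weights are sampled uniformly from [-bound, bound]; the bias is all zeros,
    mirroring the `uniform_` and `zero_` calls in the quoted snippet.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    bound = xavier_uniform_bound(fan_out, fan_in)
    weight = [
        [rng.uniform(-bound, bound) for _ in range(fan_in)]
        for _ in range(fan_out)
    ]
    bias = [0.0] * fan_out
    return weight, bias, bound


if __name__ == "__main__":
    weight, bias, bound = init_linear(fan_out=4, fan_in=2)
    print("bound:", bound)
    print("first row:", weight[0])
    print("bias:", bias)
```

In PyTorch, `nn.Linear` stores its weight with shape `(out_features, in_features)`, so `weight.size(0)` and `weight.size(1)` in the quoted snippet correspond to `fan_out` and `fan_in` here.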

How many GPUs, and of which model, were used in the experimental setup?

Could you give more details about the ensemble baselines (i.e. Top1-LLM-Blender & Top1-PPL 162B)? Which large language models did you choose to compose the ensemble?