hackergame2019-writeups: official write-up for 海底捞针 (Needle in a Haystack), round and fit_intercept

The mean image is generated with np.ndarray.astype(np.uint8), which truncates (rounds down for non-negative values), yet the write-up explains it in terms of round (round to nearest): https://github.com/ustclug/hackergame2019-writeups/blob/6fbe69e0666bdb9f16a375b6cfb5395aafcefd1c/official/2077_%E6%B5%B7%E5%BA%95%E6%8D%9E%E9%92%88/src/generator.py#L9 https://github.com/ustclug/hackergame2019-writeups/blame/6fbe69e0666bdb9f16a375b6cfb5395aafcefd1c/official/2077_%E6%B5%B7%E5%BA%95%E6%8D%9E%E9%92%88/README.md#L20-L25
The fit in the challenge has the form y=k0*x0+k1*x1+...+k49999*x49999, with no bias term b, so a regressor without an intercept, like the one below, would seem more principled:
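To make the difference concrete, here is a minimal sketch comparing the two conversions. The sample values are made up for illustration and are not taken from the challenge data.

```python
import numpy as np

# Hypothetical non-negative float values, standing in for mean pixel values.
values = np.array([0.4, 0.5, 1.5, 2.9, 254.99])

# astype(np.uint8) drops the fractional part (floor for non-negative input).
truncated = values.astype(np.uint8)

# np.round rounds to nearest, with ties going to the even integer.
rounded = np.round(values).astype(np.uint8)

print(truncated)  # [  0   0   1   2 254]
print(rounded)    # [  0   0   2   3 255]
```

Truncation never increases a value, so on average it pulls the result down, while rounding to nearest is roughly symmetric.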

reg = linear_model.Lasso(alpha=1, positive=True, fit_intercept=False)

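In practice, however, the results got much worse and no longer pass the official check. Running a weaker check shows that four of the correct images end up with a coefficient of 0: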

[x for x in choices if x not in np.argwhere(reg.coef_>0).reshape(-1)] # [4303, 24496, 36462, 39326]
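The same check can be written with set operations, which also makes it easy to list false positives. This is only a sketch: `choices` and `coef` below are small hypothetical stand-ins for the real `choices` and `reg.coef_`.

```python
import numpy as np

# Hypothetical stand-ins: indices of the truly selected images,
# and a coefficient vector as a fitted regressor might return it.
choices = np.array([2, 5, 7, 9])
coef = np.array([0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.1, 0.2, 0.0, 0.4])

# Indices the model considers selected (strictly positive coefficient).
selected = np.flatnonzero(coef > 0)

# Correct images the model missed, and images it picked wrongly.
missed = np.setdiff1d(choices, selected)
false_positives = np.setdiff1d(selected, choices)

print(missed)           # [5]
print(false_positives)  # [6]
```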

This really is a black-magic challenge 😂

Oct 23 '19 06:10 0rzx

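Right, it is indeed floor, not round... Yet for some reason, while writing the write-up, I found that the behavior matched round... which put me in a cold sweat... With this confirmed, I can safely change it back.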

Oct 23 '19 07:10 suquark

@0rzx Here is a more principled explanation:

Run the following inside the challenge's generator source:

print((np.mean(targets, axis=0).astype(np.uint8) - np.mean(targets, axis=0)).mean())
print(images.mean() - images[choices].mean())

This prints:

-0.47768702651515166
2.2311801805161053

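Truncation introduces a negative bias of about 0.5, but sample selection lowers the mean pixel value by 2.2311801805161053, and for 50000 samples the latter is a huge bias. If the intercept is then fixed at 0, Lasso has no choice but to spread this bias as weight across the other images, so it cannot recover the solution and can even drive the weights of the correct images far too low.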
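The effect of forcing the intercept to 0 can be illustrated on a toy problem. The sketch below is not the challenge setup: it uses ordinary least squares via np.linalg.lstsq instead of Lasso, a handful of random "images", and a constant offset of -0.5 as a stand-in for the bias. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 500 pixels, 8 candidate images with values in 0..255.
n_pixels, n_images = 500, 8
X = rng.integers(0, 256, size=(n_pixels, n_images)).astype(np.float64)

# The target is the mean of three chosen images plus a constant bias.
chosen = [1, 4, 6]
k_true = np.zeros(n_images)
k_true[chosen] = 1.0 / len(chosen)
offset = -0.5  # assumed constant bias, for illustration only
y = X @ k_true + offset

# Fit without an intercept: y ~ X k
k_no_intercept, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fit with an intercept: append a column of ones, y ~ X k + b
X1 = np.hstack([X, np.ones((n_pixels, 1))])
solution, *_ = np.linalg.lstsq(X1, y, rcond=None)
k_with_intercept, b = solution[:-1], solution[-1]

# Without an intercept the bias leaks into every coefficient.
print(np.abs(k_no_intercept - k_true).max())
# With an intercept the bias lands in b and the weights are recovered.
print(np.abs(k_with_intercept - k_true).max(), b)
```

In this toy case the leak is small because there are only a few images; the argument above is that with 50000 candidates and a sparsity penalty, the same leak is enough to push correct images out of the solution.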

Oct 23 '19 07:10 suquark
