Zach Nussbaum
Where are the negative samples even used for BERT? I don't see them referenced anywhere in the code other than being initialized.
How is the sampling so much different? From the paper: "For easy and fair evaluation, we follow the common strategy in [12, 22, 49], pairing each ground truth item in the test set..."
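For context, here's a sketch of how I understand that evaluation strategy (the function and parameter names, including `num_negatives`, are my own, not from the paper): each ground-truth item is ranked against a set of randomly sampled items the user never interacted with, and hit rate is computed over that small candidate set rather than the full catalog.

```python
import random

def sampled_hit_rate(ground_truth, all_items, score, num_negatives=100, k=10, seed=0):
    """Sketch of sampled-negative evaluation: rank each user's true item
    against num_negatives randomly sampled other items, and count a hit
    when the true item lands in the top-k of that candidate set."""
    rng = random.Random(seed)
    hits = 0
    for user, true_item in ground_truth.items():
        negatives = rng.sample([i for i in all_items if i != true_item], num_negatives)
        candidates = negatives + [true_item]
        ranked = sorted(candidates, key=lambda item: score(user, item), reverse=True)
        hits += true_item in ranked[:k]
    return hits / len(ground_truth)
```

With a scorer that always ranks the true item first, this returns 1.0; a random scorer would land near k / (num_negatives + 1).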
`conda create -n pluribus`, `conda activate pluribus`, `pip install poker_ai`, then `poker_ai`. I don't have a ton of experience with conda, but I'm still getting the same error after running the above.
Hm, I'll test it out. Either way, I'm excited to try it!
Thanks Edward for the suggestions! Quick clarification: can you expand on what you mean by `reason if it's expected`? Are there situations where that wouldn't be the case? And yes...
@shjwudp I interpreted that chart as one showing the benefits of muP: at increasing depth, the HPs do actually provide better performance, whereas SP has a shifting...
@edwardjhu thanks for this explanation. I think I missed this part in the paper, but it makes intuitive sense to me. Is there a section in the paper that describes how `mup`...
@edwardjhu I did not realize the keys of the models dict were their widths, so the plots look a little different when I make that change: `mup` `sp` ...
Ah, thanks so much! I missed that in my first few passes.
@edwardjhu this is a somewhat silly question, but I wanted to double-check. When we are transferring parameters, should we retain the `mup` Readout layers or revert to the SP layers...
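For my own understanding, here's a minimal pure-Python sketch of the readout behavior I'm asking about (the names and the `base_width` multiplier are my reading of what `mup.MuReadout` does, not the actual implementation): muP scales the readout output by `base_width / fan_in`, so in the fully-aligned worst case the logits stay bounded as width grows, which is why I'd expect the `mup` readout to matter at transfer time.

```python
def readout(x, w, base_width=None):
    """Dot-product readout for a single logit. With base_width set, apply a
    muP-style base_width / fan_in output multiplier (my own sketch of the
    idea -- not the real mup.MuReadout code)."""
    out = sum(xi * wi for xi, wi in zip(x, w))
    if base_width is not None:
        out *= base_width / len(x)
    return out

# Fully-aligned worst case: activations and weight entries both O(1).
for width in (64, 256, 1024):
    x = [1.0] * width
    w = [1.0] * width
    # SP logit grows linearly with width; the muP-scaled logit stays at 64.
    print(width, readout(x, w), readout(x, w, base_width=64))
```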