KennyNH
Hi. As can be seen in the paper, prior matching is a bi-level optimization problem. For the encoder's parameters, we should maximize the objective, while we should minimize it for...
https://github.com/spcl/smoe/blob/249ef673d1929a23e5fe7c2628e1299b8c1c2e42/smoe/models/smoe_routing.py#L116 Why should "smoe_config.block_gate_grad" be set to "True", leaving "grad_routing_weights=None", which cuts the gradients of the gating network? How, then, are the routing parameters in "SpatialLatentTensorGate2d" optimized?
Thanks for your implementation. There are 1209601 rows in 'WADI_14days.csv'. Why, then, does Table 1 of the USAD paper list the length of the WADI training data as 1048517?
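To make the row-count comparison reproducible, here is a minimal sketch of how the rows could be counted with the standard library. The helper name `count_data_rows` and the `header_lines` parameter are hypothetical, and the number of header lines in 'WADI_14days.csv' is an assumption that should be checked against the actual file.

```python
import csv


def count_data_rows(path, header_lines=1):
    """Count non-empty data rows in a CSV file.

    `header_lines` is the number of leading lines to skip; the right
    value for a given file is an assumption the caller must verify.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        # Skip the header lines; stop quietly if the file is shorter.
        for _ in range(header_lines):
            next(reader, None)
        # csv.reader yields an empty list for blank lines, so skip those.
        return sum(1 for row in reader if row)
```

Calling `count_data_rows("WADI_14days.csv", header_lines=...)` with the header size that matches the file gives a count that can be compared directly with the figure reported in the paper.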
Thanks for your contribution. For reference, could you provide the pickle files of the final plans for these three datasets? Hoping for your reply.
Thanks for your wonderful work "Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection". My question is whether we should ensure class balance for the demonstrations...