ENAS-pytorch
                                
                                
                                
REINFORCE
It is clear that the controller falls into a local optimum and cannot find better actions through REINFORCE. I think the unknown constant c in the reward c / valid ppl (validation perplexity), the moving-average baseline, and the temperature of the logits are what need to be fixed. See more details (especially the TODOs) in 497c2e717dc0087fea52d4f196d30543e4fb7512.
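
To make the three suspects concrete, here is a minimal sketch of how they typically enter a REINFORCE update. It is not the code from this repository: the function and class names, the default `c=80.0`, the `decay=0.95`, and the temperature values are all assumptions chosen for illustration, and the sketch uses only the Python standard library rather than PyTorch tensors.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (hypothetical helper).

    A temperature above 1 flattens the distribution, which encourages the
    controller to explore instead of committing early to one action.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reward_from_ppl(valid_ppl, c=80.0):
    """Reward c / valid_ppl; the value of c here is an assumed placeholder."""
    return c / valid_ppl


class MovingAverageBaseline:
    """Exponential moving average of past rewards (assumed decay value)."""

    def __init__(self, decay=0.95):
        self.decay = decay
        self.value = None

    def update(self, reward):
        if self.value is None:
            # Initialize with the first observed reward.
            self.value = reward
        else:
            self.value = self.decay * self.value + (1 - self.decay) * reward
        return self.value


def reinforce_loss(log_probs, reward, baseline):
    """REINFORCE loss for one sampled architecture.

    log_probs holds the log-probabilities of the sampled actions. The
    advantage (reward - baseline) scales the summed log-probability.
    """
    advantage = reward - baseline
    return -advantage * sum(log_probs)
```

If the reward scale `c` is too small, or the baseline tracks the reward too closely, the advantage stays near zero and the controller receives almost no learning signal. If the temperature is too low, the sampling distribution sharpens quickly and the controller stops exploring. Either effect could plausibly produce the local optimum described above.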