
Expected performance

Open bkj opened this issue 6 years ago • 12 comments

If we run the three experiments from the README:

# Exp. 1
./scripts/ptb_search.sh
./scripts/ptb_final.sh

# Exp. 2
./scripts/cifar10_macro_search.sh
./scripts/cifar10_macro_final.sh

# Exp. 3
./scripts/cifar10_micro_search.sh
./scripts/cifar10_micro_final.sh

what should we expect the final performance metrics to be? Are you able to post the expected results either here or in the README?

Thanks

bkj avatar Apr 03 '18 15:04 bkj

# Exp. 1
./scripts/ptb_search.sh  # should give you a bunch of architectures
./scripts/ptb_final.sh   # should give you around 55.8 test perplexity on PTB

# Exp. 2
./scripts/cifar10_macro_search.sh  # should give you a bunch of architectures
./scripts/cifar10_macro_final.sh   # should give you around 96.1% accuracy on the test set

# Exp. 3
./scripts/cifar10_micro_search.sh  # should give you a bunch of architectures
./scripts/cifar10_micro_final.sh   # should give you around 96.5% accuracy on the test set

hyhieu avatar Apr 03 '18 17:04 hyhieu

Fantastic -- thank you. I'm running the cifar10_micro_search.sh now, and will post here to confirm once I get some results.

~ Ben

bkj avatar Apr 03 '18 17:04 bkj

OK -- tail of cifar10_micro_search.sh looks like:

Eval at 42018
valid_accuracy: 0.6820
Eval at 42018
test_accuracy: 0.6636
epoch=149   ch_step=42050  loss=0.910298 lr=0.0005   |g|=2.4888   tr_acc=105/160 mins=717.36    
epoch=149   ch_step=42100  loss=1.008317 lr=0.0005   |g|=3.0906   tr_acc=110/160 mins=717.86    
epoch=149   ch_step=42150  loss=0.833895 lr=0.0005   |g|=2.0674   tr_acc=107/160 mins=718.36    
epoch=149   ch_step=42200  loss=0.951047 lr=0.0005   |g|=2.4366   tr_acc=104/160 mins=718.85    
epoch=149   ch_step=42250  loss=0.930920 lr=0.0005   |g|=2.1964   tr_acc=107/160 mins=719.35    
epoch=150   ch_step=42300  loss=0.993480 lr=0.0005   |g|=2.3855   tr_acc=98 /160 mins=719.85    
Epoch 150: Training controller
ctrl_step=4470   loss=3.077   ent=53.16 lr=0.0035 |g|=0.0440   acc=0.7375 bl=0.68  mins=719.85
ctrl_step=4475   loss=1.252   ent=53.17 lr=0.0035 |g|=0.0088   acc=0.7000 bl=0.68  mins=720.05
ctrl_step=4480   loss=2.096   ent=53.14 lr=0.0035 |g|=0.0490   acc=0.7188 bl=0.68  mins=720.26
ctrl_step=4485   loss=3.848   ent=53.13 lr=0.0035 |g|=0.0474   acc=0.7625 bl=0.69  mins=720.46
ctrl_step=4490   loss=2.683   ent=53.13 lr=0.0035 |g|=0.1009   acc=0.7375 bl=0.69  mins=720.66
ctrl_step=4495   loss=-5.616  ent=53.09 lr=0.0035 |g|=0.1095   acc=0.5750 bl=0.69  mins=720.86
Here are 10 architectures
[0 1 1 0 1 1 0 3 1 2 0 0 0 1 1 4 1 0 3 1]
[0 1 0 1 1 0 1 1 0 2 2 3 2 1 3 4 4 0 5 1]
val_acc=0.6375
--------------------------------------------------------------------------------
[0 1 1 1 0 2 0 1 1 0 1 3 0 1 1 4 0 1 0 4]
[0 2 0 4 0 0 0 3 0 2 3 4 0 0 2 1 5 1 2 1]
val_acc=0.6313
--------------------------------------------------------------------------------
[0 2 0 0 1 1 1 1 1 2 0 0 0 0 1 0 1 3 0 0]
[0 1 0 4 0 1 1 2 0 2 3 4 1 0 3 4 5 4 4 1]
val_acc=0.7000
--------------------------------------------------------------------------------
[1 2 1 2 0 1 1 1 0 0 1 1 1 0 1 0 1 1 0 4]
[1 1 0 2 2 0 2 4 3 1 3 1 4 4 4 0 2 3 0 0]
val_acc=0.6875
--------------------------------------------------------------------------------
[1 2 0 2 1 4 1 2 1 3 0 4 0 1 2 4 0 4 1 4]
[1 0 1 0 0 2 0 2 1 4 2 4 4 3 1 0 5 0 5 3]
val_acc=0.6500
--------------------------------------------------------------------------------
[1 3 1 0 1 4 0 1 2 1 1 4 1 4 1 1 0 1 1 1]
[1 1 0 0 2 1 1 4 0 0 1 4 2 1 0 0 5 0 0 2]
val_acc=0.7000
--------------------------------------------------------------------------------
[1 3 1 1 1 4 0 4 1 0 0 0 0 2 1 4 0 0 0 1]
[1 0 1 4 0 1 1 4 1 1 1 0 4 1 2 3 4 0 3 1]
val_acc=0.7375
--------------------------------------------------------------------------------
[0 4 0 1 1 1 0 0 0 0 0 1 0 0 0 4 0 2 1 0]
[1 4 1 0 2 1 2 4 3 4 0 0 3 3 2 4 5 2 2 1]
val_acc=0.6188
--------------------------------------------------------------------------------
[1 1 1 0 1 2 0 2 0 2 1 0 1 4 4 0 1 0 0 4]
[0 2 1 1 0 0 1 1 1 0 0 2 1 4 2 0 4 3 0 4]
val_acc=0.6500
--------------------------------------------------------------------------------
[0 2 0 0 1 3 1 2 1 0 0 0 0 0 0 1 0 0 1 0]
[1 3 0 1 0 4 2 0 1 0 2 2 2 3 0 0 5 2 2 0]
val_acc=0.6625
--------------------------------------------------------------------------------
Epoch 150: Eval
Eval at 42300
valid_accuracy: 0.6796
Eval at 42300
test_accuracy: 0.6660

Took 12 hours on a Titan X PASCAL, as advertised.

Now I think I'm supposed to take the architecture w/ the best validation accuracy, which is:

[1 3 1 1 1 4 0 4 1 0 0 0 0 2 1 4 0 0 0 1]
[1 0 1 4 0 1 1 4 1 1 1 0 4 1 2 3 4 0 3 1]
val_acc=0.7375
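
(I pulled that out with a quick script along these lines -- a rough sketch that assumes I redirected the search script's stdout to a file called search.log:)

import re

# "search.log" is just wherever the stdout of cifar10_micro_search.sh went.
with open("search.log") as f:
    lines = [line.strip() for line in f]

best_acc, best_arcs = 0.0, None
for i, line in enumerate(lines):
    m = re.match(r"val_acc=([0-9.]+)$", line)
    if m and float(m.group(1)) > best_acc:
        best_acc = float(m.group(1))
        # the two lines right above each val_acc line are the two cell arcs
        best_arcs = (lines[i - 2], lines[i - 1])

print(f"best val_acc={best_acc}")
print(best_arcs[0])  # normal cell
print(best_arcs[1])  # reduction cell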

Is that right? Or is there a step somewhere that validates a large number of architectures and picks the optimal one for me?

~ Ben

EDIT: Here's a plot of the valid and test accuracies over time: [screenshot: valid and test accuracy curves]. These are the numbers logged like:

Eval at 42300
valid_accuracy: 0.6796
Eval at 42300
test_accuracy: 0.6660
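
These were extracted with a quick script along these lines (again a rough sketch, assuming the stdout went to search.log):

import re
import matplotlib.pyplot as plt

curves = {"valid": ([], []), "test": ([], [])}
step = None
with open("search.log") as f:
    for line in f:
        m = re.match(r"Eval at (\d+)", line)
        if m:
            step = int(m.group(1))
        m = re.match(r"(valid|test)_accuracy: ([0-9.]+)", line.strip())
        if m and step is not None:
            xs, ys = curves[m.group(1)]
            xs.append(step)
            ys.append(float(m.group(2)))

for name, (xs, ys) in curves.items():
    plt.plot(xs, ys, label=name)
plt.xlabel("ch_step")
plt.ylabel("accuracy")
plt.legend()
plt.savefig("eval_curves.png")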

bkj avatar Apr 04 '18 13:04 bkj

Thank you for dedicating the time and resources to run and verify our code.

We also looked at the architectures sampled at earlier steps and took the one with the overall best val_acc. However, I think the one you picked might work well too 😄

hyhieu avatar Apr 04 '18 13:04 hyhieu

Here's a plot of the test accuracy in cifar10_micro_final.sh: [screenshot: test accuracy over training epochs]

This used the following architecture:

fixed_arc="1 3 1 1 1 4 0 4 1 0 0 0 0 2 1 4 0 0 0 1"
fixed_arc="$fixed_arc 1 0 1 4 0 1 1 4 1 1 1 0 4 1 2 3 4 0 3 1"

w/ all other parameters unchanged.
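
For my own sanity, here's how I decode those two 20-integer strings. This is just my reading of the micro search space (each cell has 5 nodes, 4 integers per node: two input indices and two op ids), and the op-id ordering below is my assumption from the paper's operation list -- double-check it against src/cifar10/micro_child.py:

# My interpretation, not gospel: inputs 0 and 1 are the cell's two inputs,
# and node i's output gets index i + 2. Op-id order is my guess.
OPS = ["sep_conv_3x3", "sep_conv_5x5", "avg_pool_3x3", "max_pool_3x3", "identity"]

def decode_cell(arc, name):
    assert len(arc) == 20, "expect 5 nodes x 4 integers"
    print(name)
    for node in range(5):
        x_id, x_op, y_id, y_op = arc[4 * node : 4 * node + 4]
        print(f"  node {node + 2}: {OPS[x_op]}(input {x_id}) + {OPS[y_op]}(input {y_id})")

decode_cell([1, 3, 1, 1, 1, 4, 0, 4, 1, 0, 0, 0, 0, 2, 1, 4, 0, 0, 0, 1], "normal cell")
decode_cell([1, 0, 1, 4, 0, 1, 1, 4, 1, 1, 1, 0, 4, 1, 2, 3, 4, 0, 3, 1], "reduction cell")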

Final test accuracy of 0.9612 at epoch 630 (w/ maximum test accuracy of 0.9620 at epoch 619). It also reached 0.9611 at epoch 306 -- so the extra ~300 epochs don't give a whole lot of improvement.

Note this final model takes > 1 day to train -- longer than the initial architecture search.

For me, this raises the question of how much of the difference between the methods reported in the paper is due to the hyperparameters of the final retraining step vs. the discovered architecture. It'd be interesting to train a standard ResNet architecture w/ the same parameters as cifar10_micro_final.sh to see how it compares.

bkj avatar Apr 05 '18 17:04 bkj

@bkj Hello. Is everything going well with this work (the macro search space, the micro search space)? I want to know how to get the following architecture used in cifar10_macro_final.sh, which has 24 cells:

fixed_arc="0"
fixed_arc="$fixed_arc 3 0"
fixed_arc="$fixed_arc 0 1 0"
fixed_arc="$fixed_arc 2 0 0 1"
fixed_arc="$fixed_arc 2 0 0 0 0"
fixed_arc="$fixed_arc 3 1 1 0 1 0"
fixed_arc="$fixed_arc 2 0 0 0 0 0 1"
fixed_arc="$fixed_arc 2 0 1 1 0 1 1 1"
fixed_arc="$fixed_arc 1 0 1 1 1 0 1 0 1"
fixed_arc="$fixed_arc 0 0 0 0 0 0 0 0 0 0"
fixed_arc="$fixed_arc 2 0 0 0 0 0 1 0 0 0 0"
fixed_arc="$fixed_arc 0 1 0 0 1 1 0 0 0 0 1 1"
fixed_arc="$fixed_arc 2 0 1 0 0 0 0 0 1 0 1 1 0"
fixed_arc="$fixed_arc 1 0 0 1 0 0 0 1 1 1 0 1 0 1"
fixed_arc="$fixed_arc 0 1 1 0 1 0 1 0 0 0 0 0 1 0 0"
fixed_arc="$fixed_arc 2 0 0 1 0 0 0 0 0 0 0 1 0 1 0 1"
fixed_arc="$fixed_arc 2 0 1 0 0 0 1 0 0 1 1 1 1 0 0 1 0"
fixed_arc="$fixed_arc 2 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 0 1"
fixed_arc="$fixed_arc 3 0 1 1 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0"
fixed_arc="$fixed_arc 3 0 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1"
fixed_arc="$fixed_arc 0 1 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 1 1 0 0"
fixed_arc="$fixed_arc 3 0 1 0 1 1 0 0 1 0 1 1 0 1 1 0 1 0 0 1 0 0"
fixed_arc="$fixed_arc 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0"
fixed_arc="$fixed_arc 0 1 1 0 0 0 1 1 1 0 1 0 0 0 1 0 1 0 0 1 1 0 0 0"

whereas I can only get architectures like this, which have only 12 cells:

[1]
[1 1]
[5 0 0]
[5 0 0 0]
[0 0 1 1 0]
[1 1 0 0 0 0]
[1 1 0 1 1 1 0]
[3 0 0 1 0 1 1 1]
[5 0 0 1 0 0 1 0 0]
[1 1 1 0 0 0 0 1 0 0]
[0 1 1 0 0 0 0 1 1 1 1]
[0 0 1 1 1 1 0 1 0 0 1 1]
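
For context, here is my understanding of the macro encoding (a sketch from reading the logs and the paper, so the op-name order may be off): layer i is described by 1 + i integers -- an op id followed by i skip-connection bits, one per earlier layer:

# Op-id order is my guess; verify against the repo before relying on it.
OPS = ["conv_3x3", "sep_conv_3x3", "conv_5x5", "sep_conv_5x5", "avg_pool", "max_pool"]

def decode_macro(arc, num_layers):
    # layer i consumes 1 op id + i skip bits
    pos = 0
    for layer in range(num_layers):
        op, skips = arc[pos], arc[pos + 1 : pos + 1 + layer]
        pos += 1 + layer
        sources = [j for j, bit in enumerate(skips) if bit == 1]
        print(f"layer {layer:2d}: {OPS[op]:12s} skip connections from layers {sources}")
    assert pos == len(arc), "length must be num_layers * (num_layers + 1) / 2"

# the 12-cell architecture above, flattened into one list
arc_12 = [1,
          1, 1,
          5, 0, 0,
          5, 0, 0, 0,
          0, 0, 1, 1, 0,
          1, 1, 0, 0, 0, 0,
          1, 1, 0, 1, 1, 1, 0,
          3, 0, 0, 1, 0, 1, 1, 1,
          5, 0, 0, 1, 0, 0, 1, 0, 0,
          1, 1, 1, 0, 0, 0, 0, 1, 0, 0,
          0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,
          0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1]
decode_macro(arc_12, num_layers=12)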

axiniu avatar Apr 18 '18 13:04 axiniu

@bkj Hello, did you change any parameters in the scripts to get this result? I could only get about 0.88 accuracy on CIFAR-10 using the micro search space.

lianqing11 avatar Apr 22 '18 06:04 lianqing11

@bkj Hi, I saw you are also interested in the ENAS-pytorch work at https://github.com/carpedm20/ENAS-pytorch. When I run the ENAS-pytorch code with:

python main.py --network_type cnn --dataset cifar --controller_optim momentum --controller_lr_cosine=True --controller_lr_max 0.05 --controller_lr_min 0.0001 --entropy_coeff 0.1

I get many errors. What I want to do is find some CNN architectures and visualize them. Could you please tell me what changes I should make to the code before running it? Thanks for your reply.

axiniu avatar Apr 30 '18 09:04 axiniu

Hi @hyhieu, I have run the latest default ./scripts/ptb_final.sh for 600+ epochs, but the perplexity stays above 69. May I know how many epochs are expected to reproduce the claimed 55.8 test perplexity? Or do I need to change the ./scripts/ptb_final.sh configuration?

Chen-Hailin avatar May 29 '18 20:05 Chen-Hailin

> OK -- tail of cifar10_micro_search.sh looks like: [...]

So after you got the architecture, did you retrain it from scratch to get the 96.5% accuracy?

Moran232 avatar Nov 22 '18 04:11 Moran232

@axiniu I ran into the same problem. Has it been solved yet? Thanks a lot.

upwindflys avatar Mar 07 '19 02:03 upwindflys

@upwindflys @axiniu I also only get the same 12-cell output, but my macro search reaches a reasonably high accuracy, for example:

[2]
[3 0]
[5 1 0]
[5 0 0 1]
[2 0 0 0 1]
[1 0 0 0 0 0]
[5 1 0 1 0 0 0]
[1 0 0 0 1 0 0 0]
[1 0 0 0 0 1 0 0 0]
[5 0 1 0 0 0 1 1 0 1]
[4 0 0 0 1 0 1 0 1 0 0]
[1 0 0 0 0 0 0 1 1 0 1 1]
val_acc=0.8125
--------------------------------------------------------------------------------
Epoch 310: Eval
Eval at 109120
valid_accuracy: 0.8154
Eval at 109120
test_accuracy: 0.8080

But do you know the difference between the 12- and 24-cell settings? My guess is that it's just the layer count (the --child_num_layers flag, if I'm reading the scripts right): the search script samples 12-layer networks while cifar10_macro_final.sh trains a 24-layer one. The arc lengths are at least consistent with that:
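
# My own sanity check: a macro arc for N layers should contain
# N * (N + 1) / 2 integers, since layer i adds 1 op id plus i skip bits.
def expected_len(num_layers):
    return num_layers * (num_layers + 1) // 2

print(expected_len(12))  # 78  -> length of what a 12-layer search prints
print(expected_len(24))  # 300 -> length of cifar10_macro_final.sh's fixed_arc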

chg0901 avatar Oct 03 '19 02:10 chg0901