DeepRL-Agents

A3C Doom: What is the number of trainable variables?

IbrahimSobh opened this issue 8 years ago · 4 comments

Hi

I am trying to find out how many trainable variables there are ...

When I run this code (placed just before with tf.Session() as sess:):

np.sum([np.product([xi.value for xi in x.get_shape()]) for x in tf.trainable_variables()])

And this code

total_parameters = 0
for variable in tf.trainable_variables():
    shape = variable.get_shape()
    print(shape)
    variable_parameters = 1
    for dim in shape:                      # multiply out the dimensions of each variable
        variable_parameters *= dim.value
    total_parameters += variable_parameters
print(total_parameters)

I get the same answer (10794672)

However, I faced two issues:

  • The answer depends on how many CPUs I have, so I set the number of CPUs to 1 in order to get a reliable answer. Now the answer is 2398816.

  • But it seems that the shared network is counted twice! See below:

(8, 8, 1, 16) (16,) (4, 4, 16, 32) (32,) (2592, 256) (256,) (512, 1024) (1024,) (256, 3) (256, 1)
(8, 8, 1, 16) (16,) (4, 4, 16, 32) (32,) (2592, 256) (256,) (512, 1024) (1024,) (256, 3) (256, 1)
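A per-scope breakdown makes the duplication easier to see. Here is a quick sketch (TF 1.x; it assumes the networks are built under variable scopes like 'global' and 'worker_0', as in the notebook, so adjust the split if your scope names differ):

from collections import defaultdict
import numpy as np
import tensorflow as tf

counts = defaultdict(int)
for v in tf.trainable_variables():
    scope = v.name.split('/')[0]                        # e.g. 'global' or 'worker_0'
    counts[scope] += int(np.prod(v.get_shape().as_list()))

for scope, n in sorted(counts.items()):
    print(scope, n)
print('total', sum(counts.values()))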

What is the accurate number of trainable variables?

Thank you

IbrahimSobh · Mar 25 '17

Hi Ibrahim,

I am unsure of what you mean by the shared network being counted twice. If you are using A3C with one worker, then there should be two networks initialized: one global network and one worker network.
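If you want the count for a single copy, you can restrict the collection to one scope. A sketch (assuming the global network lives under a variable scope named 'global', as in the notebook):

import numpy as np
import tensorflow as tf

global_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='global')
n_global = sum(int(np.prod(v.get_shape().as_list())) for v in global_vars)
print(n_global)   # should be half of the two-network total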

awjuliani · Mar 26 '17

Hi Arthur

I want to know the exact number of trainable parameters ... I used this code:

total_parameters = 0
for variable in tf.trainable_variables():
    shape = variable.get_shape()
    print(shape)
    variable_parameters = 1
    for dim in shape:                      # multiply out the dimensions of each variable
        variable_parameters *= dim.value
    total_parameters += variable_parameters
print(total_parameters)

and got the result shown above, where this part is repeated twice:

(8, 8, 1, 16)
(16,)
(4, 4, 16, 32)
(32,)

My question is: how many trainable parameters are there in the A3C code?

Thank you

IbrahimSobh · Mar 26 '17

The answer above accurately describes the total number of trainable parameters. But you need to take into account that it is double the number of parameters that are ever actually updated (or used for the value and policy calculations).
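You can check this against the shapes you printed: one copy of the network sums to 1199408 parameters, and two copies give the 2398816 you measured with a single worker.

import numpy as np

# shapes printed for one copy of the network
shapes = [(8, 8, 1, 16), (16,), (4, 4, 16, 32), (32,),
          (2592, 256), (256,), (512, 1024), (1024,), (256, 3), (256, 1)]
per_copy = sum(int(np.prod(s)) for s in shapes)
print(per_copy)        # 1199408
print(2 * per_copy)    # 2398816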

awjuliani · Mar 26 '17

Thanks @awjuliani

Could you please comment on this elaboration to ensure correct understanding?

I understand the following:

1- each worker gets a copy of the global network
2- each worker interacts with its environment and calculates the loss and gradients
3- each worker updates the global network using those gradients (not its local network)
4- go to step 1

So, if we have only one worker, we will have 2 identical networks:

1- global, where the actual parameter updates take place
2- local (worker), where the environment interactions, loss, and gradients are calculated (no parameter updates for the local network)
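Something like the following toy sketch captures that update pattern (TF 1.x; the single scalar "network" per scope is only a stand-in, and the scope names mirror the notebook, not its actual code):

import tensorflow as tf

# one "global" copy and one "worker" copy of a (toy) network
with tf.variable_scope('global'):
    w_global = tf.get_variable('w', initializer=1.0)
with tf.variable_scope('worker_0'):
    w_local = tf.get_variable('w', initializer=1.0)

loss = tf.square(w_local - 3.0)                                   # loss uses the LOCAL copy
local_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='worker_0')
global_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='global')

grads = tf.gradients(loss, local_vars)                            # gradients from local interactions
trainer = tf.train.GradientDescentOptimizer(0.1)
apply_grads = trainer.apply_gradients(zip(grads, global_vars))    # ... applied to the GLOBAL copy
sync = [l.assign(g) for l, g in zip(local_vars, global_vars)]     # copy global -> local

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(apply_grads)          # only the global parameters change here
    sess.run(sync)                 # the worker then pulls the updated parameters
    print(sess.run([w_global, w_local]))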

This is why the number of parameters increases when the number of workers increases (because each worker has a local copy of the network).

And the number of parameters that are actually updated is 2398816 / 2 = 1199408 (the number of parameters when using one worker, divided by 2).

I hope this is correct

Finally, I would like to thank you for your excellent article.

IbrahimSobh · Mar 26 '17