WAGE
Training Loss NaN
Hi, I tried to reproduce your experiment on CIFAR-10, but the training loss becomes NaN. I am running on a four-GPU machine with tensorflow-gpu 1.12.
Here is the Option.py I used; the only line I modified is saveModel:
import time
import tensorflow as tf
debug = False
Time = time.strftime('%Y-%m-%d', time.localtime())
Notes = 'vgg7_2888'
# Notes = 'temp'
GPU = [0]
batchSize = 128
dataSet = 'CIFAR10'
loadModel = None
# loadModel = '../model/' + '2017-12-06' + '(' + 'vgg7 2888' + ')' + '.tf'
# saveModel = None
saveModel = '../model/' + Time + '_' + Notes + '.tf'
bitsW = 2 # bit width of weights
bitsA = 8 # bit width of activations
bitsG = 8 # bit width of gradients
bitsE = 8 # bit width of errors
bitsR = 16 # bit width of randomizer
lr = tf.Variable(initial_value=0., trainable=False, name='lr', dtype=tf.float32)
lr_schedule = [0, 8, 200, 1, 250, 1./8, 300, 0]
L2 = 0
lossFunc = 'SSE'
# lossFunc = tf.losses.softmax_cross_entropy
optimizer = tf.train.GradientDescentOptimizer(1) # lr is controlled in Quantize.G
# optimizer = tf.train.MomentumOptimizer(lr, 0.9, use_nesterov=True)
# shared variables, defined by other files
seed = None
sess = None
W_scale = []
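
For reference, I read lr_schedule as flat (epoch, value) pairs, so the scale applied inside Quantize.G would be 8 from epoch 0, 1 from epoch 200, 1/8 from epoch 250, and 0 from epoch 300. That is only my reading of the list, not something I verified against Top.py; a minimal sketch of that interpretation:

def lr_from_schedule(schedule, epoch):
    # Treat the flat list [e0, v0, e1, v1, ...] as boundary/value pairs and
    # return the value of the latest boundary that is <= epoch.
    value = schedule[1]
    for boundary, v in zip(schedule[0::2], schedule[1::2]):
        if epoch >= boundary:
            value = v
    return value

# With lr_schedule = [0, 8, 200, 1, 250, 1./8, 300, 0]:
#   epochs   0-199 -> 8
#   epochs 200-249 -> 1
#   epochs 250-299 -> 0.125
#   epochs 300+    -> 0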
WAGE folder structure:
.
|-- README.md
|-- dataSet
| |-- CIFAR10.npz
| |-- CIFAR10.py
| |-- cifar-10-batches-py
| | |-- batches.meta
| | |-- data_batch_1
| | |-- data_batch_2
| | |-- data_batch_3
| | |-- data_batch_4
| | |-- data_batch_5
| | |-- readme.html
| | `-- test_batch
| `-- cifar-10-python.tar.gz
|-- log
| |-- 2018-01-30(vgg7\ 2888).txt
| |-- 2021-09-14(temp).txt
| `-- 2021-09-14(vgg7_2888).txt
|-- model
`-- source
|-- Log.py
|-- Log.pyc
|-- NN.py
|-- NN.pyc
|-- Option.py
|-- Option.pyc
|-- Quantize.py
|-- Quantize.pyc
|-- Top.py
|-- getData.py
|-- getData.pyc
|-- myInitializer.py
`-- myInitializer.pyc
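
In case it helps to narrow down where the NaN first appears, this is the kind of check I add in TF 1.x. The graph below is only a stand-in (x, w, loss, train_op are placeholder names I made up, not identifiers from this repo); the relevant part is tf.add_check_numerics_ops(), which raises InvalidArgumentError naming the first op that produces Inf/NaN:

import numpy as np
import tensorflow as tf

# Dummy graph standing in for the one built by Top.py / NN.py.
x = tf.placeholder(tf.float32, [None, 4])
w = tf.Variable(tf.random_normal([4, 1]))
loss = tf.reduce_sum(tf.square(tf.matmul(x, w)))
train_op = tf.train.GradientDescentOptimizer(1.0).minimize(loss)

# Must be added after the full graph (including gradients) is built.
check_op = tf.add_check_numerics_ops()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    try:
        sess.run([train_op, check_op, loss],
                 feed_dict={x: np.random.randn(8, 4).astype(np.float32)})
    except tf.errors.InvalidArgumentError as e:
        # The message names the op that first produced a non-finite value.
        print('First non-finite tensor:', e.message)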