amazon-dsstne
config in benchmark example
"Layers" : [ { "Name" : "Input", "Kind" : "Input", "N" : "auto", "DataSet" : "gl_input", "Sparse" : true }, { "Name" : "Hidden1", "Kind" : "Hidden", "Type" : "FullyConnected", "Source" : "Input", "N" : 1024, "Activation" : "Sigmoid", "Sparse" : false, "pDropout" : 0.5, "WeightInit" : { "Scheme" : "Gaussian", "Scale" : 0.01 } }, { "Name" : "Hidden2", "Kind" : "Hidden", "Type" : "FullyConnected", "Source" : ["Hidden1"], "N" : 1024, "Activation" : "Sigmoid", "Sparse" : false, "pDropout" : 0.5, "WeightInit" : { "Scheme" : "Gaussian", "Scale" : 0.01 } }, { "Name" : "Hidden3", "Kind" : "Hidden", "Type" : "FullyConnected", "Source" : ["Hidden2"], "N" : 1024, "Activation" : "Sigmoid", "Sparse" : false, "pDropout" : 0.5, "WeightInit" : { "Scheme" : "Gaussian", "Scale" : 0.01 } }, { "Name" : "Output", "Kind" : "Output", "Type" : "FullyConnected", "DataSet" : "gl_output", "N" : "auto", "Activation" : "Sigmoid", "Sparse" : true , "WeightInit" : { "Scheme" : "Gaussian", "Scale" : 0.01, "Bias" : -10.2 }} ],
There are five layers in total. The input layer has Sparse set to true, and so does the output layer. I am wondering why the output data is sparse. For each userid, I guess the output is a vector of 27,278 dimensions, the score on each movie, and each score is a float. So the output data seems dense if I get scores for all movies. Could you give me some advice on this? Thank you!
Sparse=true is a flag that enables the Sparseness Penalty on the layer; it does not mark the layer's data as sparse. Whether the data in a layer is sparse is detected automatically by the DSSTNE engine.
In the standard config we have not provided a Sparseness Penalty.
See https://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf, from page 14 onward, which describes the sparseness penalty and other details.
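For reference, the penalty described in those notes is a KL-divergence term that pushes each unit's average activation toward a small target value. A minimal NumPy sketch of that formula (a standalone illustration, not DSSTNE's actual implementation; the function name and parameters are hypothetical):

```python
import numpy as np

def sparseness_penalty(activations, rho=0.05, beta=3.0):
    """KL-divergence sparseness penalty from the Stanford sparse-autoencoder notes.

    activations: (batch, units) array of sigmoid activations in (0, 1).
    rho:  target average activation (the sparsity parameter).
    beta: weight of the penalty term in the overall cost.
    """
    rho_hat = activations.mean(axis=0)  # average activation of each unit over the batch
    kl = (rho * np.log(rho / rho_hat)
          + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    return beta * kl.sum()

# When every unit's average activation already equals rho, the penalty is zero;
# it grows as the average activations drift away from the target.
acts = np.full((4, 3), 0.05)
print(sparseness_penalty(acts))
```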
Input layer definition:
    "Sparse" : <Boolean>,             # Indicates whether layer is sparse (default false)
Output layer definition:
    "Sparse" : <Boolean>,             # Indicates whether layer is sparse (default false)
    "SparsenessPenalty" : <Boolean>   # Indicates whether sparseness penalty should be applied (default false)
It seems that in the benchmark config.json, the output layer is sparse, and SparsenessPenalty has its default value of false. I am wondering whether the output layer is dense or sparse. Thank you!
So this is not quite right, and I will soon be submitting a fix for it. Currently, the sparseness penalty is global.
The only piece here that is active is labelling a layer as sparse, which automagically applies the global sparseness penalty to that layer. SparsenessPenalty will shortly be modified to allow you to override the global penalty parameters on a per-layer basis.
@scottlegrand I tried to follow the benchmark guide, but I'm not quite sure what input format autoencoder.py expects. The instructions say -f ml-20all.remotcc, which does not correspond to anything I could imagine. Kindly advise; thanks in advance.
You should download the file from https://s3-us-west-2.amazonaws.com/amazon-dsstne-samples/data/ml20m-all
wget https://s3-us-west-2.amazonaws.com/amazon-dsstne-samples/data/ml20m-all
@rgeorgej so you mean we use that file directly as the input for autoencoder.py? I tried that and got a segmentation fault when running the benchmark. Specifically, I ran this on EC2 (g2.2xlarge), with the CUDA driver upgraded to 7.5 and TensorFlow 0.9: autoencoder.py -u 1024 -b 256 -i 1082 -v54 --vocab_size 27278 -l 3 -f ml20m-all
This gave me:

python autoencoder.py -u 1024 -b 256 -i 1082 -v54 --vocab_size 27278 -l 3 -f ml20m-all
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Loading datasets
Segmentation fault (core dumped)
We used TensorFlow version 0.7 to evaluate this, with CUDA driver version 7.0.
I think there might be an issue with TensorFlow 0.9, as it cannot even detect GPUs on the same machine on which I once ran TensorFlow 0.8.
Over a year old - TensorFlow has moved on. But for the first part, you had a fix for the sparseness penalty flag - has it been pushed?