Question about AutoEncoders
Hi Joao,
In a denoising autoencoder I have:
NeuralFit.FitLoading(NNAutoencoder, {EpochSize=}SETrainSize.Value, 0, 0, {Batch=}SEBatch.Value, {Epochs=}SEEpoch.Value, @GetTrainingData, nil, nil);
I suppose that in your example {EpochSize} should be read as {train size}?
If I randomly get one image from the whole set of training images and set the SETrainSize value to 100, does that mean the NN takes 100 images for each epoch?
procedure TDenoizingForm.GetTrainingData(Idx: integer; ThreadId: integer; pInput, pOutput: TNNetVolume);
var
  ImageId: integer;
begin
  ImageId := Random(ImageVolumes.Count);
  pOutput.Copy(ImageVolumes[ImageId]);
  pInput.Copy(pOutput);
  pInput.AddGaussianNoise(FSNoise.Value);
end;

Thank you.
B->
Hello @Dzandaa, I hope that you are doing well.
Regarding "I suppose that in your example {EpochSize} should be read as {train size}?", you are correct. It's the number of samples in your training set. If you were using the CIFAR-10 dataset, it would be 50000.
If you have 100 samples, it will be 100.
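For example, with your random-sampling GetTrainingData you could pass the size of your image list directly, so that one epoch corresponds to one pass worth of samples (a sketch adapted from your call above):

NeuralFit.FitLoading(NNAutoencoder, {EpochSize=}ImageVolumes.Count, 0, 0, {Batch=}SEBatch.Value, {Epochs=}SEEpoch.Value, @GetTrainingData, nil, nil);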
I'm also passionate about autoencoders. If your source code is open, feel free to publicize it!
If I haven't fully answered your question, feel free to ask for further clarification.
Kind regards, JP.
Hi Joao,
Thank you.
We are testing CAI vs Python for speed, memory consumption, and ease of installation.
Work in progress :)
B->
Hi Joao,
In TNNetGet2VolumesProc = procedure(Idx: integer; ThreadId: integer; pInput, pOutput: TNNetVolume) of object, what is the meaning of Idx?
And this is my GetTrainingData code for a denoiser:
// ***************************
// ***** Get Training Data ***
// ***************************
procedure TDenoizingForm.OnGetTrainingData(Idx: integer; ThreadId: integer; pInput, pOutput: TNNetVolume);
var
  ImageId: integer;
begin
  // Here I get a random sample from the whole dataset.
  ImageId := Random(ImageVolumes.Count);
  pOutput.Copy(ImageVolumes[ImageId]);
  pInput.Copy(pOutput);
  pInput.AddGaussianNoise(FSNoise.Value);
end;
I don't use validation and test volumes.
For a dataset of 256 samples and a batch of 64, the number of steps in one epoch is 4 (256/64). So, for a dataset of 256 samples, the first batch would be samples 0 to 63, the second 64 to 127, etc.
Is that right?
And in my case (drawing random samples from the whole dataset), what impact does it have on the network?
B->
@Dzandaa,
The Idx parameter is the index of the sample inside the thread ThreadId.
Most of the time, I don't use Idx. I actually do the same as you do: pick a random position.
In CAI, the batch is subdivided across threads. CAI benefits from large batches, as the relative threading overhead is smaller.
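As an illustration only, a hypothetical OnGetTrainingDataOrdered could use Idx instead of Random to make batches reproducible (the exact range of Idx depends on how the batch is split across threads, hence the modulo guard):

procedure TDenoizingForm.OnGetTrainingDataOrdered(Idx: integer; ThreadId: integer; pInput, pOutput: TNNetVolume);
begin
  // Use the sample index instead of a random draw; the modulo keeps
  // the index inside the dataset bounds.
  pOutput.Copy(ImageVolumes[Idx mod ImageVolumes.Count]);
  pInput.Copy(pOutput);
  pInput.AddGaussianNoise(FSNoise.Value);
end;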
@Dzandaa, in case you are interested in memory/FLOPS-efficient computing, I recommend having a look at: https://paperswithcode.com/paper/an-enhanced-scheme-for-reducing-the
If you love the link above, you can then look at: https://www.researchgate.net/publication/365687628_Effective_Approaches_for_Improving_the_Efficiency_of_Deep_Convolutional_Neural_Networks_for_Image_Classification
I tried this:

var
  Cores: Integer;
...
Cores := {$IFDEF UNIX}GetSystemThreadCount{$ELSE}GetCPUCount{$ENDIF};
NeuralFit.MaxThreadNum := Cores;

And I added -dHASTHREADS in Project->Options->Custom Options. Is this relevant?
Also, what is the range of NeuralFit.CurrentTrainingError?
I tried to use NeuralFit.TrainingAccuracy, but it is always zero; I suppose I'm missing something.
B->
@Dzandaa,
In the file neuralnetwork.inc, you'll find {$DEFINE HASTHREADS}. So, {$DEFINE HASTHREADS} should already be enabled by default.
Regarding the maximum number of cores (threads) to be used, I usually select the number of real cores, not including the logical HT cores. This is the best scenario in most of my own experiments. It may be good to include the logical cores only in specific cases where the batch size is very large and the CPUs are paired with a GPU that is far from full capacity. Otherwise, I would stick with the real core count.
Example: on a processor with 64 physical cores and 256 hardware threads, I would select 64 instead of 256.
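Building on your snippet, a sketch (assuming 2-way hyper-threading, so half of the reported logical processors are physical cores; on CPUs without HT, keep the full count):

var
  LogicalCores: Integer;
...
// GetSystemThreadCount/GetCPUCount report logical processors;
// with 2-way hyper-threading, half of them are physical cores.
LogicalCores := {$IFDEF UNIX}GetSystemThreadCount{$ELSE}GetCPUCount{$ENDIF};
if LogicalCores >= 2 then
  NeuralFit.MaxThreadNum := LogicalCores div 2
else
  NeuralFit.MaxThreadNum := 1;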
Regarding the training accuracy, the problem is defining what an "accurate prediction" means for each sample. You can have a look at this example:
https://github.com/joaopauloschuler/neural-api/tree/master/examples/HypotenuseFitLoading
In the above example, you can find:
// Returns TRUE if the difference is smaller than 0.1.
function LocalFloatCompare(A, B: TNNetVolume; ThreadId: integer): boolean;
begin
Result := ( Abs(A.FData[0]-B.FData[0])<0.1 );
end;
...
NFit.InferHitFn := @LocalFloatCompare;
In this example, an accurate prediction is one with an absolute error below 0.1.
Regarding the error, this is how it's calculated:
CurrentError := vOutput.SumDiff( pOutput );
There is no fixed upper limit: it is a sum of absolute differences over the whole output, so its scale grows with the output size.
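For your denoiser, you could define a hit in the same spirit. A sketch (the ImageCompare name and the 0.1 threshold are my own arbitrary choices, and I'm assuming here that TNNetVolume exposes its element count as Size):

// Counts a denoised image as a hit when the mean absolute error
// per element stays below 0.1 (arbitrary threshold - tune it).
function ImageCompare(A, B: TNNetVolume; ThreadId: integer): boolean;
begin
  Result := ( A.SumDiff(B) / A.Size ) < 0.1;
end;
...
NeuralFit.InferHitFn := @ImageCompare;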
Thank you very much for the explanations.
For the speed and memory comparison, I'm trying to adapt this autoencoder to PyTorch.
// Encoder
NNAutoencoder.AddLayer([
TNNetInput.Create(XSize, YSize, ZSize),
TNNetConvolutionReLU.Create(32, 3, 1, 1, 1),
TNNetMaxPool.Create(2, 2, 0),
TNNetConvolutionReLU.Create(32, 3, 1, 1, 1),
TNNetMaxPool.Create(2, 2, 0),
// Decoder
TNNetConvolutionReLU.Create(32, 3, 1, 1, 1),
TNNetUpsample.Create(),
TNNetConvolutionReLU.Create(32, 3, 1, 1, 1),
TNNetUpsample.Create(),
TNNetConvolutionReLU.Create(32, 3, 1, 1, 1),
TNNetConvolutionLinear.Create(ZSize, 3, 1, 1, 1),
TNNetReLUL.Create(-40, +40, 0) // Protection against overflow
]);
But since I'm not a specialist in Python and PyTorch, I can't find the corresponding layers in PyTorch.
I have this:
# Encoder
nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
# Decoder
nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Upsample(scale_factor=2),
nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Upsample(scale_factor=2),
nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
?????
Any help?
Sorry to bother you!!!
B->
I'll try to find time to do it.
Hi, my wife and I tested an autoencoder in CAI and PyTorch on the same computer: a GeForce GTX Titan Xp 12 GB, a 10-core Intel CPU, and 128 GB of DDR4. PyTorch is 2 times faster.
We assume that this is due to the CUDA drivers.
On the other hand, excluding the dataset: PyTorch + libraries take 5.2 GB, which is difficult to fit on an embedded system; the Lazarus release build is 7 MB.
B->
Many thanks for sharing!
Regarding the speed benchmark, is this the training speed or the inference speed?
If you can share your PyTorch code, I would love to repeat the benchmark on my end.
Hi, yes, it's the training speed. No problem sharing our code, but not on GitHub!!! You can contact us by sending a message on the Pascal Lazarus forum.
B->