Question about AutoEncoders
Hi Joao,
In a denoising autoencoder I have:
NeuralFit.FitLoading(NNAutoencoder, {EpochSize=}SETrainSize.Value, 0, 0, {Batch=}SEBatch.Value, {Epochs=}SEEpoch.Value, @GetTrainingData, nil, nil);
I suppose that in your example {EpochSize} should be read as {train size}?
If I randomly get one image from the whole set of training images and set the SETrainSize value to 100, does that mean the NN takes 100 images for each epoch?
procedure TDenoizingForm.GetTrainingData(Idx: integer; ThreadId: integer; pInput, pOutput: TNNetVolume);
var
  ImageId: integer;
begin
  ImageId := Random(ImageVolumes.Count);
  pOutput.Copy(ImageVolumes[ImageId]);
  pInput.Copy(pOutput);
  pInput.AddGaussianNoise(FSNoise.Value);
end;

Thank you.
B->
Hello @Dzandaa, I hope that you are doing well.
Regarding "I suppose that in your example {EpochSize} should be read as {train size}?", you are correct. It's the number of samples in your training set. If you were using the CIFAR-10 dataset, it would be 50000.
If you have 100 samples, it will be 100.
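For example, with your random-sampling GetTrainingData you could pass the size of your image list directly, so that one epoch corresponds to one pass worth of samples (a sketch adapted from your call above):

NeuralFit.FitLoading(NNAutoencoder, {EpochSize=}ImageVolumes.Count, 0, 0, {Batch=}SEBatch.Value, {Epochs=}SEEpoch.Value, @GetTrainingData, nil, nil);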
I'm also passionate about autoencoders. If your source code is open, feel free to publicize it!
If I haven't fully answered your question, feel free to ask for further clarification.
Kind regards, JP.
Hi Joao,
Thank you.
We are testing CAI vs Python for speed, memory consumption, and ease of installation.
Work in progress :)
B->
Hi Joao,
In TNNetGet2VolumesProc = procedure(Idx: integer; ThreadId: integer; pInput, pOutput: TNNetVolume) of object, what is the meaning of Idx?
And this is my GetTrainingData code for a denoiser:
// ***************************
// ***** Get Training Data ***
// ***************************
procedure TDenoizingForm.OnGetTrainingData(Idx: integer; ThreadId: integer; pInput, pOutput: TNNetVolume);
var
  ImageId: integer;
begin
  // Here I get a random sample from the whole dataset.
  ImageId := Random(ImageVolumes.Count);
  pOutput.Copy(ImageVolumes[ImageId]);
  pInput.Copy(pOutput);
  pInput.AddGaussianNoise(FSNoise.Value);
end;
I don't use validation and test volumes.
For a dataset of 256 samples and a batch of 64, the number of steps in one epoch is 4 (256/64). So, for a dataset of 256 samples, the first batch would be samples 0 to 63, the second 64 to 127, etc.
Is that right?
And in my case (drawing random samples from the whole dataset), what impact does it have on the network?
B->
@Dzandaa,
The Idx parameter is the index of the sample inside the thread ThreadId.
Most of the time, I don't use Idx. I actually do the same as you do: pick a random position.
In CAI, the batch is subdivided across threads. CAI benefits from large batches, as the relative threading overhead is smaller.
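As an illustration only, a hypothetical OnGetTrainingDataOrdered could use Idx instead of Random to make batches reproducible (the exact range of Idx depends on how the batch is split across threads, hence the modulo guard):

procedure TDenoizingForm.OnGetTrainingDataOrdered(Idx: integer; ThreadId: integer; pInput, pOutput: TNNetVolume);
begin
  // Use the sample index instead of a random draw; the modulo keeps
  // the index inside the dataset bounds.
  pOutput.Copy(ImageVolumes[Idx mod ImageVolumes.Count]);
  pInput.Copy(pOutput);
  pInput.AddGaussianNoise(FSNoise.Value);
end;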
@Dzandaa, in case you are interested in memory/FLOPS-efficient computing, I recommend having a look at: https://paperswithcode.com/paper/an-enhanced-scheme-for-reducing-the
If you love the link above, you can then look at: https://www.researchgate.net/publication/365687628_Effective_Approaches_for_Improving_the_Efficiency_of_Deep_Convolutional_Neural_Networks_for_Image_Classification
I tried this:

var
  Cores: Integer;
...
Cores := {$IFDEF UNIX}GetSystemThreadCount{$ELSE}GetCPUCount{$ENDIF};
NeuralFit.MaxThreadNum := Cores;

And I added -dHASTHREADS in Project->Options->Custom Options. Is this relevant?
Also, what is the range of NeuralFit.CurrentTrainingError?
I tried to use NeuralFit.TrainingAccuracy, but it is always zero; I suppose I'm missing something.
B->
@Dzandaa,
In the file neuralnetwork.inc, you'll find {$DEFINE HASTHREADS}. So, {$DEFINE HASTHREADS} should already be enabled by default.
Regarding the maximum number of cores (threads) to be used, I usually select the number of real cores, not including the logical HT cores. This is the best scenario in most of my own experiments. It may be good to include the logical cores only in specific cases where the batch size is very large and the CPUs are paired with a GPU that is far from full capacity. Otherwise, I would stick with the real core count.
Example: on a processor with 64 physical cores and 256 hardware threads, I would select 64 instead of 256.
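Building on your snippet, a sketch (assuming 2-way hyper-threading, so half of the reported logical processors are physical cores; on CPUs without HT, keep the full count):

var
  LogicalCores: Integer;
...
// GetSystemThreadCount/GetCPUCount report logical processors;
// with 2-way hyper-threading, half of them are physical cores.
LogicalCores := {$IFDEF UNIX}GetSystemThreadCount{$ELSE}GetCPUCount{$ENDIF};
if LogicalCores >= 2 then
  NeuralFit.MaxThreadNum := LogicalCores div 2
else
  NeuralFit.MaxThreadNum := 1;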
Regarding the training accuracy, the problem is defining what an "accurate prediction" means for each sample. You can have a look at this example:
https://github.com/joaopauloschuler/neural-api/tree/master/examples/HypotenuseFitLoading
In the above example, you can find:
// Returns TRUE if the difference is smaller than 0.1.
function LocalFloatCompare(A, B: TNNetVolume; ThreadId: integer): boolean;
begin
Result := ( Abs(A.FData[0]-B.FData[0])<0.1 );
end;
...
NFit.InferHitFn := @LocalFloatCompare;
In this example, an accurate prediction is one with an absolute error below 0.1.
Regarding the error, this is how it's calculated:
CurrentError := vOutput.SumDiff( pOutput );
There is no fixed upper limit: it is a sum of absolute differences over the whole output, so its scale grows with the output size.
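For your denoiser, you could define a hit in the same spirit. A sketch (the ImageCompare name and the 0.1 threshold are my own arbitrary choices, and I'm assuming here that TNNetVolume exposes its element count as Size):

// Counts a denoised image as a hit when the mean absolute error
// per element stays below 0.1 (arbitrary threshold - tune it).
function ImageCompare(A, B: TNNetVolume; ThreadId: integer): boolean;
begin
  Result := ( A.SumDiff(B) / A.Size ) < 0.1;
end;
...
NeuralFit.InferHitFn := @ImageCompare;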
Thank you very much for the explanations.
For the speed and memory comparison, I'm trying to adapt this autoencoder to PyTorch.
// Encoder
NNAutoencoder.AddLayer([
TNNetInput.Create(XSize, YSize, ZSize),
TNNetConvolutionReLU.Create(32, 3, 1, 1, 1),
TNNetMaxPool.Create(2, 2, 0),
TNNetConvolutionReLU.Create(32, 3, 1, 1, 1),
TNNetMaxPool.Create(2, 2, 0),
// Decoder
TNNetConvolutionReLU.Create(32, 3, 1, 1, 1),
TNNetUpsample.Create(),
TNNetConvolutionReLU.Create(32, 3, 1, 1, 1),
TNNetUpsample.Create(),
TNNetConvolutionReLU.Create(32, 3, 1, 1, 1),
TNNetConvolutionLinear.Create(ZSize, 3, 1, 1, 1),
TNNetReLUL.Create(-40, +40, 0) // Protection against overflow
]);
But since I'm not a specialist in Python and PyTorch, I can't find the corresponding layers in PyTorch.
I have this:
# Encoder
nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
# Decoder
nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Upsample(scale_factor=2),
nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Upsample(scale_factor=2),
nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
?????
Any help?
Sorry to bother you!!!
B->
I'll try to find time to do it.
Hi, my wife and I tested an autoencoder in CAI and PyTorch on the same computer: a GeForce GTX Titan Xp 12 GB, a 10-core Intel CPU, and 128 GB of DDR4. PyTorch is 2 times faster.
We assume that this is due to the CUDA drivers.
On the other hand, excluding the dataset: PyTorch + libraries take 5.2 GB, which is difficult to fit on an embedded system; the Lazarus release build is 7 MB.
B->
Many thanks for sharing!
Regarding the speed benchmark, is this the training speed or the inference speed?
If you can share your PyTorch code, I would love to repeat the benchmark on my end.
Hi, yes, it's the training speed. No problem sharing our code, but not on GitHub!!! You can contact us by sending a message on the Pascal Lazarus forum.
B->