
OpenCL fails with 2 fully connected layers

Open joaopauloschuler opened this issue 2 years ago • 10 comments

I'm having a problem adapting one of my programs that uses CAI to OpenCL.

I've tested "SimpleImageClassifierGPU" and it works on my computer (after removing the -dAVX option, because my CPU is old).

When I try to add OpenCL to my test program with 2 fully connected layers (no convolutions), it fails.

joaopauloschuler avatar Dec 11 '22 02:12 joaopauloschuler

When calling:

NeuralFit.Fit(NeuralNet, TrainingPairs, nil, nil, {batchsize=}4, {epochs=}SEEpoch.Value);

Without neural.cl in the same directory as the executable: line 584 of neuralopencl.pas reports 'File neural.cl could not be found.'

With neural.cl in the same directory as the executable: line 948 of neuralopencl.pas prints 'clCreateContext OK!' and then the program crashes...

I think part of the problem may be that my 'neural' directory is not at '../../../neural' but at '../neural'.

Dzandaa avatar Dec 11 '22 10:12 Dzandaa

I just tried the same program on Linux Mint 20.2.

I added "-dUseCThreads" in Custom Options and changed

{$IFDEF UseCThreads} cthreads, cmem, {$ENDIF}

to

{$IFDEF UseCThreads} cthreads, {$ENDIF}

It works, but I don't see any acceleration. For 1000 epochs:

With OpenCL and with AVX: 34.62 seconds
Without OpenCL and with AVX: 33.82 seconds
Without OpenCL and without AVX: 62.76 seconds
With OpenCL and without AVX: 62.92 seconds

Dzandaa avatar Dec 11 '22 11:12 Dzandaa

OpenCL is actually slower here. I wonder whether the number of weights/neurons in this experiment is so small that OpenCL has no advantage.

joaopauloschuler avatar Dec 12 '22 05:12 joaopauloschuler

I don't know why it crashes on Windows after clCreateContext OK!

Dzandaa avatar Dec 12 '22 12:12 Dzandaa

I'm about to start working on this.

joaopauloschuler avatar Dec 13 '22 21:12 joaopauloschuler

On dense (fully connected) layers, OpenCL is called only when there are enough neurons/weights to compensate for the overhead it adds:

FShouldOpenCL := (FNeurons.Count >= 512) and (pPrevLayer.Output.Size >= 128);

Depending on how many neurons you have in each layer, it may not even be in use.
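As an illustration of that threshold (this is my reading of the condition above, not output from the library; layer sizes are the ones from the Hypotenuse example), each dense layer qualifies for OpenCL only if it has at least 512 neurons AND its input (the previous layer's output) has at least 128 values:

```pascal
// Illustrative sketch of when FShouldOpenCL is true per layer:
NN.AddLayer([
  TNNetInput.Create(2),
  TNNetFullConnectReLU.Create(512),   // 512 neurons, but input size 2 < 128  -> CPU
  TNNetFullConnectReLU.Create(512),   // 512 neurons, input size 512 >= 128   -> OpenCL
  TNNetFullConnectLinear.Create(1)    // 1 neuron < 512                       -> CPU
]);
```

So with small dense layers (a few dozen neurons), every layer falls below the threshold and the whole network runs on the CPU regardless of EnableOpenCL.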

joaopauloschuler avatar Dec 13 '22 21:12 joaopauloschuler

I've just tested the following and it works for me:

program Hypotenuse;
(*
Hypotenuse: learns how to calculate hypotenuse sqrt(X^2 + Y^2).
Copyright (C) 2019 Joao Paulo Schwarz Schuler

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*)

{$mode objfpc}{$H+}

uses {$IFDEF UNIX} {$IFDEF UseCThreads}
  cthreads, {$ENDIF} {$ENDIF}
  Classes,
  neuralnetwork,
  neuralvolume,
  neuralfit,
  neuralopencl;

  function CreateHypotenusePairList(MaxCnt: integer): TNNetVolumePairList;
  var
    Cnt: integer;
    LocalX, LocalY, Hypotenuse: TNeuralFloat;
  begin
    Result := TNNetVolumePairList.Create();
    for Cnt := 1 to MaxCnt do
    begin
      LocalX := Random(100);
      LocalY := Random(100);
      Hypotenuse := sqrt(LocalX*LocalX + LocalY*LocalY);

      Result.Add(
        TNNetVolumePair.Create(
          TNNetVolume.Create([LocalX, LocalY]),
          TNNetVolume.Create([Hypotenuse])
        )
      );
    end;
  end;

  // Returns TRUE if difference is smaller than 0.1 .
  function LocalFloatCompare(A, B: TNNetVolume; ThreadId: integer): boolean;
  begin
    Result := ( Abs(A.FData[0]-B.FData[0])<0.1 );
  end;

  procedure RunAlgo();
  var
    NN: TNNet;
    NFit: TNeuralFit;
    TrainingPairs, ValidationPairs, TestPairs: TNNetVolumePairList;
    Cnt: integer;
    pOutPut: TNNetVolume;
    EasyOpenCL: TEasyOpenCL;
  begin
    NN := TNNet.Create();
    NFit := TNeuralFit.Create();
    TrainingPairs := CreateHypotenusePairList(10000);
    ValidationPairs := CreateHypotenusePairList(1000);
    TestPairs := CreateHypotenusePairList(1000);

    NN.AddLayer([
      TNNetInput.Create(2),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectLinear.Create(1)
    ]);

    EasyOpenCL := TEasyOpenCL.Create();
    if EasyOpenCL.GetPlatformCount() = 0 then
    begin
      WriteLn('No OpenCL capable platform has been found.');
      exit;
    end;
    WriteLn('Setting platform to: ', EasyOpenCL.PlatformNames[0]);
    EasyOpenCL.SetCurrentPlatform(EasyOpenCL.PlatformIds[0]);
    if EasyOpenCL.GetDeviceCount() = 0 then
    begin
      WriteLn('No OpenCL capable device has been found for platform ',EasyOpenCL.PlatformNames[0]);
      exit;
    end;
    EasyOpenCL.SetCurrentDevice(EasyOpenCL.Devices[0]);

    NFit.EnableOpenCL(EasyOpenCL.PlatformIds[0], EasyOpenCL.Devices[0]);

    WriteLn('Computing...');
    NFit.InitialLearningRate := 0.00001;
    NFit.LearningRateDecay := 0;
    NFit.L2Decay := 0;
    NFit.InferHitFn := @LocalFloatCompare;
    NFit.MaxThreadNum := 1;
    NFit.Fit(NN, TrainingPairs, ValidationPairs, TestPairs, {batchsize=}32, {epochs=}50);
    NN.DebugWeights();

    pOutPut := TNNetVolume.Create({pSizeX=}1, {pSizeY=}1, {pDepth=}1, {FillValue=}1);

    // tests the learning
    for Cnt := 0 to 9 do
    begin
      NN.Compute(TestPairs[Cnt].I);
      NN.GetOutput(pOutPut);
      WriteLn
      ( 'Inputs:',
        TestPairs[Cnt].I.FData[0]:5:2,', ',
        TestPairs[Cnt].I.FData[1]:5:2,' - ',
        'Output:',
        pOutPut.Raw[0]:5:2,' ',
        ' Desired Output:',
        TestPairs[Cnt].O.FData[0]:5:2
      );
    end;

    EasyOpenCL.Free;
    pOutPut.Free;
    TestPairs.Free;
    ValidationPairs.Free;
    TrainingPairs.Free;
    NFit.Free;
    NN.Free;
    Write('Press ENTER to exit.');
    ReadLn;
  end;

var
  // Stops Lazarus errors
  Application: record Title:string; end;

begin
  Application.Title:='Hypotenuse Example';
  RunAlgo();
end.

joaopauloschuler avatar Dec 13 '22 21:12 joaopauloschuler

I've just tested the following and it also works for me:

    //NFit.MaxThreadNum := 1;
    NFit.Fit(NN, TrainingPairs, nil, nil, {batchsize=}32, {epochs=}50);

and

    NN.AddLayer([
      TNNetInput.Create(2),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectLinear.Create(1)
    ]);

Given that I can't reproduce the problem, could you share the full source code of a Lazarus project that provokes the error?

joaopauloschuler avatar Dec 13 '22 21:12 joaopauloschuler

In case it helps, this is how neural.cl is loaded:

constructor TNeuralKernel.Create(pCurrentPlatform: cl_platform_id;
  pCurrentDevice: cl_device_id; kernelname: string = 'cai_dot_product');
begin
  inherited Create();
  SetCurrentPlatform(pCurrentPlatform);
  SetCurrentDevice(pCurrentDevice);

  // Create the OpenCL Kernel Here:
  if FileExists('../../../neural/neural.cl') then
  begin
    CompileProgramFromFile('../../../neural/neural.cl');
  end
  else if FileExists('neural.cl') then
  begin
    CompileProgramFromFile('neural.cl');
  end
  else
  begin
    MessageProc('File neural.cl could not be found.');
  end;
  PrepareKernel(kernelname);
end; 

joaopauloschuler avatar Dec 13 '22 22:12 joaopauloschuler

Hi,

Thank you very much for your tests :) Here is my little test program.

You have to change the path to /neural and put neural.cl in the same directory as the executable.

NetSpectrum.zip

Just train (500-100 epochs) and test.

B->

Dzandaa avatar Dec 14 '22 17:12 Dzandaa