neural-api
OpenCL fails with 2 fully connected layers
I'm having trouble adapting one of my programs that uses CAI to run with OpenCL.
I've tested "SimpleImageClassifierGPU" and it works on my computer (after removing the -dAVX option, because my CPU is old).
When I try to add OpenCL to my test program, which has 2 fully connected layers and no convolution, it fails.
When calling:
NeuralFit.Fit(NeuralNet, TrainingPairs, nil, nil, {batchsize=}4, {epochs=}SEEpoch.Value);
Without neural.cl in the same directory as the executable: line 584 of neuralopencl.pas reports 'File neural.cl could not be found.'
With neural.cl in the same directory as the executable: line 948 of neuralopencl.pas reports 'clCreateContext OK!' and then the program crashes.
I think one of the problems may be that my 'neural' directory is not at '../../../neural' but at '../neural'.
I just tried the same program on Linux Mint 20.2.
I added "-dUseCThreads" in Custom Options and changed
{$IFDEF UseCThreads} cthreads, cmem, {$ENDIF}
to
{$IFDEF UseCThreads} cthreads, {$ENDIF}
It works, but I don't see any acceleration. For 1000 epochs:
With OpenCL and with AVX: 34.62 seconds
Without OpenCL and with AVX: 33.82 seconds
Without OpenCL and without AVX: 62.76 seconds
With OpenCL and without AVX: 62.92 seconds
OpenCL is actually slower in this experiment. I'm wondering if the number of weights/neurons is so small in this experiment that OpenCL has no advantage.
I don't know why it crashes on Windows after clCreateContext OK!
I'm about to start working on this.
On dense (fully connected) layers, OpenCL is called only when there are enough neurons/weights to compensate for the overhead it adds:
FShouldOpenCL := (FNeurons.Count >= 512) and (pPrevLayer.Output.Size >= 128);
Depending on how many neurons you have in each layer, OpenCL may not even be in use.
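As a rough illustration of that condition, here is a minimal sketch that applies the same thresholds to a 2 -> 512 -> 512 -> 1 network. ShouldUseOpenCL is a hypothetical helper written for this example, not part of the library, and the sketch assumes that the output size of a dense layer equals its neuron count. Under that assumption, only the second 512-neuron layer meets both thresholds.

```pascal
program OpenCLThresholdSketch;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of neural-api): mirrors the condition
// quoted above for deciding whether a dense layer should use OpenCL.
function ShouldUseOpenCL(NeuronCount, PrevOutputSize: integer): boolean;
begin
  Result := (NeuronCount >= 512) and (PrevOutputSize >= 128);
end;

begin
  // Assumed layer sizes: Input(2) -> Dense(512) -> Dense(512) -> Dense(1).
  WriteLn('Dense 1 (512 neurons, 2 inputs):   ', ShouldUseOpenCL(512, 2));
  WriteLn('Dense 2 (512 neurons, 512 inputs): ', ShouldUseOpenCL(512, 512));
  WriteLn('Dense 3 (1 neuron, 512 inputs):    ', ShouldUseOpenCL(1, 512));
end.
```

If your own layers are smaller than these thresholds, identical timings with and without OpenCL would be expected.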
I've just tested the following and it works for me:
program Hypotenuse;
(*
Hypotenuse: learns how to calculate hypotenuse sqrt(X^2 + Y^2).
Copyright (C) 2019 Joao Paulo Schwarz Schuler
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*)
{$mode objfpc}{$H+}

uses {$IFDEF UNIX} {$IFDEF UseCThreads}
  cthreads, {$ENDIF} {$ENDIF}
  Classes,
  neuralnetwork,
  neuralvolume,
  neuralfit,
  neuralopencl;

function CreateHypotenusePairList(MaxCnt: integer): TNNetVolumePairList;
var
  Cnt: integer;
  LocalX, LocalY, Hypotenuse: TNeuralFloat;
begin
  Result := TNNetVolumePairList.Create();
  for Cnt := 1 to MaxCnt do
  begin
    LocalX := Random(100);
    LocalY := Random(100);
    Hypotenuse := sqrt(LocalX*LocalX + LocalY*LocalY);
    Result.Add(
      TNNetVolumePair.Create(
        TNNetVolume.Create([LocalX, LocalY]),
        TNNetVolume.Create([Hypotenuse])
      )
    );
  end;
end;

// Returns TRUE if difference is smaller than 0.1 .
function LocalFloatCompare(A, B: TNNetVolume; ThreadId: integer): boolean;
begin
  Result := ( Abs(A.FData[0]-B.FData[0])<0.1 );
end;

procedure RunAlgo();
var
  NN: TNNet;
  NFit: TNeuralFit;
  TrainingPairs, ValidationPairs, TestPairs: TNNetVolumePairList;
  Cnt: integer;
  pOutPut: TNNetVolume;
  EasyOpenCL: TEasyOpenCL;
begin
  NN := TNNet.Create();
  NFit := TNeuralFit.Create();
  TrainingPairs := CreateHypotenusePairList(10000);
  ValidationPairs := CreateHypotenusePairList(1000);
  TestPairs := CreateHypotenusePairList(1000);

  NN.AddLayer([
    TNNetInput.Create(2),
    TNNetFullConnectReLU.Create(512),
    TNNetFullConnectReLU.Create(512),
    TNNetFullConnectLinear.Create(1)
  ]);

  EasyOpenCL := TEasyOpenCL.Create();
  if EasyOpenCL.GetPlatformCount() = 0 then
  begin
    WriteLn('No OpenCL capable platform has been found.');
    exit;
  end;
  WriteLn('Setting platform to: ', EasyOpenCL.PlatformNames[0]);
  EasyOpenCL.SetCurrentPlatform(EasyOpenCL.PlatformIds[0]);
  if EasyOpenCL.GetDeviceCount() = 0 then
  begin
    WriteLn('No OpenCL capable device has been found for platform ',EasyOpenCL.PlatformNames[0]);
    exit;
  end;
  EasyOpenCL.SetCurrentDevice(EasyOpenCL.Devices[0]);
  NFit.EnableOpenCL(EasyOpenCL.PlatformIds[0], EasyOpenCL.Devices[0]);

  WriteLn('Computing...');
  NFit.InitialLearningRate := 0.00001;
  NFit.LearningRateDecay := 0;
  NFit.L2Decay := 0;
  NFit.InferHitFn := @LocalFloatCompare;
  NFit.MaxThreadNum := 1;
  NFit.Fit(NN, TrainingPairs, ValidationPairs, TestPairs, {batchsize=}32, {epochs=}50);
  NN.DebugWeights();

  pOutPut := TNNetVolume.Create({pSizeX=}1, {pSizeY=}1, {pDepth=}1, {FillValue=}1);

  // tests the learning
  for Cnt := 0 to 9 do
  begin
    NN.Compute(TestPairs[Cnt].I);
    NN.GetOutput(pOutPut);
    WriteLn
    ( 'Inputs:',
      TestPairs[Cnt].I.FData[0]:5:2,', ',
      TestPairs[Cnt].I.FData[1]:5:2,' - ',
      'Output:',
      pOutPut.Raw[0]:5:2,' ',
      ' Desired Output:',
      TestPairs[Cnt].O.FData[0]:5:2
    );
  end;

  EasyOpenCL.Free;
  pOutPut.Free;
  TestPairs.Free;
  ValidationPairs.Free;
  TrainingPairs.Free;
  NFit.Free;
  NN.Free;
  Write('Press ENTER to exit.');
  ReadLn;
end;

var
  // Stops Lazarus errors
  Application: record Title:string; end;

begin
  Application.Title:='Hypotenuse Example';
  RunAlgo();
end.
I've just tested the following and it also works for me:
//NFit.MaxThreadNum := 1;
NFit.Fit(NN, TrainingPairs, nil, nil, {batchsize=}32, {epochs=}50);
and
NN.AddLayer([
  TNNetInput.Create(2),
  TNNetFullConnectReLU.Create(512),
  TNNetFullConnectReLU.Create(512),
  TNNetFullConnectReLU.Create(512),
  TNNetFullConnectLinear.Create(1)
]);
Given that I can't reproduce the problem, you'll need to share the full source code of a Lazarus project that provokes the error.
In case it helps, this is how neural.cl is loaded:
constructor TNeuralKernel.Create(pCurrentPlatform: cl_platform_id;
  pCurrentDevice: cl_device_id; kernelname: string = 'cai_dot_product');
begin
  inherited Create();
  SetCurrentPlatform(pCurrentPlatform);
  SetCurrentDevice(pCurrentDevice);

  // Create the OpenCL Kernel Here:
  if FileExists('../../../neural/neural.cl') then
  begin
    CompileProgramFromFile('../../../neural/neural.cl');
  end
  else if FileExists('neural.cl') then
  begin
    CompileProgramFromFile('neural.cl');
  end
  else
  begin
    MessageProc('File neural.cl could not be found.');
  end;
  PrepareKernel(kernelname);
end;
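To check which copy of neural.cl would be picked up on a given machine, a small diagnostic along these lines may help. It is only a sketch: LocateNeuralCL is a hypothetical helper that repeats the two lookups from the constructor above, using only standard SysUtils calls. Note that both paths are relative, so they are resolved against the current working directory, which is not necessarily the directory of the executable.

```pascal
program FindNeuralCL;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper (not part of neural-api): returns the first candidate
// path that exists, in the same order as the constructor above,
// or an empty string when neither file is found.
function LocateNeuralCL(): string;
begin
  if FileExists('../../../neural/neural.cl') then
    Result := '../../../neural/neural.cl'
  else if FileExists('neural.cl') then
    Result := 'neural.cl'
  else
    Result := '';
end;

var
  FoundPath: string;
begin
  // Relative paths are resolved against this directory.
  WriteLn('Current directory: ', GetCurrentDir());
  FoundPath := LocateNeuralCL();
  if FoundPath = '' then
    WriteLn('neural.cl was not found in either location.')
  else
    WriteLn('neural.cl would be loaded from: ', ExpandFileName(FoundPath));
end.
```

Running it from the same working directory as the failing program shows whether the first or the second location is the one being used.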
Hi,
Thank you very much for your tests :) Here is my little test program.
You have to change the path of /neural and add neural.cl in the same directory as the executable.
Just train (500-100 epochs) and test.
B->