
`VariableCellArrayReal::synchronize`: parallelize on item and second dimension when packing/unpacking messages

Open DavidDureau opened this issue 10 months ago • 2 comments

Consider a variable of type VariableCellArrayReal where the second dimension is high (for example 38400).

When we want to synchronize this variable between GPUs, the messages are packed and unpacked on GPU using the Accelerator API: https://github.com/arcaneframework/framework/blob/74aa8336ad10bd5863c43aad0dc4c261d0be7fb9/arcane/src/arcane/accelerator/MemoryCopier.cc#L80.

When the number of items (nb_index) is low and the second dimension (sub_size) is very high, the _copyFrom and _copyTo methods are expensive, because the loop only exposes nb_index parallel iterations and each one copies sub_size values sequentially.

Is it possible to parallelize both on nb_index and sub_size thanks to a RUNCOMMAND_LOOP2?
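To make the request concrete, here is a minimal CPU-only sketch in plain C++ of the two loop shapes. It does not use the Arcane Accelerator API: `packPerItem` and `packPerElement` are hypothetical names, and the flat index `k` only stands in for the `(i, z)` pair that a 2D loop would provide. The first form exposes nb_index independent iterations, the second exposes nb_index * sub_size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack (gather) with one work item per index: each outer iteration copies a
// whole row of sub_size values, mirroring the current per-item loop.
inline void packPerItem(const std::vector<std::int32_t>& indexes, std::size_t sub_size,
                        const std::vector<double>& source, std::vector<double>& destination)
{
  const std::size_t nb_index = indexes.size();
  assert(destination.size() >= nb_index * sub_size);
  for (std::size_t i = 0; i < nb_index; ++i) {
    const std::size_t zindex = i * sub_size;
    const std::size_t zci = static_cast<std::size_t>(indexes[i]) * sub_size;
    for (std::size_t z = 0; z < sub_size; ++z)
      destination[zindex + z] = source[zci + z];
  }
}

// Same pack over the full (i, z) iteration space: every iteration copies a
// single value, so all nb_index * sub_size iterations are independent.
inline void packPerElement(const std::vector<std::int32_t>& indexes, std::size_t sub_size,
                           const std::vector<double>& source, std::vector<double>& destination)
{
  const std::size_t nb_index = indexes.size();
  assert(destination.size() >= nb_index * sub_size);
  const std::size_t total = nb_index * sub_size;
  for (std::size_t k = 0; k < total; ++k) {
    const std::size_t i = k / sub_size; // item position in the message
    const std::size_t z = k % sub_size; // position in the second dimension
    const std::size_t zci = static_cast<std::size_t>(indexes[i]) * sub_size;
    destination[i * sub_size + z] = source[zci + z];
  }
}
```

Both functions write exactly the same values; only the amount of exposed parallelism differs, which is what matters when nb_index is small.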

DavidDureau, Feb 12 '25 10:02

Yes, it should be possible. It is a good idea.

grospelliergilles, Feb 12 '25 17:02

I am studying the following solution:

Change in `_copyFrom`, from:

```cpp
Int32 nb_index = indexes.size();
const Int64 sub_size = m_extent.v;

auto command = makeCommand(queue);
command << RUNCOMMAND_LOOP1(iter, nb_index)
{
  auto [i] = iter();
  Int64 zindex = i * sub_size;
  Int64 zci = indexes[i] * sub_size;
  for (Int32 z = 0; z < sub_size; ++z)
    destination[zindex + z] = source[zci + z];
};
```

to:

```cpp
Int32 nb_index = indexes.size();
const Int64 sub_size = m_extent.v;

auto command = makeCommand(queue);

if (nb_index < sub_size) {
  // Few items, large second dimension: one iteration per (i, z) pair.
  auto c = MakeLoopRange(nb_index, sub_size);
  command << RUNCOMMAND_LOOP(iter, c)
  {
    auto [i, z] = iter();
    Int64 zindex = i * sub_size;
    Int64 zci = indexes[i] * sub_size;
    destination[zindex + z] = source[zci + z];
  };
}
else {
  // Otherwise keep the existing loop: one iteration per item.
  command << RUNCOMMAND_LOOP1(iter, nb_index)
  {
    auto [i] = iter();
    Int64 zindex = i * sub_size;
    Int64 zci = indexes[i] * sub_size;
    for (Int32 z = 0; z < sub_size; ++z)
      destination[zindex + z] = source[zci + z];
  };
}
```

The same modification applies to `_copyTo`.
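For the unpack direction, a rough CPU-only sketch in plain C++ could look like the following. It assumes that `_copyTo` is the mirror image of `_copyFrom`, writing into `destination` at `indexes[i]`, which is not shown in this thread. `unpackRows` is a hypothetical name, and the flat loop only stands in for a 2D accelerator loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unpack (scatter) a message of nb_index rows of sub_size values into
// destination, choosing the loop shape with the same nb_index < sub_size test.
// Returns true when the per-element (2D) path was taken.
inline bool unpackRows(const std::vector<std::int32_t>& indexes, std::size_t sub_size,
                       const std::vector<double>& source, std::vector<double>& destination)
{
  const std::size_t nb_index = indexes.size();
  assert(source.size() >= nb_index * sub_size);
  const bool use_2d = nb_index < sub_size;
  if (use_2d) {
    // One iteration per (i, z) pair: nb_index * sub_size independent copies.
    const std::size_t total = nb_index * sub_size;
    for (std::size_t k = 0; k < total; ++k) {
      const std::size_t i = k / sub_size;
      const std::size_t z = k % sub_size;
      const std::size_t zci = static_cast<std::size_t>(indexes[i]) * sub_size;
      destination[zci + z] = source[i * sub_size + z];
    }
  }
  else {
    // One iteration per item: each copies a full row sequentially.
    for (std::size_t i = 0; i < nb_index; ++i) {
      const std::size_t zci = static_cast<std::size_t>(indexes[i]) * sub_size;
      for (std::size_t z = 0; z < sub_size; ++z)
        destination[zci + z] = source[i * sub_size + z];
    }
  }
  return use_2d;
}
```

The `nb_index < sub_size` threshold is taken from the proposal above; whether it is the best switch point on a given GPU would need measuring.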

I'm still in the testing phase, but it might be a solution.

CassRB, Feb 13 '25 15:02