
Improve concatenation with drivers

Open bruno-at-orange opened this issue 1 year ago • 2 comments

Description

In most parallel tasks, the slaves produce files (called chunks) that must be concatenated into a final file. The concatenation process consists of two steps:

  • Each chunk file is sent to the master via MPI
  • The master writes/appends the received chunk to the result file
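For illustration, the master-side append step can be sketched as follows. This is a minimal sketch, not the khiops-core implementation: the MPI transfer is left out and replaced by local files, and the names `concatenate_chunks` and `demo` are hypothetical.

```python
import os
import shutil
import tempfile


def concatenate_chunks(chunk_paths, result_path, buffer_size=1 << 20):
    """Append each chunk file, in order, to result_path (master-side step)."""
    with open(result_path, "wb") as result:
        for chunk_path in chunk_paths:
            with open(chunk_path, "rb") as chunk:
                # Stream in blocks so a large chunk is never fully in memory.
                shutil.copyfileobj(chunk, result, buffer_size)


def demo():
    """Write three small chunks, concatenate them, and return the result."""
    with tempfile.TemporaryDirectory() as workdir:
        chunk_paths = []
        # Assumption: one chunk per slave, ordered by rank.
        for rank, payload in enumerate([b"a,1\n", b"b,2\n", b"c,3\n"]):
            path = os.path.join(workdir, f"chunk_{rank}.txt")
            with open(path, "wb") as f:
                f.write(payload)
            chunk_paths.append(path)
        result_path = os.path.join(workdir, "result.txt")
        concatenate_chunks(chunk_paths, result_path)
        with open(result_path, "rb") as f:
            return f.read()
```

In this scheme every byte of every chunk passes through the master, which is the cost the ideas below try to avoid.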

Questions/Ideas

  • Use drivers for concatenation:
    • Each slave writes directly to the cloud
    • We call a new method from the drivers dedicated to concatenation. It should be more efficient than write/append, which is not very cloud-native.
  • Another solution would be to write chunks directly to the cloud as part of a multipart file
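To make the two ideas concrete, here is a rough sketch against an in-memory stand-in for a storage driver. Everything in it is an assumption for discussion: `InMemoryObjectStore` and its methods (`write`, `concatenate`, `start_multipart`, `upload_part`, `complete_multipart`) are hypothetical names, not the Khiops driver API and not any cloud provider's SDK.

```python
class InMemoryObjectStore:
    """Hypothetical stand-in for a cloud storage driver (not a Khiops API)."""

    def __init__(self):
        self.objects = {}    # object path -> bytes
        self._uploads = {}   # destination path -> {part_number: bytes}

    # Idea 1: each slave writes its own object, then one concatenation call.
    def write(self, path, data):
        self.objects[path] = bytes(data)

    def concatenate(self, source_paths, dest_path, delete_sources=True):
        # A real driver would ideally do this on the storage side,
        # without routing the bytes through the master.
        self.objects[dest_path] = b"".join(
            self.objects[p] for p in source_paths
        )
        if delete_sources:
            for p in source_paths:
                if p != dest_path:
                    del self.objects[p]

    # Idea 2: slaves upload numbered parts of a single multipart file.
    def start_multipart(self, dest_path):
        self._uploads[dest_path] = {}

    def upload_part(self, dest_path, part_number, data):
        self._uploads[dest_path][part_number] = bytes(data)

    def complete_multipart(self, dest_path):
        parts = self._uploads.pop(dest_path)
        # Parts are assembled by part number, not by arrival order.
        self.objects[dest_path] = b"".join(parts[n] for n in sorted(parts))


def demo_concatenate():
    store = InMemoryObjectStore()
    store.write("tmp/chunk_0", b"a,1\n")
    store.write("tmp/chunk_1", b"b,2\n")
    store.concatenate(["tmp/chunk_0", "tmp/chunk_1"], "out/result.txt")
    return store.objects


def demo_multipart():
    store = InMemoryObjectStore()
    store.start_multipart("out/result.txt")
    # Slaves may finish in any order.
    store.upload_part("out/result.txt", 2, b"b,2\n")
    store.upload_part("out/result.txt", 1, b"a,1\n")
    store.complete_multipart("out/result.txt")
    return store.objects
```

In both variants the master only issues a final call (`concatenate` or `complete_multipart`) instead of receiving and appending the chunk contents itself. Whether a given backend can support either operation efficiently would have to be checked per driver.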

bruno-at-orange, Oct 18 '24 07:10

@bruno-at-orange to define the API; Tristan to implement it in the Azure driver and benchmark the impact; @bruno-at-orange to implement it in khiops-core.

lucaurelien, Aug 07 '25 09:08

See: #778

lucaurelien, Sep 17 '25 13:09