
CPU support?

Open pablogranolabar opened this issue 3 years ago • 1 comments

Hi, very neat project.

Question: is it possible to use FTPipe with massively parallel CPU clusters? Say for example 256 VMs?

pablogranolabar, Jan 07 '22 05:01

Hi @pablogranolabar, tweaks will be needed, but it can be made possible.

You should consider the following parts:

  • Distributed execution should work out of the box (I did a small PoC of distributed execution with 2 machines via Open MPI).
  • All partitioned configurations can be returned on CPU using the DEBUG option, e.g.: https://github.com/saareliad/FTPipe/blob/c3d853080e0bebde50deef78892baf0f3663daf1/models/partitioned/t5_3b_tied_lmheads_320_8_8p_bw12_async_squad1_mpipe.py#L45
  • The pipeline runtime can work with CPU: add a line with "cpu": true to the JSON config, see https://github.com/saareliad/FTPipe/blob/c3d853080e0bebde50deef78892baf0f3663daf1/pipe/prepare_pipeline.py#L302. I kept a file with all options here, e.g., https://github.com/saareliad/FTPipe/blob/c3d853080e0bebde50deef78892baf0f3663daf1/pipe/configs/all_options.json#L67
  • Partitioning analysis can run on CPUs. See here.
  • Profiling is currently hardcoded for GPUs, but that should be very easy to change: several functions here would need to be changed so that profiling is done on CPU.
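For the "cpu": true step, a minimal sketch of toggling the option programmatically instead of editing the file by hand. The helper name `enable_cpu` and the config path are hypothetical and not part of FTPipe; the only thing taken from the thread is the "cpu" key itself.

```python
import json
from pathlib import Path


def enable_cpu(config_path):
    """Set "cpu": true in a JSON config file and return the updated dict.

    Hypothetical helper, not part of FTPipe. It only assumes the config
    is a JSON object at the top level.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["cpu"] = True  # the option described in the list above
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config
```

Usage would look like `enable_cpu("path/to/your_config.json")`, with the path pointing at whichever config you pass to the pipeline runtime. All other keys in the file are left untouched.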

Finally, some partitioning heuristics would need to be adjusted for your system. For example, the memory threshold in the master branch is hardcoded to 11 GB for the RTX 2080 Ti: https://github.com/saareliad/FTPipe/blob/c3d853080e0bebde50deef78892baf0f3663daf1/autopipe/autopipe/model_partitioning/heuristics.py#L327
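As a rough illustration of what "changed according to your system" could mean for the memory threshold, here is a sketch that derives a threshold from the host's RAM rather than a fixed 11 GB. Everything in it is an assumption: the function names are hypothetical, the 0.8 headroom fraction is arbitrary, and how (and in which units) the value would be plugged into heuristics.py is not covered here.

```python
import os

GIB = 1024 ** 3


def memory_threshold_bytes(total_bytes, fraction=0.8):
    """Return a per-node memory budget as a fraction of total memory.

    The 0.8 default is an arbitrary illustrative choice, not a value
    taken from FTPipe.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_bytes * fraction)


def host_memory_bytes():
    """Total physical RAM of this machine.

    Relies on os.sysconf names that exist on Linux and most Unix
    systems; they are not available on every platform.
    """
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
```

On each VM you could then compute `memory_threshold_bytes(host_memory_bytes())` and use the result in place of the hardcoded constant, converting units as the heuristic expects.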

saareliad, Mar 07 '22 12:03