Mustafa Eyceoz

15 issues by Mustafa Eyceoz

Look into which communication library backends (NCCL, Gloo, MPI, etc.) are currently supported via the SDK and submission to Ray (and potentially directly to MCAD), and what we may need to...

Look into how the SDK currently supports TensorFlow use cases (interactively with a Ray cluster), and what we could do to support direct TensorFlow job submission.

Add the ability to create a local Ray cluster (rather than in OpenShift via AW->MCAD->KubeRay). This will allow users to utilize the exact same SDK workflow/code locally as they would...

Based on the identified upstream community, plan the architecture refactoring for generalization and community collaboration.

Identify who to reach out to, who to work with, which community to target, etc.