William Zhang comments

Results: 34 comments of William Zhang

opal/accelerator: Initial accelerator framework implementation

Need to:
- Rework the component selection logic
- Replace code that calls common_cuda with code that calls the framework device-agnostically
- Do more testing

opal/accelerator: Initial accelerator framework implementation

Looks like the PR build checker failed because Jenkins lost its connection; it should pass on a re-run.

opal/accelerator: Initial accelerator framework implementation

I changed the datatype engine, mtl's, common/ompio, osc/rdma, and mtls to use the new framework. The remaining dependencies on common_cuda.h are in:
- coll_cuda component
- nbc_internal
- pml ob1...

opal/accelerator: Initial accelerator framework implementation

Added new asynchronous APIs; Seth added a null component; rewrote the asynchronous progress engine in the ob1 pml; replaced all the OPAL_CUDA_SUPPORT ifdefs in the pml and btls (non-CUDA)...

opal/accelerator: Initial accelerator framework implementation

> The failures look real; the datatype unit tests are failing with a segmentation fault.

Yeah, I think I messed something up in my pml ob1 code conversion; I'll fix...

opal/accelerator: Initial accelerator framework implementation

> Would it be a win to have a accelerator.rst explaining the design, i.e., streams and events.
>
> Sorry, I meant to say: would an introduction in accelerator.h help?...

opal/accelerator: Initial accelerator framework implementation

TODOs:
1. `rm -rf opal/cuda` -> move all usages of CUDA in OMPI to the Accelerator Framework - IN PROGRESS
2. ~~Figure out how to handle multiple Accelerators (of same type)...

opal/accelerator: Initial accelerator framework implementation

> * What are the configure parameters to build OMPI with accelerator component?

`--with-cuda=`

> * What are the runtime parameters needed to toggle between accelerator instances?

Not sure if...

opal/accelerator: Initial accelerator framework implementation

Is anyone familiar with the Mellanox CI? The failure looks real, but I'm not sure exactly what it is.

OFI components: using FI_NAME_MAX is incorrect

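Is this the sort of behavior you're trying to implement?

```c
/* First call with a 1-byte buffer: if the address does not fit,
 * fi_getname() returns -FI_ETOOSMALL and sets len to the required size. */
size_t len = 1;
char tmp_addr[1];
fi_getname(fid, tmp_addr, &len);

/* Allocate a buffer of the reported size and fetch the full address. */
char *addr = malloc(len);
fi_getname(fid, addr, &len);
```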