Ma, Guokai
Intel Extension for PyTorch has been updated to the latest version, and the AVX2 detection issue has been resolved. > We will have a new release for Intel Extension for PyTorch for...
`--bind_cores_to_rank` has been changed to `store_true`, so its behavior is the same as the other boolean parameters.
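For illustration, here is a minimal sketch of how a `store_true` flag behaves with Python's standard `argparse`. Only the flag name comes from the comment above; the parser, its description, and the help text are assumptions, not the actual launcher code.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the flag name is taken from the comment above.
    parser = argparse.ArgumentParser(description="Illustrative launcher sketch")
    parser.add_argument(
        "--bind_cores_to_rank",
        action="store_true",  # flag takes no value; present means True
        help="Bind each rank to a disjoint set of cores (assumed wording)",
    )
    return parser


# Flag omitted: the attribute defaults to False.
args_default = build_parser().parse_args([])
# Flag present: the attribute becomes True, with no explicit value needed.
args_enabled = build_parser().parse_args(["--bind_cores_to_rank"])
```

With `store_true`, the flag is simply passed or omitted, which is how boolean switches are usually written with `argparse`.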
When calling `numactl`, if all cores specified by `-C` belong to the same NUMA domain X, `-m X` is also added to the `numactl` command to bind memory allocation as well. Observed slight performance...
The behavior is the same on machines with sub-NUMA clustering (SNC), so `-m X` works on both multi-socket machines and machines with SNC. > When calling `numactl`, if...
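The rule described above can be sketched roughly as follows. This is not the code from the PR: the helper names and the `node_to_cores` mapping are hypothetical, and the topology is passed in as a plain dict rather than read from the system. Only the `numactl` options `-C` (CPU binding) and `-m` (memory binding) are real.

```python
def parse_core_list(spec):
    """Expand a core spec such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cores = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cores.extend(range(int(lo), int(hi) + 1))
        else:
            cores.append(int(part))
    return cores


def common_numa_node(cores, node_to_cores):
    """Return the NUMA node containing every core, or None if they span nodes."""
    wanted = set(cores)
    if not wanted:
        return None
    for node, node_cores in node_to_cores.items():
        if wanted <= set(node_cores):
            return node
    return None


def build_numactl_cmd(core_spec, node_to_cores):
    """Build the numactl prefix; add -m only when all cores share one node."""
    cmd = ["numactl", "-C", core_spec]
    node = common_numa_node(parse_core_list(core_spec), node_to_cores)
    if node is not None:
        cmd += ["-m", str(node)]
    return cmd


# Assumed topology: two NUMA domains with four cores each. With SNC enabled,
# each sub-cluster would simply appear as its own entry in this mapping.
topology = {0: range(0, 4), 1: range(4, 8)}
same_node_cmd = build_numactl_cmd("0-3", topology)
cross_node_cmd = build_numactl_cmd("2-5", topology)
```

Because the check only looks at which domain each core belongs to, it does not need to know whether a domain is a full socket or an SNC sub-cluster.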
Hi, the new accelerator abstraction interface has been integrated, and we also added support for OpBuilder. The new interface definition and its integration code are ready for review now. We...
@tjruwase @jeffra this PR keeps running into conflicts with the master branch. How about merging it as smaller PRs? Step 1. A PR that merges the interface definition and implementation part. This ensures the...
@tjruwase #2504 has been created as the first step of this pull request.
@tjruwase https://github.com/microsoft/DeepSpeed/pull/2560 has been created as step 2 of this PR and is ready for review.
@tjruwase Hi, https://github.com/microsoft/DeepSpeed/pull/2677 has been created as step 3 of this PR and is ready for review.
> @delock, is this PR still actively developed for merging? @tjruwase thanks for asking. We are pulling this PR into our internal repo for testing to see if there are...