Aaron Shi

9 issues by Aaron Shi

Since we call get_module() in each of the unit tests (train, eval, example, and check_device), the unit tests are skipped when get_module() is not implemented. This is because we catch...

cla signed

After PR #526 lands, we need to fix these:
```
# FIXME: Models will use context "with torch.no_grad():", so the lifetime of no_grad will end after the eval().
# FIXME:...
```

Revert "remove deprecated HSAQueue::copy_ext due to inf recursion" This reverts commits 5cd096ae42277d36d123c827fd8981de288f07a8 and 4ee361270f5938e34b0759c21f59cdcd0a99b021. The reverted change causes a regression on GFX908 for BFloat16 tuning in MIOpen.

Replace package dependency on opencl with rocm-dev which includes either hcc or hip-clang.

Summary: Although hipGetDeviceProperties shows 8 devices enumerating from 0 to 7, when using roctracer_record_t, the record->device_id enumerates from 2 to 9. Manually enumerate from 0-7 by subtracting 2, and opened...

fb-exported
cla signed

To follow up on https://github.com/pytorch/kineto/pull/868, we can continue the discussion about which clock to use for timestamp collection. @mwootton pointed out that we should always be using a...

enhancement

### Problem Description Hi, we are using Roctracer to capture GPU events via roctracer_record_t and `hcc_cb_properties.buffer_callback_fun = activity_callback;`. However, we've found that events have device_id values ranging from 2 to 9....

Under Investigation

Summary: Although hipGetDeviceProperties shows 8 devices enumerating from 0 to 7, when using roctracer_record_t, the record->device_id enumerates from 2 to 9. This is because roctracer considers id 0 and 1...

fb-exported
cla signed

Summary: Similar to CUDA, save the device properties to the metadata of Roctracer GPU traces. - Renamed CudaDeviceProperties files to DeviceProperties, and changed TARGETS' cpp_library to device_properties - Added gpu...

fb-exported
cla signed