FumoTime

6 comments by FumoTime

> Also, what version/commit of migraphx are you using for this? As we did fix a reshape_lazy issue in #2721, so I don't know if that fixes your issue or...

> I'll try out ROCm 6.1 with MIGraphX 2.9 and report back

Below is the performance of ROCm 6.1 with MIGraphX 2.9: Resolution: 512x512, steps: 75, excluding the first pass...

As for SDXL, compiling at 1024x1024 fails with the following error: `RuntimeError: /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/AMDMIGraphX/src/targets/gpu/hip.cpp:109: allocate_gpu: Memory not available to allocate buffer: 41943040`. Reducing the resolution to 768x768 reduced the memory allocation but the...

> You can run the pytorch sdxl on its own on your system right?

Yes, it does.

> In general, we try and avoid duplicating weights when compiling but sometimes...

@hypertseng Most likely, the time attributed to `cudaDeviceSynchronize` includes the kernel execution time: kernels are launched asynchronously, so the synchronize call is where the host waits for them to finish. You can use CUDA events to time it instead:

```
torch.cuda.reset_peak_memory_stats()
start_event = torch.cuda.Event(enable_timing=True)
end_event = torch.cuda.Event(enable_timing=True)
start_event.record()
minimal_result =...
```
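The event-based timing suggested above can be sketched end to end roughly as follows. This is a minimal sketch, assuming PyTorch with a CUDA device (ROCm builds expose the same `torch.cuda` API); the helper `time_gpu_ms`, its `warmup`/`iters` parameters, and the matmul workload are hypothetical illustrations, not part of the original snippet.

```python
import torch


def time_gpu_ms(fn, *args, warmup=1, iters=10):
    """Return (average milliseconds per call, last result) for fn(*args).

    Hypothetical helper: measures GPU time with CUDA events instead of
    wall-clock time around a synchronize call. Assumes iters > 0.
    """
    # Warm-up passes (e.g. the first pass) are executed but not measured.
    for _ in range(warmup):
        fn(*args)
    # Drain any previously queued work so it is not counted below.
    torch.cuda.synchronize()

    # Reset so max_memory_allocated() reflects the timed region
    # (plus whatever memory is already held at this point).
    torch.cuda.reset_peak_memory_stats()
    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)

    start_event.record()  # enqueued on the current stream
    result = None
    for _ in range(iters):
        result = fn(*args)
    end_event.record()

    # Kernel launches are asynchronous: wait for the end event to complete
    # on the device before reading the elapsed time.
    end_event.synchronize()
    avg_ms = start_event.elapsed_time(end_event) / iters
    return avg_ms, result


if __name__ == "__main__":
    if torch.cuda.is_available():
        x = torch.randn(1024, 1024, device="cuda")
        ms, _ = time_gpu_ms(torch.matmul, x, x)
        peak_mib = torch.cuda.max_memory_allocated() / 2**20
        print(f"{ms:.3f} ms/iter, peak {peak_mib:.1f} MiB")
```

Because both events are recorded on the stream, `elapsed_time` covers only the device work between them, independent of when the host happens to block.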

@yukieiji, does `hipblaslt-bench` or `hipblaslt-test` work on your end? My build succeeded on gfx1030, but running either of them fails because `TensileLibrary_lazy_gfx1030.dat` was not generated.