
Plotting issues with hierarchical roofline

Open jrmadsen opened this issue 5 years ago • 6 comments

Using output ert_results.json, only one memory level is plotted:

python -m timemory.roofline -d -ai ert_results.json -op ert_results.json

jrmadsen avatar Oct 20 '20 00:10 jrmadsen

ert_results.json can be generated from ex_ert with -DTIMEMORY_BUILD_EXAMPLES=ON

jrmadsen avatar Oct 20 '20 00:10 jrmadsen

Requires changes in #89, specifically:

https://github.com/NERSC/timemory/blob/88d7b915bb60c1bcf7dd0a1ab89a198ce5343a63/timemory/roofline/roofline.py#L180

        if self.units is not None:
            for i in range(len(self.data)):
                self.data[i] /= self.units

jrmadsen avatar Oct 20 '20 00:10 jrmadsen

OK, actually that's the issue I was referring to: the L1 and L2 data may be there in the .json file, but the numbers are wrong because ERT cannot properly measure the L1 bandwidth on modern architectures such as Skylake, V100, or even P100. I will definitely look into this, since the problem has been there for a long time already. I will keep you updated here.

PointKernel avatar Oct 20 '20 01:10 PointKernel

Would it help to have the L1, L2, and (if it exists) L3 data cache sizes in the JSON, so you can extract the ERT tests around those values?
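
To make the idea concrete, here is a rough sketch of what "extracting the ERT tests around those values" could look like. Everything in it is an assumption for illustration: the `(working_set_bytes, bandwidth)` sample layout, the `cache_sizes` dict, and the function name `bandwidth_per_level` are hypothetical and do not reflect the actual layout of ert_results.json or any timemory API.

```python
def bandwidth_per_level(samples, cache_sizes):
    """Group bandwidth samples by the smallest cache level they fit in.

    samples:     iterable of (working_set_bytes, bandwidth) pairs
                 (hypothetical layout, not the real ert_results.json schema)
    cache_sizes: dict such as {"L1": 32768, "L2": 1048576}, sizes in bytes
                 (hypothetical; these would come from the proposed JSON fields)

    Returns {level: peak bandwidth} using the best sample per level.
    Samples larger than every cache are reported under "DRAM".
    """
    samples = list(samples)
    # Walk the cache levels from smallest to largest.
    levels = sorted(cache_sizes.items(), key=lambda kv: kv[1])
    result = {}
    lower = 0
    for name, size in levels:
        # Samples that fit in this level but not in the previous (smaller) one.
        fits = [bw for ws, bw in samples if lower < ws <= size]
        if fits:
            result[name] = max(fits)
        lower = size
    # Anything beyond the largest cache is treated as main memory.
    beyond = [bw for ws, bw in samples if ws > lower]
    if beyond:
        result["DRAM"] = max(beyond)
    return result
```

Taking the maximum per bucket is only one possible choice; it would still report whatever ERT measured, so it does not address a measurement that never reaches the true L1 bandwidth.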

jrmadsen avatar Dec 06 '20 11:12 jrmadsen

That will definitely help with the L2 and L3 detection. The major issue is that ERT can never reach the L1 bandwidth on either Skylake or P100/V100. I need to try some new kernels (micro-benchmarks) in ERT.

PointKernel avatar Dec 07 '20 19:12 PointKernel