
Old data is copied into new data in DcgmGroupSamples.GetAllSinceLastCall

Open optyang opened this issue 1 year ago • 0 comments

Hi,

I am using the Python bindings of DCGM 3.2.6.

When I run my Python script, which calls GetAllSinceLastCall on DcgmGroupSamples (here), the returned values also seem to include old data (stored in a cache?) from previous runs of the script.

This is the output of the first run of the script, which prints the field values:

(xformers) yangyang22@workers-st-p4de-107:/mnt/fsx-home/yangyang22/projects/xformers/xformers/profiler$ python3 dcgm_example.py -p 1158114
Connecting to a standalone hostengine with auto opmode...
time instance|value: 0|0.47234506497546147
time instance|value: 1|0.47234506497546147
time instance|value: 2|0.4753820541153355
time instance|value: 3|0.47537324481855675
time instance|value: 4|0.4753905944019482
time instance|value: 5|0.4753868239508041
time instance|value: 6|0.4754042738288565

When I then run the same script again, the old field values from the previous run are still there (time instances 1-6 above are the same as time instances 0-5 below):

time instance|value: 0|0.47234506497546147
time instance|value: 1|0.4753820541153355
time instance|value: 2|0.47537324481855675
time instance|value: 3|0.4753905944019482
time instance|value: 4|0.4753868239508041
time instance|value: 5|0.4754042738288565
time instance|value: 6|0.4969533182061546
time instance|value: 7|0.4969533182061546
time instance|value: 8|0.4753617928144854
time instance|value: 9|0.4753946481315652
time instance|value: 10|0.4753674762672508
time instance|value: 11|0.4753788155863566
time instance|value: 12|0.4754120235068984
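
The overlap can be checked mechanically. The following is a small sketch that uses only the values printed above; the helper name `overlap_length` is made up for illustration and is not part of DCGM.

```python
# Values printed by the first and second runs, copied from the output above.
first_run = [
    0.47234506497546147, 0.47234506497546147, 0.4753820541153355,
    0.47537324481855675, 0.4753905944019482, 0.4753868239508041,
    0.4754042738288565,
]
second_run = [
    0.47234506497546147, 0.4753820541153355, 0.47537324481855675,
    0.4753905944019482, 0.4753868239508041, 0.4754042738288565,
    0.4969533182061546, 0.4969533182061546, 0.4753617928144854,
    0.4753946481315652, 0.4753674762672508, 0.4753788155863566,
    0.4754120235068984,
]


def overlap_length(previous, current):
    """Return the largest k such that the last k items of `previous`
    equal the first k items of `current` (0 if there is no overlap)."""
    for k in range(min(len(previous), len(current)), 0, -1):
        if previous[-k:] == current[:k]:
            return k
    return 0


stale = overlap_length(first_run, second_run)
print(f"samples repeated from the previous run: {stale}")
print(f"samples that look new in the second run: {len(second_run) - stale}")
```

By this check, the first six samples of the second run repeat the last six of the first run, and only the remaining seven look new.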

The code is as follows:

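     # field 1004 is DCGM_FI_PROF_PIPE_TENSOR_ACTIVE
     dcgmFieldGroup = pydcgm.DcgmFieldGroup(dcgmHandle, name="Profiling", fieldIds=[1004])
     dcgmGroupSamples = pydcgm.DcgmGroupSamples(dcgmHandle, dcgmGroup.GetId(), dcgmGroup)
     # updateFreq = 1000000 us (1 s), maxKeepAge = 3600 s, maxKeepSamples = 0 (no limit)
     dcgmGroupSamples.WatchFields(dcgmFieldGroup, 1000000, 3600, 0)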

     # collect profiling results run in background
     profiling_results = dcgmGroupSamples.GetAllSinceLastCall(None, dcgmFieldGroup)
     
     # replace this by the code that should be profiled
     time.sleep(5)
     
     # collect profiling results
     dcgmGroupSamples.GetAllSinceLastCall(profiling_results, dcgmFieldGroup)
     
     # print profiling results
     for gpu_id in profiling_results.values.keys():
         for field_id in profiling_results.values[gpu_id].keys():
             for time_instance, gpu_field_time in enumerate(profiling_results.values[gpu_id][field_id]):
                 print(f"time instance|value: {time_instance}|{gpu_field_time.value}")

What is the proper way to get the data just from the current run? Thanks a lot.
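
One possible workaround, sketched here under assumptions rather than as a confirmed DCGM recommendation, is to record a start timestamp and discard any sample older than it. The sketch assumes each returned sample exposes a `ts` attribute holding microseconds since the Unix epoch; that should be verified against the bindings in use. The `Sample` class and the helpers `now_usec` and `samples_since` are hypothetical stand-ins so the example runs without a hostengine.

```python
import time
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical stand-in for one DCGM field value.

    Assumption: the real objects expose `ts` (microseconds since the
    Unix epoch) and `value`, which are the only attributes used here.
    """
    ts: int
    value: float


def now_usec():
    """Current wall-clock time in microseconds since the Unix epoch."""
    return int(time.time() * 1_000_000)


def samples_since(samples, start_ts_usec):
    """Keep only the samples recorded at or after `start_ts_usec`."""
    return [s for s in samples if s.ts >= start_ts_usec]


# Illustrative data: two samples cached before the run started, two after.
start = 1_700_000_000_000_000  # fixed here for reproducibility; use now_usec() in practice
cached = [Sample(start - 2_000_000, 0.472), Sample(start - 1_000_000, 0.475)]
fresh = [Sample(start + 1_000_000, 0.497), Sample(start + 2_000_000, 0.475)]

current_run = samples_since(cached + fresh, start)
for time_instance, sample in enumerate(current_run):
    print(f"time instance|value: {time_instance}|{sample.value}")
```

In the script above, this would mean calling `now_usec()` just before the code being profiled and applying `samples_since` to each list in `profiling_results.values[gpu_id][field_id]`, assuming those lists hold objects with a `ts` attribute.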

optyang, Nov 02 '23 14:11