
Frequent zero consumption for nvidia device

Open nikhil153 opened this issue 3 years ago • 0 comments

I am monitoring the energy consumption of a PyTorch model. During the training loop I sample several times using EnergyContext and ctx.record (code snippet below). More than half of the samples show zero consumption. See the attached partial log.
joules_sample.log

Any ideas?

for i, (images, labels) in enumerate(train_loader):
    # get the inputs; data is a list of [inputs, labels]
    images = images.to(device)
    labels = labels.to(device)
  
    # zero the parameter gradients
    optimizer.zero_grad()
  
    # Monitor joules sparingly
    if (i % monitor_interval) == (monitor_interval-1):
        if monitor_joules:
            # pyjoules
            with EnergyContext(handler=pd_handler, start_tag='forward') as ctx:
                # forward + backward + optimize
                outputs = model(images)
                ctx.record(tag='loss')
                loss = criterion(outputs, labels)
                ctx.record(tag='backward')  
                loss.backward()
                ctx.record(tag='step')
                optimizer.step()
                ctx.record(tag='overhead')

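One possible explanation, offered as a guess rather than a confirmed diagnosis: if the energy readings come from a cumulative counter that refreshes less often than the records are taken, most differences between consecutive records will be zero. The sketch below simulates this with plain Python. The counter model, the 200 W power draw, and the 100 ms refresh interval are all made-up assumptions for illustration; none of them is taken from pyJoules or from the NVIDIA driver.

```python
def counter_reading(t_ms, power_w=200.0, update_ms=100):
    """Hypothetical cumulative energy counter, in joules.

    Assumption: the value only refreshes every `update_ms` milliseconds,
    so a read at time t_ms returns the energy as of the last refresh.
    """
    last_update_ms = (t_ms // update_ms) * update_ms
    return power_w * last_update_ms / 1000.0


def sample_deltas(record_times_ms, **counter_kwargs):
    """Energy attributed to each interval between consecutive records."""
    readings = [counter_reading(t, **counter_kwargs) for t in record_times_ms]
    return [later - earlier for earlier, later in zip(readings, readings[1:])]


def zero_fraction(deltas):
    """Share of intervals that were reported as exactly zero."""
    return sum(1 for d in deltas if d == 0) / len(deltas)


# Records every 30 ms, shorter than the assumed 100 ms refresh interval.
short = sample_deltas(range(0, 301, 30))
# Records every 300 ms, longer than the assumed refresh interval.
long = sample_deltas(range(0, 1201, 300))

print("short intervals:", short, "zero fraction:", zero_fraction(short))
print("long intervals:", long, "zero fraction:", zero_fraction(long))
```

In this toy model the total energy is still accounted for; it is just lumped into the few intervals that happen to span a refresh. If something similar is happening here, measuring a longer region (for example several iterations per record) should reduce the number of zeros. Separately, PyTorch launches CUDA kernels asynchronously, so calling torch.cuda.synchronize() before each ctx.record may be needed for the GPU work to fall inside the interval it is attributed to.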
nikhil153 · Mar 08 '21 19:03