
Problems with sim.compute_encodings

huxian0402 opened this issue on Sep 01 '21 · 5 comments

Hi,

When I add self.sim.compute_encodings(forward_pass_callback=self.evaluate_model, forward_pass_callback_args=5) to my QAT training code, the training loss becomes NaN, which is very strange. Could you please help me? @mohanksriram @Rohan-Chaudhury @aimetci @quic-bharathr @quic-mangal

Thanks a lot!
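For reference, the usual AIMET PyTorch flow is to wrap the trained float model in a QuantizationSimModel, run compute_encodings once with a calibration callback, and only then fine-tune sim.model. A minimal sketch, assuming the AIMET 1.x API and a hypothetical calibrate callback (a stand-in, not the author's evaluate_model):

import torch
from aimet_torch.quantsim import QuantizationSimModel

def calibrate(model, num_batches):
    # Hypothetical callback: push a few batches through the model so AIMET
    # can observe activation ranges; num_batches arrives via
    # forward_pass_callback_args.
    model.eval()
    with torch.no_grad():
        for _ in range(num_batches):
            model(torch.rand(1, 1, 112, 112).cuda())

model = torch.nn.Sequential(torch.nn.Conv2d(1, 8, 3), torch.nn.ReLU()).cuda()  # stand-in float model
sim = QuantizationSimModel(model,
                           dummy_input=torch.rand(1, 1, 112, 112).cuda(),
                           default_param_bw=8, default_output_bw=8)
sim.compute_encodings(forward_pass_callback=calibrate, forward_pass_callback_args=5)
sim.model.train()  # QAT then continues on sim.model, not on the original model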

huxian0402 · Sep 01 '21

Hi @huxian0402 Thank you for the query. Could you please share the full code snippet used in this case?

quic-ssiddego · Sep 02 '21

Hi,

Here is the full code snippet.

def evaluate_model(model, iterations):
    # 'iterations' receives forward_pass_callback_args (5) from
    # compute_encodings; it is unused below -- the full test set is evaluated.
    print("evaluate_model start...")
    model.eval()
    num_test = 0
    nmes_merge = 0.0

    test_data = data.facepp_st_data('test')
    for batch in test_data:
        img = batch['image']
        lms_gt = batch['landmarks']
        inter_dis = batch['inter_dis']
        scales = batch['scales']
        mu_x = torch.arange(0.5, 7 * 1.0, 1)
        mu_y = torch.arange(0.5, 7 * 1.0, 1)
        mu_y, mu_x = torch.meshgrid(mu_y, mu_x)
        batch_size = img.size(0)
        num_test += batch_size

        if CUDA:
            img = img.cuda()
            lms_gt = lms_gt.cuda()
            inter_dis = inter_dis.cuda()
            scales = scales.cuda()
            mu_x = mu_x.cuda()
            mu_y = mu_y.cuda()

        with torch.no_grad():
            lms_pred, pose_pred, vis_pred = forward_pip_dis_v1(model, img, mu_x, mu_y)
            lms_pred = lms_pred.flatten()

        # compute the landmark error
        lms_gt = lms_gt / 16
        lms_pred = lms_pred / 16

        batch_dif = calc_landmark_dis(lms_gt, lms_pred, scales, inter_dis)
        nmes_merge += batch_dif.cpu().numpy()

    nmes_merge /= num_test
    print("evaluate_model finished...", nmes_merge)
    return nmes_merge

class Trainer(object):
    def __init__(self, weights=None):
        initLogging()
        # resnet_18 = resnet18(True)
        # self.model = Pip_resnet18(resnet_18, 106, 16)
        self.model = Pip_mobilenet(cfg.NUM_POINT, cfg.Net_stride, width_mult=0.25)
        self.model = self.model.cuda()

        if weights is not None:
            weight = torch.load(weights)
            self.model.load_state_dict(weight)

        print(self.model)

        self.sim = QuantizationSimModel(self.model, default_output_bw=8, default_param_bw=8,
                                        dummy_input=torch.rand(1, 1, 112, 112).cuda(), in_place=False,
                                        config_file='/usr/local/lib/python3.6/dist-packages/aimet_common/quantsim_config/default_config.json')
        self.sim.model = self.sim.model.to(torch.device('cuda'))
        print(next(self.sim.model.parameters()).device)
        self.sim.compute_encodings(forward_pass_callback=evaluate_model, forward_pass_callback_args=5)
        ........
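One thing to rule out in the snippet above: evaluate_model switches the network to eval() and compute_encodings does not switch it back, so any training that continues on self.sim.model afterwards runs with BatchNorm and Dropout frozen in eval mode. A one-line sketch of the fix, assuming the call sits inside Trainer.__init__ as shown:

self.sim.compute_encodings(forward_pass_callback=evaluate_model, forward_pass_callback_args=5)
self.sim.model.train()  # restore training mode after the eval() inside the callback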

@quic-ssiddego

Also, could you share contact information, such as WeChat or a communication group, for more convenient communication? Thanks a lot!

huxian0402 · Sep 02 '21

Hi @huxian0402 One way to isolate the issue would be to dump all the encodings generated after the sim.compute_encodings step and check whether all of them are valid (both params and activations). Please refer to this example for exporting the encodings of a given model: https://github.com/quic/aimet/blob/develop/TrainingExtensions/torch/test/python/test_quantizer.py#L689. Further, you could enable/disable a subset of the param or activation quantizers to isolate the issue. Please refer to this code to control the quantizer config for a given model: https://github.com/quic/aimet/blob/develop/TrainingExtensions/torch/test/python/test_quantizer.py#L646 and https://github.com/quic/aimet/blob/develop/TrainingExtensions/torch/test/python/test_quantizer.py#L1526. Please share your observations once you have tried this. Thank you.
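A minimal sketch of both suggestions, assuming the AIMET 1.x API (sim.export and per-wrapper quantizer toggles; names may differ across versions):

import torch
from aimet_torch.qc_quantize_op import QcQuantizeWrapper

# 1) Dump the computed encodings to JSON for inspection
#    (some versions expect the dummy input on CPU here)
sim.export(path='./encodings_dump', filename_prefix='model',
           dummy_input=torch.rand(1, 1, 112, 112))

# 2) Bisect by disabling a subset of quantizers, e.g. all activation
#    (output) quantizers, while leaving the param quantizers enabled
for module in sim.model.modules():
    if isinstance(module, QcQuantizeWrapper):
        for quantizer in module.output_quantizers:
            quantizer.enabled = False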

quic-ssiddego · Sep 02 '21

Thanks for your reply,

I dumped all the encodings generated after the sim.compute_encodings step and found that all of them are valid.

I also found another strange problem. I ran the training code many times, and the result of evaluate_model called by sim.compute_encodings(forward_pass_callback=evaluate_model, forward_pass_callback_args=5) is different on every run, even though the input images are the same.

Could you please help me? @quic-ssiddego @mohanksriram @Rohan-Chaudhury @aimetci
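If the weights and inputs really are identical across runs, a varying NME usually comes from unseeded randomness (shuffling or augmentation inside the test loader) or from nondeterministic CUDA kernels. A minimal sketch of pinning both down, assuming standard PyTorch flags:

import random
import numpy as np
import torch

random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True  # choose deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable autotuning, which varies run to run

# and confirm data.facepp_st_data('test') does not shuffle or apply random augmentation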

huxian0402 · Sep 03 '21

Hi @huxian0402 Sorry about the delayed response. Could you please share the associated code snippet?

quic-ssiddego · Oct 06 '21

Closing this issue due to inactivity. Please re-open it or create a new issue if you need further help.

quic-mangal · Apr 04 '23