aimet
Problems with sim.compute_encodings
Hi,
When I add self.sim.compute_encodings(forward_pass_callback=self.evaluate_model, forward_pass_callback_args=5) to my QAT training code, the training loss becomes NaN, which is very strange. Could you please help me? @mohanksriram @Rohan-Chaudhury @aimetci @quic-bharathr @quic-mangal
Thanks a lot!
Hi @huxian0402, thank you for the query. Could you please share the full code snippet used in this case?
Hi,
Here is the full code snippet.
def evaluate_model(model, int):
    print("evaluate_model start...")
    model.eval()
    num_test = 0
    nmes_merge = 0.0
    test_data = data.facepp_st_data('test')
    for batch in test_data:
        img = batch['image']
        lms_gt = batch['landmarks']
        inter_dis = batch['inter_dis']
        scales = batch['scales']
        mu_x = torch.arange(0.5, 7 * 1.0, 1)
        mu_y = torch.arange(0.5, 7 * 1.0, 1)
        mu_y, mu_x = torch.meshgrid(mu_y, mu_x)
        batch_size = img.size(0)
        num_test += batch_size
        if CUDA:
            img = img.cuda()
            lms_gt = lms_gt.cuda()
            inter_dis = inter_dis.cuda()
            scales = scales.cuda()
            mu_x = mu_x.cuda()
            mu_y = mu_y.cuda()
        with torch.no_grad():
            lms_pred, pose_pred, vis_pred = forward_pip_dis_v1(model, img, mu_x, mu_y)
        lms_pred = lms_pred.flatten()
        # Compute the landmark (keypoint) error
        lms_gt = lms_gt / 16
        lms_pred = lms_pred / 16
        batch_dif = calc_landmark_dis(lms_gt, lms_pred, scales, inter_dis)
        nmes_merge += batch_dif.cpu().numpy()
    nmes_merge /= num_test
    print("evaluate_model finished...", nmes_merge)
    return nmes_merge
class Trainer(object):
    def __init__(self, weights=None):
        initLogging()
        # resnet_18 = resnet18(True)
        # self.model = Pip_resnet18(resnet_18, 106, 16)
        self.model = Pip_mobilenet(cfg.NUM_POINT, cfg.Net_stride, width_mult=0.25)
        self.model = self.model.cuda()
        if weights is not None:
            weight = torch.load(weights)
            self.model.load_state_dict(weight)
        print(self.model)
        self.sim = QuantizationSimModel(self.model, default_output_bw=8, default_param_bw=8,
                                        dummy_input=torch.rand(1, 1, 112, 112).cuda(), in_place=False,
                                        config_file='/usr/local/lib/python3.6/dist-packages/aimet_common/quantsim_config/default_config.json')
        self.sim.model = self.sim.model.to(torch.device('cuda'))
        print(next(self.sim.model.parameters()).device)
        self.sim.compute_encodings(forward_pass_callback=evaluate_model, forward_pass_callback_args=5)
........
@quic-ssiddego
Also, could you share some contact information, such as WeChat or a communication group, for more convenient communication? Thanks a lot!
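For reference, compute_encodings only needs the callback to run forward passes on representative data; the value passed as forward_pass_callback_args is handed to the callback as-is. A minimal sketch of a callback that uses that argument as a cap on the number of calibration batches follows. The names make_calibration_callback, batches, and forward_fn are hypothetical and do not come from AIMET or from the code above.

```python
from itertools import islice


def make_calibration_callback(batches, forward_fn):
    """Build a callback with the (model, args) signature used by compute_encodings.

    `batches` and `forward_fn` are hypothetical: any iterable of calibration
    inputs and any function that runs one forward pass on a single batch.
    """

    def calibration_callback(model, max_batches):
        # Only forward passes are needed so that activation statistics can be
        # collected; no accuracy metric has to be computed here.
        # In PyTorch code, call model.eval() first and run this loop under
        # torch.no_grad().
        seen = 0
        for batch in islice(batches, max_batches):
            forward_fn(model, batch)
            seen += 1
        # Returned only so that the sketch is easy to check.
        return seen

    return calibration_callback
```

With such a callback, the call could look like sim.compute_encodings(forward_pass_callback=make_calibration_callback(test_data, run_one_batch), forward_pass_callback_args=5), where run_one_batch is a hypothetical wrapper around forward_pip_dis_v1.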
Hi @huxian0402, one way to isolate the issue would be to dump all the encodings (params and activations) generated after the sim.compute_encodings step and check that they are all valid. Please refer to this example for exporting the encodings of a given model: https://github.com/quic/aimet/blob/develop/TrainingExtensions/torch/test/python/test_quantizer.py#L689 . Further, you could enable or disable subsets of the param or activation quantizers to narrow down the cause. Please refer to this code for controlling the quantizer configuration of a given model: https://github.com/quic/aimet/blob/develop/TrainingExtensions/torch/test/python/test_quantizer.py#L646 https://github.com/quic/aimet/blob/develop/TrainingExtensions/torch/test/python/test_quantizer.py#L1526 . Please share your observations once you have tried this. Thank you.
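As a rough illustration of what "valid" could mean, the sketch below scans an already loaded encodings dictionary for missing, non-finite, or degenerate values. The layout is an assumption: top-level "activation_encodings" and "param_encodings" sections that map tensor names to entries with "min", "max", and "scale" fields. The exported format differs between AIMET versions, so the keys may need adjusting, and find_invalid_encodings is a hypothetical helper, not an AIMET function.

```python
import math


def find_invalid_encodings(encodings):
    """Return (section, tensor_name, reason) tuples for suspicious encodings."""
    problems = []
    # Assumed section names; adjust them to the actual exported file.
    for section in ("activation_encodings", "param_encodings"):
        for name, entries in encodings.get(section, {}).items():
            # Accept either a single dict or a list of per-channel dicts.
            if isinstance(entries, dict):
                entries = [entries]
            for enc in entries:
                lo, hi, scale = enc.get("min"), enc.get("max"), enc.get("scale")
                values = (lo, hi, scale)
                if any(v is None for v in values):
                    problems.append((section, name, "missing min/max/scale"))
                elif not all(math.isfinite(v) for v in values):
                    problems.append((section, name, "non-finite value"))
                elif lo > hi:
                    problems.append((section, name, "min greater than max"))
                elif scale <= 0:
                    problems.append((section, name, "non-positive scale"))
    return problems
```

The dictionary can be obtained by reading the exported encodings file with json.load; an empty result means that none of these particular checks fired, not that the encodings are guaranteed to be good.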
Thanks for your reply,
I dumped all the encodings generated after the sim.compute_encodings step and found that they are all valid.
I also found another strange problem. I ran the training code many times, and the result of evaluate_model, called via sim.compute_encodings(forward_pass_callback=evaluate_model, forward_pass_callback_args=5), is different on every run, even though the input image is the same each time.
Could you please help me? @quic-ssiddego @mohanksriram @Rohan-Chaudhury @aimetci
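One way to make the run-to-run difference easier to report is to call the evaluation several times within a single process and compare the results. The sketch below is only an illustration: check_repeatable and run_once are hypothetical names, and run_once stands for any zero-argument wrapper, for example lambda: evaluate_model(sim.model, 5).

```python
import math


def check_repeatable(run_once, runs=3, rel_tol=1e-6):
    """Call run_once() several times and report whether the results agree."""
    results = [run_once() for _ in range(runs)]
    first = results[0]
    # math.isclose returns False when either value is NaN, so NaN results
    # are reported as unstable as well.
    stable = all(math.isclose(r, first, rel_tol=rel_tol) for r in results[1:])
    return stable, results
```

If the results already differ within one process, the variation comes from the evaluation path itself rather than from differences between separate training runs.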
Hi @huxian0402, sorry about the delayed response. Could you please share the associated code snippet?
Closing this issue due to inactivity. Please reopen it or create a new issue if you need further help.