TRTorch currently compiles and keeps only the `forward` method of a TorchScript module. This is not transparent to users. If users export other methods in TorchScript, and their program is based on...
I reviewed this and found that the architecture of fan differs from the official one. What made you do this?
Is `(aspect_ratios_.size()+1)` a typo? I think the correct equation is: `num_priors_ += aspect_ratios_.size() * (pow(densitys_[i],2)-1)`
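To make the arithmetic concrete, here is a minimal Python sketch that only evaluates the expression proposed above. It does not confirm which formula is correct. The function name, the starting value of zero, and the sample values are assumptions made for illustration and do not come from the original code.

```python
def proposed_num_priors(aspect_ratio_count, densities):
    """Sum aspect_ratio_count * (density**2 - 1) over all densities.

    Mirrors the proposed `num_priors_ += ...` update, assuming the
    counter starts at zero (an assumption, not taken from the source).
    """
    num_priors = 0
    for density in densities:
        num_priors += aspect_ratio_count * (density ** 2 - 1)
    return num_priors


# Hypothetical values chosen only for illustration.
total = proposed_num_priors(aspect_ratio_count=1, densities=[4, 2, 1])
print(total)
```

With these sample values the per-density terms are 15, 3, and 0.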
I added distillation when training resnet18, but the Top-1 accuracy degraded from 68.150% to 67.364%. Hyperparameters are as follows: 4 GPUs, epochs: 90, learning_rate: 0.01, momentum: 0.9, weight_decay: 0.0001, mode: step...
After changing only these two lines of code, the LLaMa-2-7B model no longer produces correct output: `Generate(kernels_); group_sizes_.push_back(64);` FP16 output: @Input: The first time I saw the movie, I was like, 'Oh my God, _Output: this is so cool.' I was like, INT4...
In the file, `grouped_sum = tf.sqrt(tf.reduce_sum(tf.pow(W,2),axis=[0,1,2]))` looks to me like a filter-wise group, not an input-channel-wise group, but the comment says it is a channel-wise group.
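The difference between the two groupings can be checked on a toy kernel. The sketch below is plain Python rather than TensorFlow, and it assumes `W` uses the conv2d kernel layout `[height, width, in_channels, out_channels]`; that layout and all helper names are assumptions for illustration. Reducing over axes 0, 1, 2 leaves one norm per output filter, while an input-channel-wise group would reduce over axes 0, 1, 3.

```python
import math


def make_kernel(height, width, in_channels, out_channels):
    """Build a toy kernel indexed as W[h][w][in_channel][out_channel]."""
    return [[[[float(o + 1) for o in range(out_channels)]
              for _ in range(in_channels)]
             for _ in range(width)]
            for _ in range(height)]


def filter_wise_norms(W):
    """L2 norm over axes 0, 1, 2: one value per output filter."""
    sums = [0.0] * len(W[0][0][0])
    for row in W:
        for cell in row:
            for in_slice in cell:
                for o, value in enumerate(in_slice):
                    sums[o] += value ** 2
    return [math.sqrt(s) for s in sums]


def input_channel_wise_norms(W):
    """L2 norm over axes 0, 1, 3: one value per input channel."""
    sums = [0.0] * len(W[0][0])
    for row in W:
        for cell in row:
            for c, in_slice in enumerate(cell):
                sums[c] += sum(value ** 2 for value in in_slice)
    return [math.sqrt(s) for s in sums]


# Toy 3x3 kernel with 2 input channels and 4 output filters.
W = make_kernel(3, 3, 2, 4)
filter_norms = filter_wise_norms(W)
channel_norms = input_channel_wise_norms(W)
print(len(filter_norms), len(channel_norms))
```

The result of the axes 0, 1, 2 reduction has as many entries as there are output filters (4 here), and the axes 0, 1, 3 reduction has as many entries as there are input channels (2 here).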
@xiamengzhou
[batch=592/3200]
Train time/batch: 591
Train time/sample: 18912
Train time/batch_in_epoch: 591
Train time/sample_in_epoch: 18912
Train time/token: 77463552
Train time/token_in_epoch: 77463552
Train metrics/train/cc_weight: 0.2292
Train metrics/train/github_weight: 0.0121
Train metrics/train/book_weight: 0.0220
Train...