Alexander
TRTorch can currently compile and keep only the forward method of a TorchScript module. This is not transparent to users. If users export other methods in TorchScript, and their program is based on...
I reviewed fan.py and found that the FAN architecture differs from the official implementation. What was the reason for this change?
Is `(aspect_ratios_.size()+1)` a typo? I think the correct equation is: `num_priors_ += aspect_ratios_.size() * (pow(densitys_[i],2)-1)`
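The proposed update can be checked with a little arithmetic. The sketch below assumes (this is not stated in the issue) that `num_priors_` starts at one prior per aspect ratio per density, and that each density `d` should contribute `d*d` priors per aspect ratio in total. The Python function names are hypothetical. Under those assumptions, the proposed incremental update matches the direct count.

```python
# Hypothetical check: names mirror the C++ members, but this is an illustration only.

def direct_count(aspect_ratios, densities):
    # Each density d tiles d*d priors for every aspect ratio.
    return sum(len(aspect_ratios) * d * d for d in densities)


def incremental_count(aspect_ratios, densities):
    # Assumption: num_priors_ is initialized to one prior per aspect ratio per density.
    num_priors = len(aspect_ratios) * len(densities)
    for d in densities:
        # Proposed update: add the remaining d*d - 1 priors per aspect ratio.
        num_priors += len(aspect_ratios) * (d ** 2 - 1)
    return num_priors


print(direct_count([1.0], [4, 2, 1]), incremental_count([1.0], [4, 2, 1]))  # 21 21
```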
I added distillation when training ResNet-18, but Top-1 accuracy dropped from 68.150% to 67.364%. Hyperparameters are as follows: 4 GPUs, epochs: 90, learning_rate: 0.01, momentum: 0.9, weight_decay: 0.0001, mode: step...
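The report does not specify which distillation loss was used. For reference, here is a minimal sketch of the common Hinton-style soft-target loss for a single example, written in plain Python. The `temperature` and `alpha` defaults are illustrative assumptions, not values from the run above.

```python
import math


def log_softmax(logits, temperature=1.0):
    # Numerically stable log-softmax with temperature scaling.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Distillation loss for one example (illustrative defaults, not from the report)."""
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    # Soft-target term: KL(teacher || student) at temperature T, scaled by T^2.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    soft = kl * temperature ** 2
    # Hard-label term: ordinary cross-entropy at T = 1.
    hard = -log_softmax(student_logits)[label]
    return alpha * soft + (1 - alpha) * hard
```

The soft term is multiplied by `T^2` so that its gradient magnitude stays comparable to the hard-label term as the temperature changes.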
With only these two lines changed, the LLaMa-2-7B model no longer produces correct output: `Generate(kernels_); group_sizes_.push_back(64);` FP16 output: Input: The first time I saw the movie, I was like, 'Oh my God, Output: this is so cool.' I was like, INT4...
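To make `group_sizes_.push_back(64)` concrete: in group-wise weight quantization, each run of 64 consecutive weights gets its own scale and zero point. The sketch below is a generic asymmetric INT4 scheme in plain Python. It assumes this is what the group size controls here. It is not the project's `Generate` kernel, and `quantize_int4_groupwise` / `dequantize` are hypothetical names.

```python
def quantize_int4_groupwise(weights, group_size=64):
    """Asymmetric per-group INT4 quantization (illustrative only).

    Returns one (codes, scale, zero_point) tuple per group of `group_size` weights.
    """
    groups = []
    for start in range(0, len(weights), group_size):
        g = weights[start:start + group_size]
        w_min, w_max = min(g), max(g)
        # 4 bits give 16 levels, so codes live in [0, 15].
        scale = (w_max - w_min) / 15 if w_max > w_min else 1.0
        zero_point = round(-w_min / scale)
        codes = [min(15, max(0, round(w / scale) + zero_point)) for w in g]
        groups.append((codes, scale, zero_point))
    return groups


def dequantize(groups):
    # Map each code back to a float using its group's scale and zero point.
    out = []
    for codes, scale, zero_point in groups:
        out.extend((c - zero_point) * scale for c in codes)
    return out
```

One way to check whether a changed group size alone explains a divergence from the FP16 output is to compare `dequantize(...)` against the original weights group by group.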
In losses.py, for `grouped_sum = tf.sqrt(tf.reduce_sum(tf.pow(W,2),axis=[0,1,2]))`, I think the grouping is filter-wise, not input-channel-wise, but the comment says channel-wise.
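Assume `W` uses TensorFlow's default `tf.nn.conv2d` kernel layout `[filter_height, filter_width, in_channels, out_channels]`. Then reducing over axes `[0, 1, 2]` leaves only the output-channel axis, so each entry is the L2 norm of one whole filter. The NumPy equivalent below shows the resulting shapes. The 3x3x16x32 kernel is an arbitrary example.

```python
import numpy as np

# TF conv kernels are laid out as [filter_height, filter_width, in_channels, out_channels].
W = np.random.default_rng(0).standard_normal((3, 3, 16, 32))

# Same reduction as the losses.py line, written with NumPy instead of TF.
grouped_sum = np.sqrt(np.sum(np.power(W, 2), axis=(0, 1, 2)))
print(grouped_sum.shape)  # (32,): one L2 norm per output filter

# An input-channel-wise group would instead reduce over axes (0, 1, 3).
channel_norms = np.sqrt(np.sum(W ** 2, axis=(0, 1, 3)))
print(channel_norms.shape)  # (16,)
```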
@xiamengzhou
[batch=592/3200]
Train time/batch: 591
Train time/sample: 18912
Train time/batch_in_epoch: 591
Train time/sample_in_epoch: 18912
Train time/token: 77463552
Train time/token_in_epoch: 77463552
Train metrics/train/cc_weight: 0.2292
Train metrics/train/github_weight: 0.0121
Train metrics/train/book_weight: 0.0220
Train...
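The counters in this log are internally consistent, and together they pin down the effective global batch size and sequence length. Here is a small check, assuming the counters are cumulative from step 0 and every sample has the same token length:

```python
# Cumulative counters copied from the log above.
batches = 591
samples = 18912
tokens = 77463552

samples_per_batch = samples // batches
tokens_per_sample = tokens // samples
print(samples_per_batch, tokens_per_sample)  # 32 4096
```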