Jersey
@Youggls In LightSeq, the `layernorm_embedding` weight is stored as the `self_attn_layer_norm` of the first encoder (decoder) layer, while the `norm_scale` and `norm_bias` stored in the EmbeddingLayer represent the "encoder (decoder) output layernorm" for pre-norm, or...
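To make the mapping concrete, here is a minimal sketch assuming a Hugging Face BART-style checkpoint; the LightSeq-side key names below are hypothetical placeholders rather than the real export format, the point is only where the `layernorm_embedding` tensors end up.

```python
# Sketch only: where the layernorm_embedding weights are slotted, assuming a
# Hugging Face BART checkpoint. The LightSeq-side dict keys are hypothetical
# placeholders, not the actual proto fields used by the export scripts.
from transformers import BartModel

hf_state = BartModel.from_pretrained("facebook/bart-base").state_dict()

lightseq_encoder = {}

# layernorm_embedding takes the slot of the first encoder layer's
# self_attn_layer_norm (placeholder key names).
lightseq_encoder["layer_0.self_attn_layer_norm.scale"] = hf_state[
    "encoder.layernorm_embedding.weight"
]
lightseq_encoder["layer_0.self_attn_layer_norm.bias"] = hf_state[
    "encoder.layernorm_embedding.bias"
]

# For a pre-norm model, the EmbeddingLayer's norm_scale / norm_bias would hold
# the encoder output layernorm instead; BART is post-norm and has no final
# encoder layer_norm in its Hugging Face checkpoint.
```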
@Youggls For now, this model is not supported. For BART, `layernorm_embedding` operates in the [self-attention of the first layer](https://github.com/bytedance/lightseq/blob/master/lightseq/inference/model/encoder.cc.cu#L170), but the kernel `ker_norm_layer_resual_launcher` doesn't allow the layernorm pointers to be `nullptr` for...
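Not LightSeq-specific, but a quick way to see which layernorm tensors a given BART-style checkpoint actually carries (and hence where the mapping above applies) is to inspect the Hugging Face state dict; this is plain checkpoint inspection, not any LightSeq API.

```python
# Inspection only: list the layernorm tensors that live outside the per-layer
# stacks of a Hugging Face BART checkpoint. For facebook/bart-base this prints
# the encoder/decoder layernorm_embedding weights and no final layer_norm,
# i.e. the post-norm layout with an embedding layernorm discussed above.
from transformers import BartModel

state = BartModel.from_pretrained("facebook/bart-base").state_dict()
norm_keys = sorted(k for k in state if "norm" in k and ".layers." not in k)
print("\n".join(norm_keys))
```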