Samuele Cornell

47 comments by Samuele Cornell

https://github.com/speechbrain/speechbrain/blob/801b1501b0bde2a940fcb71af44b69b07eafb9f5/speechbrain/pretrained/interfaces.py#L634 I think the problem is that the `EncoderDecoderASR` object's `transcribe` method no longer relies on the Transformer `decode` method. This bypasses the positional-encoding code.

Yes, only the tgt sequence now has positional encoding in the decoder. So I think it is fine now? https://github.com/speechbrain/speechbrain/blob/8fc31edc763e5b8860600ca806ff7c1575bc6aeb/speechbrain/lobes/models/transformer/TransformerASR.py#L396C20-L396C37
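To make the point concrete, here is a minimal NumPy sketch (not SpeechBrain's actual implementation) of the standard sinusoidal positional encoding being added to the target embeddings only, while the encoder memory is left untouched:

```python
import numpy as np

def sinusoidal_pos_encoding(seq_len, d_model):
    # Standard sinusoidal positional encoding (Vaswani et al., 2017).
    # Hypothetical helper for illustration only.
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model)[None, :]            # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    # Even feature indices use sin, odd indices use cos.
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

# In the decoder, positional encoding is applied to the tgt embeddings;
# the encoder output (memory) is passed through as-is.
tgt = np.zeros((5, 16))                        # toy tgt embeddings
tgt_with_pe = tgt + sinusoidal_pos_encoding(5, 16)
```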

You are right. It would be neat if we could use Lhotse dataloaders, as this would also allow using sharded datasets for large-scale training. Maybe it is fine...

If I recall correctly, it should be used only to derive the mask. Basically, by then applying reduce_min, the np.infs are not considered and only the nearest examples should be considered....

It has been a while since I last coded in TensorFlow/Keras, having switched to PyTorch. It seems to me that the triplet_center_loss function takes as input the l2_loss between example...

Yes, good catch. I think it is more straightforward to have the loss function directly take the embedding, centroid, and label as input.
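A minimal sketch of what such a signature could look like, assuming a standard triplet-center loss (pull each embedding toward its own class centroid, push it away from the nearest other centroid by at least a margin). The function name and arguments are illustrative, not the repo's actual API; note the same np.inf/min masking idea as above:

```python
import numpy as np

def triplet_center_loss(emb, centroids, labels, margin=1.0):
    # emb: (N, D) embeddings, centroids: (C, D), labels: (N,) int class ids.
    # Pairwise embedding-to-centroid distances, shape (N, C).
    d = np.linalg.norm(emb[:, None, :] - centroids[None, :, :], axis=-1)
    idx = np.arange(len(emb))
    pos = d[idx, labels]               # distance to own class centroid
    d_neg = d.copy()
    d_neg[idx, labels] = np.inf        # mask own class out of the min
    neg = d_neg.min(axis=1)            # nearest *other* centroid
    # Hinge: penalize when the own centroid is not closer by `margin`.
    return np.maximum(pos - neg + margin, 0.0).mean()
```

For example, with centroids at (0, 0) and (10, 0), an embedding at (9, 0) labeled as class 0 gives pos = 9, neg = 1, and a loss of max(9 - 1 + 1, 0) = 9, while an embedding sitting exactly on its own centroid gives 0.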

Can you provide the full traceback?