Samuele Cornell

47 comments by Samuele Cornell

https://github.com/speechbrain/speechbrain/blob/801b1501b0bde2a940fcb71af44b69b07eafb9f5/speechbrain/pretrained/interfaces.py#L634 I think the problem is that the `EncoderDecoderASR` object's `transcribe` method no longer relies on the Transformer `decode` method. This bypasses the positional-encoding code.

Yes, only the tgt sequence now has positional encoding in the decoder. So I think it is fine now? https://github.com/speechbrain/speechbrain/blob/8fc31edc763e5b8860600ca806ff7c1575bc6aeb/speechbrain/lobes/models/transformer/TransformerASR.py#L396C20-L396C37
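To make the point concrete, here is a minimal NumPy sketch (not SpeechBrain's actual implementation) of the standard sinusoidal positional encoding being added to the target embeddings only, while the encoder memory is left untouched:

```python
import numpy as np

def sinusoidal_pos_encoding(seq_len, d_model):
    # Standard sinusoidal positional encoding (Vaswani et al., 2017).
    # Hypothetical helper for illustration only.
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model)[None, :]            # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    # Even feature indices use sin, odd indices use cos.
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

# In the decoder, positional encoding is applied to the tgt embeddings;
# the encoder output (memory) is passed through as-is.
tgt = np.zeros((5, 16))                        # toy tgt embeddings
tgt_with_pe = tgt + sinusoidal_pos_encoding(5, 16)
```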

You are right. It would be neat if we could use Lhotse dataloaders, as this would also allow using sharded datasets for large-scale training. Maybe it is fine...

If I recall correctly, it should be used only to derive the mask. Basically, by then applying reduce_min, the np.infs are not considered and only the nearest examples should be considered....

It has been a while since I last coded in TensorFlow/Keras, having switched to PyTorch. It seems to me that the triplet_center_loss function takes as input the l2_loss between example...

Yes, good catch. I think it is more straightforward to have the loss function directly take the embedding, centroid, and label as input.
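A minimal sketch of what such a signature could look like, assuming a standard triplet-center loss (pull each embedding toward its own class centroid, push it away from the nearest other centroid by at least a margin). The function name and arguments are illustrative, not the repo's actual API; note the same np.inf/min masking idea as above:

```python
import numpy as np

def triplet_center_loss(emb, centroids, labels, margin=1.0):
    # emb: (N, D) embeddings, centroids: (C, D), labels: (N,) int class ids.
    # Pairwise embedding-to-centroid distances, shape (N, C).
    d = np.linalg.norm(emb[:, None, :] - centroids[None, :, :], axis=-1)
    idx = np.arange(len(emb))
    pos = d[idx, labels]               # distance to own class centroid
    d_neg = d.copy()
    d_neg[idx, labels] = np.inf        # mask own class out of the min
    neg = d_neg.min(axis=1)            # nearest *other* centroid
    # Hinge: penalize when the own centroid is not closer by `margin`.
    return np.maximum(pos - neg + margin, 0.0).mean()
```

For example, with centroids at (0, 0) and (10, 0), an embedding at (9, 0) labeled as class 0 gives pos = 9, neg = 1, and a loss of max(9 - 1 + 1, 0) = 9, while an embedding sitting exactly on its own centroid gives 0.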

Can you provide the full traceback?