Andy T. Liu comments

Results: 14 comments by Andy T. Liu

Downstream ASR Custom dataset

Hello, I believe you can fit Common Voice to our downstream ASR dataset by satisfying this requirement:
https://github.com/s3prl/s3prl/blob/540a4f86e9f099b240c314a56ff80b069e91d4cd/s3prl/downstream/asr/dataset.py#L46-L47
The usage is pretty straightforward: you just have to define the sub-directory...

Loading finetuned checkpoint. Local method not working.

> I use checkpoints trained recently with downstream tasks. The directory of downstream experiment looks like:
>
> And I use any of `states-200000.ckpt`, `best-states-test.ckpt`, `best-states-dev.ckpt`.
> Such checkpoint has...

How to extend layer numbers of AALBERT?

Hi @myhrbeu, I'm not sure if I understand your question correctly, but you can change the number of layers by changing this line:
https://github.com/s3prl/s3prl/blob/e52439edaeb1a443e82960e6401ae6ab4241def6/s3prl/pretrain/audio_albert/config_model.yaml#L4

For example:

```yaml
num_hidden_layers: 7
```

...

How to extend layer numbers of AALBERT?

> I found it is very slow when having 7 layers

This is expected and normal, because sharing layers (sharing weights across layers) won't make the model's forward pass faster. You...
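To illustrate the point about weight sharing, here is a minimal toy sketch in plain Python. It is not s3prl or AALBERT code: the names `make_layer`, `forward`, and `count_unique_params` are hypothetical, and each "layer" is just a small square matrix. It only shows that reusing one set of weights for 7 layers reduces the parameter count, while the amount of computation in the forward pass stays the same as with 7 separate layers.

```python
import random


def make_layer(dim, rng):
    """Return a dim x dim weight matrix (list of rows) standing in for a layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]


def apply_layer(weights, x, counter):
    """One matrix-vector product; counter[0] accumulates multiply-adds."""
    counter[0] += len(weights) * len(x)
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def forward(layers, x):
    """Pass x through every entry of `layers`; return (output, multiply-adds)."""
    counter = [0]
    for weights in layers:
        x = apply_layer(weights, x, counter)
    return x, counter[0]


def count_unique_params(layers):
    """Count parameters, counting each distinct weight matrix only once."""
    unique = {id(w): w for w in layers}
    return sum(len(w) * len(w[0]) for w in unique.values())


rng = random.Random(0)
dim, num_layers = 8, 7  # toy sizes, chosen only for illustration

shared_layer = make_layer(dim, rng)
shared = [shared_layer] * num_layers  # one matrix reused for all 7 layers
unshared = [make_layer(dim, rng) for _ in range(num_layers)]  # 7 separate matrices

x = [1.0] * dim
_, shared_ops = forward(shared, x)
_, unshared_ops = forward(unshared, x)

print("params (shared / unshared):", count_unique_params(shared), count_unique_params(unshared))
print("multiply-adds (shared / unshared):", shared_ops, unshared_ops)
```

In this toy setting the shared stack stores 7 times fewer parameters, but both stacks perform the same number of multiply-adds, so adding layers slows the forward pass either way.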

TransformerModel inference (possible bug)

Hello, apparently #L463 needs to be fixed. However, our repo was not using #L463. This is how we prepare the transformer input: https://github.com/s3prl/s3prl/blob/ccf621e78edb80534ba32a86b5f7077f84b9a6fd/s3prl/upstream/mockingjay/builder.py#L178-L189 There are some more details that you...

TransformerModel inference (possible bug)

and we will fix [#L463](https://github.com/s3prl/s3prl/blob/ccf621e78edb80534ba32a86b5f7077f84b9a6fd/s3prl/upstream/mockingjay/model.py#L463) in our next release!

Support for train / valid split in pretraining

Hello! That sounds nice, we will be supporting validation loss for pretraining in our next version. It has been implemented but is not yet ready to release, because we have...

Questions of calculating the loss of padded part when model performed pre-training

Following from issue https://github.com/s3prl/s3prl/issues/342, it looks like you have modified our code, for: 1) Our [dataset](https://github.com/s3prl/s3prl/blob/master/s3prl/pretrain/bucket_dataset.py) does bucketing, so that in each batch all the utterances have the same length...

Questions of calculating the loss of padded part when model performed pre-training

> I think this problem was caused by the old version because I downloaded s3prl last October, and only modified some interesting parts.
> Further, in the last issue #342,...

Questions of calculating the loss of padded part when model performed pre-training

> So the pre-trained model is performed until `1,500` sequences, right?

This is partially correct. We will randomly sample a sub-sequence from the whole sequence:
https://github.com/s3prl/s3prl/blob/6e4d75b4e149f662bb419a154109f0ac663f958d/s3prl/pretrain/bucket_dataset.py#L80-L84
so it is not...
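As a rough illustration of random sub-sequence sampling, here is a minimal sketch in plain Python. It is not the code at the linked lines: the function name `sample_subsequence` and its parameters are hypothetical, the length of 1500 is taken from the quoted question, and returning shorter sequences unchanged is an assumption.

```python
import random


def sample_subsequence(frames, sample_length, rng=random):
    """Return a random contiguous window of `sample_length` frames.

    Assumption: sequences that are not longer than `sample_length`
    (or a non-positive `sample_length`) are returned unchanged.
    """
    if sample_length <= 0 or len(frames) <= sample_length:
        return frames
    # randint is inclusive on both ends, so the window always fits.
    start = rng.randint(0, len(frames) - sample_length)
    return frames[start:start + sample_length]


rng = random.Random(0)
utterance = list(range(4000))  # stand-in for 4000 feature frames
window = sample_subsequence(utterance, 1500, rng)
print(len(window), window[0], window[-1])
```

With a sampler like this, the model sees a different 1500-frame window of a long utterance each time it is drawn, rather than only the first 1500 frames.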