Junlei Zhang comments

Results: 38 comments of Junlei Zhang

support for MBART (big models)?

Currently, it looks like the tool does not support models exceeding 2 GB.

support for MBART (big models)?

@Taka152 Hello, thank you for your reply. I am trying to accelerate the MBart model, but the model is too large. Could the main branch solve the issue as...

support for MBART (big models)?

initializing bart tokenizer... creating lightseq model... Parsing hdf5: /home/sysadmin/downlaod/lightseq_models/lightseq_mbart_base.hdf5 loading 976 MB of embedding weight. Finish loading src_emb_wei from host to device loading 1073 MB of embedding weight. Finish loading...

Support for mbart models?/ Could we get the output logits of decoders before the beam search?

@byshiue Thank you for your reply. If I just use the python interface, could I get the logits output of the decoder?

Support for mbart models?/ Could we get the output logits of decoders before the beam search?

@byshiue Thank you. I will try to implement the mbart model

can not reproduce the results

The accuracy for network1 is almost the same.

can not reproduce the results

Thank you for your reply. I changed two things: 1. I set the same batch_size for train_loader and valid_loader; 2. I set the random seed. Otherwise, it is hard...
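
For readers who want to try the second change, here is a minimal sketch of fixing the random seed. The helper name `seed_everything` and the seed value 42 are hypothetical and do not come from the original thread. NumPy and PyTorch are seeded only if they are installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed every random number generator that is available here.

    `seed_everything` is a hypothetical helper name used for illustration.
    """
    # Python's built-in generator.
    random.seed(seed)

    # NumPy's global generator, if NumPy is installed.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # PyTorch's default generator, if PyTorch is installed.
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


# Re-seeding with the same value reproduces the same draws.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
```

After this runs, `first` and `second` hold identical values. Seeding alone may not remove every source of run-to-run variation on a GPU, so treat this as a starting point rather than a guarantee.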

Distributed pretraining dataset question

> Hello, if you simply fix this error by setting global_bank=0, 7/8 of your dataset will not be trained in an epoch. And if all workers work on the same...
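
The 7/8 figure in the quoted reply can be illustrated with a small sketch. It assumes 8 workers and a simple strided split of sample indices. The function `shard_indices` and the names `rank` and `world_size` are hypothetical and are not taken from the project under discussion.

```python
def shard_indices(num_samples: int, rank: int, world_size: int) -> list:
    """Return the sample indices assigned to one worker (strided split)."""
    return list(range(rank, num_samples, world_size))


num_samples, world_size = 16, 8

# Case 1: every worker uses its own rank, so the shards cover the dataset.
seen_with_ranks = set()
for rank in range(world_size):
    seen_with_ranks.update(shard_indices(num_samples, rank, world_size))

# Case 2: every worker is forced to rank 0, so all workers read the same shard.
seen_with_rank_zero = set()
for _ in range(world_size):
    seen_with_rank_zero.update(shard_indices(num_samples, 0, world_size))

coverage_with_ranks = len(seen_with_ranks) / num_samples
coverage_with_rank_zero = len(seen_with_rank_zero) / num_samples
```

Here `coverage_with_ranks` is 1.0 and `coverage_with_rank_zero` is 0.125, which means 7/8 of the samples are never visited in an epoch when every worker uses rank 0.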

ImportError: cannot import name 'ShardedDDPOption'

I found that there is no ShardedDDPOption in transformers 4.2.1. Is the specified version incorrect?
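
One way to narrow down an ImportError like this is to check which version is installed and whether the module named in the failing import actually exposes the symbol. The sketch below uses only the Python standard library. The helper names `has_attribute` and `installed_version` are hypothetical, and the module path to probe should be copied from the traceback rather than from this example.

```python
import importlib
from importlib import metadata


def has_attribute(module_name: str, attr: str) -> bool:
    """Return True if `module_name` can be imported and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Demonstrated on a standard-library module so the sketch runs anywhere.
json_has_dumps = has_attribute("json", "dumps")
json_has_missing_name = has_attribute("json", "ShardedDDPOption")
```

For the case in this thread, `installed_version("transformers")` reports the installed release, and `has_attribute` can be pointed at the module named in the failing `from ... import ShardedDDPOption` line.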

What are the reduce_ratios and y in your VIG code?

> Thanks for the attention. Here we reduce the number of nodes by `reduce_ratio` so as to reduce the computational cost of distance calculation. Thank you for your rapid reply....
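
The cost saving described in the quoted reply can be sketched in plain Python. This is an illustration under stated assumptions, not the ViG implementation: nodes are 1-D values, the reduction is done by averaging consecutive groups of `reduce_ratio` nodes, and `y` is assumed to denote the reduced node set. The function names `pool_nodes` and `pairwise_distances` are hypothetical.

```python
def pool_nodes(nodes: list, reduce_ratio: int) -> list:
    """Average each consecutive group of `reduce_ratio` nodes into one node."""
    pooled = []
    for start in range(0, len(nodes), reduce_ratio):
        group = nodes[start:start + reduce_ratio]
        pooled.append(sum(group) / len(group))
    return pooled


def pairwise_distances(x: list, y: list) -> list:
    """Distance from every node in x to every node in y (absolute difference)."""
    return [[abs(a - b) for b in y] for a in x]


x = [float(i) for i in range(16)]   # 16 original nodes
y = pool_nodes(x, reduce_ratio=4)   # 4 reduced nodes (assumed meaning of y)

dist = pairwise_distances(x, y)     # 16 x 4 distance table

full_cost = len(x) * len(x)         # distances needed without reduction
reduced_cost = len(x) * len(y)      # distances needed with reduction
```

With 16 nodes and a ratio of 4, the distance table shrinks from 256 entries to 64, because each node is compared against the 4 pooled nodes instead of all 16 originals.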