Sheng Zha
Agreed. There's a large body of work in that space, in both extractive and, more recently, abstractive summarization. I think we should survey it and pick. Some good comprehensive pools to choose...
There are some new ELECTRA pre-trained models in Chinese that seem useful: https://github.com/ymcui/HFL-Anthology#hfl-anthology
It seems to have been kept consistent with TransformerNMTInference. I'd suggest keeping the naming convention consistent for easier discovery, so any change should be applied to both classes. In terms of...
I think we can add targets as follows:
- `cu102`: mxnet-cu102 release package
- `dev-cu102`: mxnet-cu102 nightly package
- similarly for cpu, cu92, and cu100
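A minimal sketch of how that target table could be generated, assuming CPU builds map to the plain `mxnet` package and GPU builds to `mxnet-<cuda version>`. The `build_targets` helper and its `channel` field are hypothetical and not part of any existing build script:

```python
def build_targets(variants=("cpu", "cu92", "cu100", "cu102")):
    """Map each build target name to the pip package and release channel it installs.

    Hypothetical helper: every variant gets a release target (e.g. `cu102`)
    and a nightly target with a `dev-` prefix (e.g. `dev-cu102`).
    """
    targets = {}
    for variant in variants:
        # Assumption: CPU builds ship as plain "mxnet", GPU builds as "mxnet-<cuda>".
        package = "mxnet" if variant == "cpu" else f"mxnet-{variant}"
        targets[variant] = {"package": package, "channel": "release"}
        targets[f"dev-{variant}"] = {"package": package, "channel": "nightly"}
    return targets


targets = build_targets()
for name, spec in sorted(targets.items()):
    print(f"{name}: {spec['package']} ({spec['channel']})")
```

Deriving both targets from one variant list keeps the release and nightly entries from drifting apart when a new CUDA version is added.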
@craffel Thanks for reporting. The PRs above should fix the problem. The correct dataset name is `distilbert_book_corpus_wiki_en_uncased`.
The data source "Smashwords" has terms of service that prohibit redistribution. Neither the links above nor https://github.com/soskek/bookcorpus/issues/27 mentioned getting approval from Smashwords or...
> There is no legal risk linking to the dataset

In the US, secondary infringement liability is recognized. One can be found liable for affirmative encouragement or inducing...
@andreas-solti The mapping looks correct. I don't think there's a need to transpose the weight. The embedding weight indices need to be shuffled because:

> The bos and eos token ids...
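To illustrate why no transpose is needed: the embedding matrix keeps its `(vocab_size, dim)` shape, and only its rows are reordered so that row `i` matches token `i` of the target vocabulary. A minimal sketch in plain Python, where `remap_embedding` and both vocabulary orders are hypothetical examples rather than the actual conversion code:

```python
def remap_embedding(weight, src_vocab, dst_vocab):
    """Reorder embedding rows so that row i belongs to dst_vocab[i].

    weight: list of rows, where weight[j] is the embedding of src_vocab[j].
    Only the row order changes; each row (and the matrix shape) stays intact.
    """
    src_index = {token: i for i, token in enumerate(src_vocab)}
    # Look each target token up in the source vocabulary and copy its row.
    return [list(weight[src_index[token]]) for token in dst_vocab]


# Hypothetical vocabularies that differ only in where the special tokens sit.
src_vocab = ["<s>", "<pad>", "</s>", "<unk>", "hello"]
dst_vocab = ["<unk>", "<pad>", "<s>", "</s>", "hello"]
weight = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]

new_weight = remap_embedding(weight, src_vocab, dst_vocab)
for token, row in zip(dst_vocab, new_weight):
    print(token, row)
```

With a NumPy or MXNet array the same reordering would be a single fancy-indexing step along the first axis.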
I verified the conversion scripts with https://github.com/dmlc/gluon-nlp/blob/master/tools/batch/run_batch_conversion.sh on batch.
- Success: mobilebert, electra, albert
- Failure: bart (`'BARTHubInterface' object has no attribute 'args'`), xlmr and roberta (`'RobertaHubInterface' object has no attribute 'args'`), bert...
It turns out fairseq has renamed `.args` to `.cfg`: https://github.com/pytorch/fairseq/blob/f3d5045a71ae463bd3f05254d7c4216801a04bc2/fairseq/hub_utils.py#L93
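A minimal compatibility sketch, assuming the conversion scripts only need to fetch the config object from the hub interface. `get_hub_config` is a hypothetical helper, and the `SimpleNamespace` stand-ins replace real fairseq models. The two objects are not laid out the same way (the newer `cfg` may nest model options, e.g. under `cfg.model`), so field access in the scripts still has to be adapted:

```python
from types import SimpleNamespace


def get_hub_config(hub):
    """Return the hub interface's config, preferring the newer `.cfg` attribute.

    Hypothetical helper: falls back to the older `.args` attribute so the same
    conversion script can run before and after the fairseq rename.
    """
    cfg = getattr(hub, "cfg", None)
    if cfg is not None:
        return cfg
    return hub.args  # raises AttributeError if neither attribute exists


# Stand-ins for hub interfaces from older and newer fairseq versions.
old_hub = SimpleNamespace(args=SimpleNamespace(encoder_layers=12))
new_hub = SimpleNamespace(cfg=SimpleNamespace(model=SimpleNamespace(encoder_layers=12)))

print(get_hub_config(old_hub).encoder_layers)
print(get_hub_config(new_hub).model.encoder_layers)
```

Pinning the fairseq version used by the batch conversion job would be the simpler alternative if supporting both layouts is not needed.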