
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo

125 UER-py issues, sorted by most recently updated

File "run_kbert_cls.py", line 261, in main model.load_state_dict(torch.load(args.pretrained_model_path), strict=False) File "/opt/conda/envs/phchen-k/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1044, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for Model: size mismatch...

```
Traceback (most recent call last):
  File "convert_albert_from_huggingface_to_uer.py", line 43, in <module>
    output_model["target.sop_linear_1.weight"] = input_model["albert.pooler.weight"]
KeyError: 'albert.pooler.weight'
```
Hello, I ran into the problem above when using the script to convert voidful/albert_chinese_base (https://huggingface.co/voidful/albert_chinese_base) from the Hugging Face model hub. Could you tell me where the problem lies? Many thanks!
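One likely cause is that this particular checkpoint simply does not ship pooler weights under that name. A small diagnostic sketch, assuming `input_model` and `output_model` as in the conversion script; whether the SOP head can safely be skipped is an assumption:

```
# Inspect what the checkpoint actually contains before mapping keys.
print([k for k in input_model.keys() if "pooler" in k])

# Guard the assignment instead of assuming the key exists.
if "albert.pooler.weight" in input_model:
    output_model["target.sop_linear_1.weight"] = input_model["albert.pooler.weight"]
else:
    print("albert.pooler.weight not found; skipping the SOP head weight")
```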

```
CUDA_VISIBLE_DEVICES=0,1 /dockerdata/anaconda3-2/bin/python run_classifier.py \
    --vocab_path models/google_zh_vocab.txt \
    --config_path models/gatedcnn_9_config.json \
    --train_path datasets/chnsenticorp/train.tsv \
    --dev_path datasets/chnsenticorp/dev.tsv \
    --test_path datasets/chnsenticorp/test.tsv \
    --learning_rate 1e-4 --batch_size 64 --epochs_num 5 \
    --embedding word --remove_embedding_layernorm \
    --encoder gatedcnn --pooling...
```

bug

As the title says: at the moment there is only a Chinese version. Thanks.

The pre-trained weights you provide on https://huggingface.co/uer are helpful. Do you have any plans to release these weights in UER format?

Using a model currently requires configuring everything through args. Could this be wrapped in a class? For example, if I have a pandas object and want to feed it to the model directly, I still have to parse the tokenizer and other settings out of args, re-initialize the model, and handle tokenization and the rest myself. Since the current interfaces all operate on files, it would be much more convenient if they could operate on in-memory objects directly.

enhancement
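A rough sketch of what such a wrapper might look like for the request above; none of these class or method names exist in UER-py, and the tokenizer interface is assumed:

```
import torch

class ClassifierPipeline:
    """Hypothetical wrapper: hold the model and tokenizer built once from the
    usual args, and expose predict() over in-memory texts instead of TSV files."""

    def __init__(self, model, tokenizer, device="cpu"):
        self.model = model.to(device).eval()
        self.tokenizer = tokenizer  # assumed to expose encode(text) -> list of ids
        self.device = device

    def predict(self, texts):
        """texts: any iterable of strings, e.g. a pandas Series."""
        labels = []
        with torch.no_grad():
            for text in texts:
                ids = torch.tensor([self.tokenizer.encode(text)], device=self.device)
                logits = self.model(ids)
                labels.append(int(logits.argmax(dim=-1)))
        return labels
```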

I tried to use the **MixedCorpus+GptEncoder+LmTarget** model to generate some text, GPT-2 style, and I followed the example scripts like
```
python3 scripts/generate.py --pretrained_model_path models/gpt_model.bin \
                            --vocab_path models/google_zh_vocab.txt \
                            --input_path story_beginning.txt \
                            --output_path...
```

```
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('./chinese_roberta_L-12_H-128/vocab.txt')
model = BertModel.from_pretrained("./chinese_roberta_L-12_H-128/config.json")
text = "用你喜欢的任何文本替换我。"
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
print(output)
```
Error message: `Calling BertTokenizer.from_pretrained() with the path to a...`
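`from_pretrained()` expects a model id or a directory containing `config.json`, `vocab.txt`, and the weights, not a path to an individual file. A corrected sketch, assuming those files all live together in the directory from the snippet (including `pytorch_model.bin`):

```
from transformers import BertTokenizer, BertModel

model_dir = "./chinese_roberta_L-12_H-128"  # a directory, not a single file
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir)

text = "用你喜欢的任何文本替换我。"
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)
print(output)
```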

Current environment versions:
```
import torch
import transformers
print(torch.__version__)         # 1.7.1
print(transformers.__version__)  # 3.4.0
```
Loading the UER pre-trained model 'uer/chinese_roberta_L-8_H-512' raises an error saying the model does not exist.

Hello, I was reading the code in run_dbqa.py and there is a part I don't quite understand. Shouldn't `dataset_groupby_qid.append((qid, correct_answer_orders, scores))` be indented one level so that it executes inside the loop? Also, `dataset_groupby_qid` is a plain list; no DataFrame is created and no groupby is actually performed. In this task, differences in text_a are usually enough to distinguish the ranking for different questions, yet a qid is added as well. That does have one benefit: several similar sentences can be merged into a single sample (because of the groupby). But this seems to affect only the evaluate step (whether merging happens or not). I wonder whether the code needs to be changed.
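A minimal sketch of the two placements being asked about; the loop and helper here are hypothetical stand-ins for the actual run_dbqa.py code:

```
dataset_groupby_qid = []  # a plain list; no pandas DataFrame/groupby involved

# As the reader understands the current code: the append runs once,
# after the loop, keeping only the values from the final iteration.
for qid, correct_answer_orders, scores in iter_questions():  # hypothetical helper
    pass
dataset_groupby_qid.append((qid, correct_answer_orders, scores))

# With one more level of indentation, the append runs once per question:
for qid, correct_answer_orders, scores in iter_questions():
    dataset_groupby_qid.append((qid, correct_answer_orders, scores))
```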