mldl@mldlUB1604:~/ub16_prj/gh-shanzhenren/ReMine$ bash train.sh
===Entity Linking===
Traceback (most recent call last):
File "src_py/distantSupervision.py", line 28, in
utils.getEntity(args.in1, args.out, args.opt)
File "/home/mldl/ub16_prj/gh-shanzhenren/ReMine/src_py/utils.py", line 29, in getEntity
with open(file_path) as IN, open(output, 'w') as OUT:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nyt/train_nyt.json'
((VB|VBD|VBG|VBN|VBN|VBP|VBZ) )+((NN.{0,2}|JJ.{0,1}|RB.{0,1}|PRP.{0,1}|DT ))+((IN|RP) +)|((VB|VBD|VBG|VBN|VBN|VBP|VBZ) )+((IN|RP) +)|((VB|VBD|VBG|VBN|VBN|VBP|VBZ) )+|(NN.{0,2})+((IN|RP) +)
Traceback (most recent call last):
File "src_py/distantSupervision.py", line 24, in
utils.relationLinker(args.in1, args.in2)
File "/home/mldl/ub16_prj/gh-shanzhenren/ReMine/src_py/utils.py", line 72, in relationLinker
with open(file_path,'r') as IN:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nyt/train_nyt.json'
===Tokenizaztion===
Traceback (most recent call last):
File "src_py/preprocessing.py", line 396, in
tmp.tokenized_train(args.in1, args.in2, args.in3)
File "src_py/preprocessing.py", line 125, in tokenized_train
with open(docIn, encoding='utf-8') as doc, open(posIn, encoding='utf-8') as pos, open(depIn, encoding='utf-8') as dep:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nyt/total.lemmas.txt'
Traceback (most recent call last):
File "src_py/preprocessing.py", line 404, in
tmp.chunk_train(args.in1, args.in2)
File "src_py/preprocessing.py", line 51, in chunk_train
with open(docIn, encoding='utf-8') as doc, open(posIn, encoding='utf-8') as pos, open('tmp_remine/boost_patterns.txt', 'w', encoding='utf-8') as out:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nyt/total.lemmas.txt'
Traceback (most recent call last):
File "src_py/preprocessing.py", line 407, in
tmp.tokenize(args.in1, args.out)
File "src_py/preprocessing.py", line 222, in tokenize
with open(docIn, encoding='utf-8') as doc, open(docOut,'w', encoding='utf-8') as out:
FileNotFoundError: [Errno 2] No such file or directory: 'data/stopwords.txt'
Traceback (most recent call last):
File "src_py/preprocessing.py", line 407, in
tmp.tokenize(args.in1, args.out)
File "src_py/preprocessing.py", line 222, in tokenize
with open(docIn, encoding='utf-8') as doc, open(docOut,'w', encoding='utf-8') as out:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/nyt.entities'
Traceback (most recent call last):
File "src_py/preprocessing.py", line 407, in
tmp.tokenize(args.in1, args.out)
File "src_py/preprocessing.py", line 222, in tokenize
with open(docIn, encoding='utf-8') as doc, open(docOut,'w', encoding='utf-8') as out:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/nyt.relations'
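Every traceback above is the same root cause: train.sh assumes its input data is already on disk, and each missing file only surfaces as a FileNotFoundError partway through the pipeline. A minimal preflight sketch (hypothetical, not part of the ReMine repo) that checks all of them up front, using the paths reported in the errors above:

```python
# Preflight check before running train.sh: report every missing input
# file at once instead of failing one FileNotFoundError at a time.
# The file list is taken from the errors in this log; adjust it to
# your own dataset layout.
import os

REQUIRED_FILES = [
    'data/nyt/train_nyt.json',
    'data/nyt/total.lemmas.txt',
    'data/stopwords.txt',
    'tmp/nyt.entities',
    'tmp/nyt.relations',
]

def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]

if __name__ == '__main__':
    missing = missing_files(REQUIRED_FILES)
    if missing:
        print('Missing input files:')
        for p in missing:
            print('  ' + p)
    else:
        print('All required inputs found.')
```

Running this before train.sh confirms whether the problem is only the unreleased NYT corpus or the local setup as well.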
Regarding the training corpus: I will upload it to Google Drive/Dropbox and keep you posted. Also, we will release a server version and a pre-trained model later (we are working on it). Thanks!
After the data is released, could you put the link in the repository's README or send it to me? Thank you! @GentleZhu