disambiguation
Some questions about preprocessing
Thanks for this code. I have some questions about the logic of the preprocessing step.
- In the 'author_features.txt' output, why is each record keyed by document_id-author_order, and why is the name at that author_order removed from the features?
- Why is the org feature output like this, instead of being split into words?
__ORG__nankai_univ __ORG__nankaiuniv __ORG__nankaiuniv __ORG__nankaiuniv
- In 'dump_author_features_to_file', records with len(paper["authors"]) > 100 are ignored. If I want to get an embedding for such a record, how should I handle it?
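
To make the second question concrete, here is a rough sketch of what I mean by splitting the org into words. The helper name org_word_features and the input string are my own assumptions for illustration and are not taken from the repository; only the __ORG__ prefix comes from the output shown above.

```python
def org_word_features(org, prefix="__ORG__"):
    """Hypothetical helper (not from the repo): one prefixed token per word."""
    # Lowercase and split on whitespace, then tag each word with the prefix.
    words = org.lower().split()
    return [prefix + word for word in words]


if __name__ == "__main__":
    # Assumed example input; the real org strings in the data may differ.
    print(" ".join(org_word_features("Nankai Univ")))
```

With this kind of splitting, each word of the org would become its own feature token, which is what I expected to see in the output.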