Some questions about preprocessing

Open wzc118 opened this issue 6 years ago • 0 comments

Thanks for this code. I have some questions about the preprocessing logic.

  1. In the 'author_features.txt' output, why is each record keyed by document_id-author_order, and why is the name removed according to the author_order?
  2. Why does the org feature output look like this, instead of being split into words? __ORG__nankai_univ __ORG__nankaiuniv __ORG__nankaiuniv __ORG__nankaiuniv
  3. In 'dump_author_features_to_file', records with len(paper["authors"]) > 100 are ignored. If I want to get the embedding for such a record, how should I handle it?

wzc118, Nov 24 '18 13:11