uspto-patent-data-parser
Suggestion about func "read_and_parse_txt_from_disk"
Moreover, I suggest changing this function as follows, because I ran into an encoding problem:
def read_and_parse_txt_from_disk(path_to_file, data_items):
    # Try UTF-8 first; fall back to Latin-1, which can decode any byte sequence.
    try:
        with open(path_to_file, 'r', encoding='utf-8') as f:
            txt = f.read()
    except UnicodeDecodeError:
        with open(path_to_file, 'r', encoding='latin1') as f:
            txt = f.read()
    txt = txt.split('\n')
    raw_patent_data = get_patents_list(txt)
    # Parse each raw patent record into the requested data items.
    parsed_data = []
    for patent in raw_patent_data:
        parsed_data.append(parse_txt_patent_data(patent, data_items_list=data_items))
    return parsed_data
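
The encoding fallback could also be pulled out into a small helper, which keeps the parsing function short and makes the fallback easy to test by itself. This is only a sketch: `read_text_with_fallback` and its `encodings` parameter are hypothetical names, not part of the library. It relies on the fact that Latin-1 maps every byte to a character, so it never raises a decode error and works as a last resort (although non-Latin-1 text will come out garbled rather than failing).

```python
from pathlib import Path


def read_text_with_fallback(path_to_file, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: decode a file with the first encoding that works.

    Returns a (text, encoding_used) tuple.
    """
    # Read the raw bytes once, then try each candidate encoding in order.
    raw = Path(path_to_file).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reachable if no listed encoding could decode the bytes
    # (never the case when latin-1 is in the list).
    raise ValueError(f"could not decode {path_to_file} with any of {encodings}")
```

Inside `read_and_parse_txt_from_disk`, the `try`/`except` block could then become `txt, _ = read_text_with_fallback(path_to_file)`. Because this helper decodes raw bytes, Windows line endings are not normalized the way they are in text mode, so `txt.splitlines()` may be a safer choice than `txt.split('\n')` here.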