
Suggestion about func "read_and_parse_txt_from_disk"

Open GengYuIsland opened this issue 9 months ago • 0 comments

I also suggest changing this function as follows, because I ran into an encoding problem when reading some files:

def read_and_parse_txt_from_disk(path_to_file, data_items):
    # Try UTF-8 first; fall back to Latin-1 only when decoding fails.
    try:
        with open(path_to_file, 'r', encoding='utf-8') as f:
            txt = f.read()
    except UnicodeDecodeError:
        with open(path_to_file, 'r', encoding='latin1') as f:
            txt = f.read()
    txt = txt.split('\n')
    raw_patent_data = get_patents_list(txt)
    parsed_data = []
    for patent in raw_patent_data:
        parsed_data.append(parse_txt_patent_data(patent, data_items_list=data_items))
    return parsed_data

GengYuIsland, May 10 '24 17:05