fuzzy_pandas
object of type 'float' has no len() fuzzy_merge
Hi, I am matching two Chinese firm databases with this package. Here is my code:
```python
import pandas as pd
import fuzzy_pandas as fpd
import dask.dataframe as dd

company_names = 'C:/Users/acemec/Documents/firm_data/company_annual.csv'
new_companies_name = 'C:/Users/acemec/Documents/firm_data/Pat_firm_list.csv'
mylist = []
for chunk in pd.read_csv(company_names, on_bad_lines='skip', encoding='Latin-1', dtype=object, low_memory=False, chunksize=200000):
    mylist.append(chunk)
companies = pd.concat(mylist, axis=0)
del mylist
mylist = []
for chunk in pd.read_csv(new_companies_name, on_bad_lines='skip', encoding='Latin-1', dtype=object, low_memory=False, chunksize=200000):
    mylist.append(chunk)
new_companies = pd.concat(mylist, axis=0)
del mylist
match = fpd.fuzzy_merge(new_companies, companies, left_on=['assignee'], right_on=['company_name'],
                        keep_left=['assignee'], keep_right=['company_name', 'tyc_id', 'company_id'],
                        method='levenshtein', threshold=0.85)
df = pd.DataFrame(match)
df.to_csv('C:/Users/acemec/Documents/firm_data/match_reslts.csv', encoding='utf-8')
```
Running it fails inside fuzzy_merge with this error:
TypeError: object of type 'float' has no len()
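A minimal sketch that reproduces this message with plain pandas, assuming the cause is an empty cell in the key column (fuzzy_pandas itself is not used here; the sketch only assumes that its string comparison calls `len()` on each key). The CSV text and the helper `first_len_error` are hypothetical stand-ins for illustration. With `dtype=object`, pandas still reads an empty field as `NaN`, and `NaN` is a Python `float`:

```python
import io

import pandas as pd


def first_len_error(series: pd.Series):
    """Hypothetical helper: call len() on each value, as a string-distance
    function would, and return the first TypeError message (or None)."""
    for value in series:
        try:
            len(value)
        except TypeError as exc:
            return str(exc)
    return None


# Hypothetical stand-in for Pat_firm_list.csv: the second row has no assignee.
csv_text = "assignee,city\nAcme Co,Beijing\n,Shanghai\n"
new_companies = pd.read_csv(io.StringIO(csv_text), dtype=object)

missing = new_companies["assignee"].isna()
print(missing.sum())  # 1
print(type(new_companies.loc[1, "assignee"]))  # <class 'float'>
print(first_len_error(new_companies["assignee"]))  # object of type 'float' has no len()
```

Running `new_companies["assignee"].isna().sum()` on the real data is a quick way to check whether this is what is happening.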
Could you give me some suggestions? Thx.
+1
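Something that helped me with this error (not in this package, but in difflib, which also does fuzzy matching) was adding .astype(str) after the column selection, e.g. df[column_name].astype(str). The error most likely means the key column has empty cells: pandas reads them as NaN, which is a float, so any len() call on that value fails.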
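One way to apply this before calling `fuzzy_merge`, sketched under the assumption that missing keys are the cause. `clean_key` is a hypothetical helper, and the CSV text stands in for the two files in the question. It drops rows with no key before calling `.astype(str)`, because depending on the pandas version `.astype(str)` alone may turn a missing value into the literal string `'nan'`, which could then fuzzy-match other `'nan'` rows. Dropping rows is a design choice: those firms simply will not appear in the match output.

```python
import io

import pandas as pd


def clean_key(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose key is missing, then coerce the
    remaining keys to stripped strings so every value supports len()."""
    cleaned = df.dropna(subset=[column]).copy()
    cleaned[column] = cleaned[column].astype(str).str.strip()
    # Keys that were only whitespace become "" after strip(); drop them too.
    return cleaned[cleaned[column] != ""]


# Hypothetical stand-ins for the two CSV files in the question.
new_companies = pd.read_csv(
    io.StringIO("assignee,city\nAcme Co,Beijing\n,Shanghai\n Beta Ltd ,Wuhan\n"),
    dtype=object,
)
companies = pd.read_csv(
    io.StringIO("company_name,tyc_id,company_id\nACME Co.,1,10\n,2,20\n"),
    dtype=object,
)

new_companies = clean_key(new_companies, "assignee")
companies = clean_key(companies, "company_name")
print(new_companies["assignee"].tolist())  # ['Acme Co', 'Beta Ltd']
print(companies["company_name"].tolist())  # ['ACME Co.']

# The call from the question could then be retried on the cleaned frames:
# match = fpd.fuzzy_merge(new_companies, companies,
#                         left_on=['assignee'], right_on=['company_name'],
#                         keep_left=['assignee'],
#                         keep_right=['company_name', 'tyc_id', 'company_id'],
#                         method='levenshtein', threshold=0.85)
```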