fuzzy_pandas

object of type 'float' has no len() fuzzy_merge

Open ZhihaoMa opened this issue 3 years ago • 2 comments

Hi, I am matching two Chinese firm databases using this package. Here is my code:

import pandas as pd
import fuzzy_pandas as fpd
import dask.dataframe as dd

company_names = 'C:/Users/acemec/Documents/firm_data/company_annual.csv'
new_companies_name = 'C:/Users/acemec/Documents/firm_data/Pat_firm_list.csv'

mylist = []
for chunk in pd.read_csv(company_names, on_bad_lines='skip', encoding='Latin-1',
                         dtype=object, low_memory=False, chunksize=200000):
    mylist.append(chunk)
companies = pd.concat(mylist, axis=0)
del mylist

mylist = []
for chunk in pd.read_csv(new_companies_name, on_bad_lines='skip', encoding='Latin-1',
                         dtype=object, low_memory=False, chunksize=200000):
    mylist.append(chunk)
new_companies = pd.concat(mylist, axis=0)
del mylist

match = fpd.fuzzy_merge(new_companies, companies,
                        left_on=['assignee'],
                        right_on=['company_name'],
                        keep_left=['assignee'],
                        keep_right=['company_name', 'tyc_id', 'company_id'],
                        method='levenshtein',
                        threshold=0.85)

df = pd.DataFrame(match)
df.to_csv('C:/Users/acemec/Documents/firm_data/match_reslts.csv', encoding='utf-8')

When I run it, I get this error:

object of type 'float' has no len() fuzzy_merge

Could you give me some suggestions? Thanks.

ZhihaoMa, Oct 23 '21 11:10

+1

kanlancb, Apr 26 '22 10:04

Something that helped me when I got this error (not in this package, but in difflib, which also does fuzzy matching) was to add .astype(str) after the column selection, e.g. df[column_name].astype(str).

maxdorman, Aug 21 '24 19:08
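A minimal sketch of the .astype(str) suggestion, assuming the error comes from missing values in the matched columns. Empty CSV cells are read as NaN, which is a float even with dtype=object, and calling len() on a float raises this message. The helper name prepare_key and the tiny stand-in frames are hypothetical, not part of fuzzy_pandas. Rows with a missing key are dropped first, because .astype(str) alone would turn NaN into the literal string 'nan'.

```python
import pandas as pd


def prepare_key(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy with rows missing `column` dropped and the column cast to str."""
    cleaned = df.dropna(subset=[column]).copy()
    # Cast to str so every value supports len(), then trim stray whitespace.
    cleaned[column] = cleaned[column].astype(str).str.strip()
    return cleaned


# Stand-in data; the real frames come from the CSV files in the issue.
new_companies = pd.DataFrame({"assignee": ["Acme Ltd", None, "Beta Co "]})
companies = pd.DataFrame(
    {"company_name": ["Acme Ltd.", float("nan")], "tyc_id": ["1", "2"]}
)

new_companies = prepare_key(new_companies, "assignee")
companies = prepare_key(companies, "company_name")
```

The cleaned frames could then be passed to fpd.fuzzy_merge with the same arguments as in the original code. Whether this resolves the error depends on the data actually containing missing values in assignee or company_name.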