
df.ada_similarity.apply(eval).apply(np.array) is returning an error

Status: Open. yzvickie opened this issue 2 years ago • 4 comments

I'm getting an error when running the line df["ada_similarity"] = df.ada_similarity.apply(eval).apply(np.array) from the example notebook https://github.com/openai/openai-cookbook/blob/main/examples/Clustering.ipynb. The error is:

eval() arg 1 must be a string, bytes or code object

Full error:

tmp/ipykernel_45192/3289201929.py in
      2 import numpy as np
      3
----> 4 df["ada_similarity"] = df.ada_similarity.apply(eval).apply(np.array)
      5 matrix = np.vstack(df.ada_similarity.values)
      6 matrix.shape

/apps/python3/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwargs)
   4355         dtype: float64
   4356         """
-> 4357         return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
   4358
   4359     def _reduce(

/apps/python3/lib/python3.7/site-packages/pandas/core/apply.py in apply(self)
   1041             return self.apply_str()
   1042
-> 1043         return self.apply_standard()
   1044
   1045     def agg(self):

/apps/python3/lib/python3.7/site-packages/pandas/core/apply.py in apply_standard(self)
   1099                 values,
   1100                 f,  # type: ignore[arg-type]
-> 1101                 convert=self.convert_dtype,
   1102             )
   1103

/apps/python3/lib/python3.7/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

TypeError: eval() arg 1 must be a string, bytes or code object

yzvickie, Dec 21 '22
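`eval` only accepts a string, bytes or a code object, so this error means at least one cell of `ada_similarity` holds something else. One plausible culprit (an assumption; the thread does not pin down the cause) is a missing value: pandas reads an empty CSV field as `NaN`, which is a float. A minimal reproduction with a hypothetical two-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV column: one embedding string and one
# missing value, which pandas represents as NaN (a float, not a string).
df = pd.DataFrame({"ada_similarity": ["[0.1, 0.2]", np.nan]})

error_message = None
try:
    # Same line as in the notebook: eval() is called on every cell,
    # including the NaN, which raises TypeError.
    df["ada_similarity"] = df.ada_similarity.apply(eval).apply(np.array)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The first cell parses fine; the exception comes from the second one, so a single bad row is enough to stop the whole column conversion.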

This can be fixed via df["ada_similarity"] = df.ada_similarity.apply(eval).apply(np.array).apply(lambda x: x.astype(float))

Also, kmeans = KMeans(n_clusters=n_clusters, init="k-means++", random_state=42, n_init='auto') doesn't work; it should be kmeans = KMeans(n_clusters=n_clusters, init="k-means++", random_state=42, n_init=10)

yzvickie, Dec 21 '22

@yzvickie Thanks for sharing! Although I think the first line has to be

df["ada_similarity"] = df.apply(lambda x: x.astype(float)).apply(np.array)

Otherwise ya might get the same error

NishqR, Dec 28 '22
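Any variant that still starts with `.apply(eval)` fails on the same cells, because `eval` still receives a non-string. Note also that `df.apply(...)` as written above runs on every column of the frame, not only `ada_similarity`. A sketch of a column-only conversion, assuming (not confirmed in the thread) that the bad cells are missing values or values that are no longer strings: drop rows without an embedding, parse only string cells with `ast.literal_eval` (swapped in for `eval` because it accepts only Python literals), and cast to `float64` in the same step. The helper `to_vector` and the data are hypothetical:

```python
import ast

import numpy as np
import pandas as pd


def to_vector(value):
    """Return value as a float64 array, parsing it first only if it is a string."""
    if isinstance(value, str):
        # literal_eval only evaluates Python literals such as "[0.1, 0.2]".
        value = ast.literal_eval(value)
    # dtype=float replaces the separate .astype(float) step.
    return np.asarray(value, dtype=float)


# Hypothetical data: two embedding strings and one missing value.
df = pd.DataFrame({"ada_similarity": ["[0.1, 0.2]", np.nan, "[0.3, 0.4]"]})

# Drop rows without an embedding; .copy() gives an independent frame
# so the column assignment below does not target a slice.
df = df.dropna(subset=["ada_similarity"]).copy()
df["ada_similarity"] = df.ada_similarity.apply(to_vector)
# Running the conversion again leaves the arrays unchanged instead of failing.
df["ada_similarity"] = df.ada_similarity.apply(to_vector)

# Same stacking step as the notebook: one row per embedding.
matrix = np.vstack(df.ada_similarity.values)
print(matrix.shape)
```

Because `to_vector` skips parsing for values that are already arrays, re-executing the notebook cell is harmless, which is a common way to end up with non-string cells in Jupyter.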

Thanks! Will fix.

ted-at-openai, Jan 09 '23

Looks like n_init='auto' requires scikit-learn 1.2.0+. I'll omit it so that it just uses the default for whichever version of scikit-learn folks are using.

ted-at-openai, Jan 09 '23
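Omitting `n_init` means the value used depends on the installed release: the default was 10 until scikit-learn 1.4, where it changed to `"auto"` (a single run when `init="k-means++"`), so cluster assignments can differ between installs. If you would rather choose the value explicitly, using `"auto"` where it exists and the old default of 10 elsewhere, here is a minimal sketch. It assumes `sklearn.__version__` is a dotted string that starts with numeric major and minor parts; the helper name `kmeans_n_init` is hypothetical:

```python
def kmeans_n_init(sklearn_version):
    """Return an n_init value accepted by the given scikit-learn version string."""
    # Compare only major and minor, e.g. "1.2.0" -> (1, 2), "0.24.2" -> (0, 24).
    major, minor = (int(part) for part in sklearn_version.split(".")[:2])
    # n_init="auto" exists from scikit-learn 1.2 on; older releases need an int.
    return "auto" if (major, minor) >= (1, 2) else 10
```

With scikit-learn installed, it would be passed as `KMeans(n_clusters=n_clusters, init="k-means++", random_state=42, n_init=kmeans_n_init(sklearn.__version__))`. Passing `n_init=10` on every version is the simpler option if identical behaviour across installs matters more than using `"auto"`.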

I think this is now fixed (#66). The code runs for me. Let me know if it's still throwing an error for you. If so, I'd guess it's from using a different version of one of these libraries. Happy to look into it further if it's still presenting difficulties.

ted-at-openai, Jan 10 '23
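Since the remaining suspect is a library-version difference, it helps to include exact versions when reporting back. A minimal sketch using `importlib.metadata` from the standard library, which needs Python 3.8+ (the traceback above shows Python 3.7, where the `importlib_metadata` backport package provides the same functions). The helper name and the default package list are hypothetical choices:

```python
from importlib.metadata import PackageNotFoundError, version


def library_versions(names=("numpy", "pandas", "scikit-learn")):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this environment.
            report[name] = None
    return report


print(library_versions())
```

Note that `version()` takes the distribution name as published on PyPI (`scikit-learn`), not the import name (`sklearn`).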