
[PROBLEM] Clustering.ipynb: if I do not set `random_state=42` when sampling, OpenAI seems unable to differentiate the reviews

Open · Jessen-Li opened this issue 1 year ago · 0 comments


Identify the file to be fixed: Clustering.ipynb

Describe the problem: in the cell that builds the reviews passed to the model,

```python
reviews = "\n".join(
    df[df.Cluster == i]
    .combined.str.replace("Title: ", "")
    .str.replace("\n\nContent: ", ": ")
    .sample(rev_per_cluster)
    .values
)
```

if I do not pass `random_state=42` to `.sample()`, the generated themes can be very similar to each other. Here are the results from my test:

Cluster 0 Theme: All of the reviews are positive and the customers are satisfied with their purchase.

Cluster 1 Theme: All of the reviews are positive and discuss the quality of the product and how it works for the customer's pet.

Cluster 2 Theme: All of the reviews are positive and express satisfaction with the product.

Cluster 3 Theme: All of the reviews are positive and express satisfaction with the product.

The themes for Cluster 2 and Cluster 3 are identical, and both are very close to the Cluster 0 theme. I'm not sure whether this counts as a bug, but it may suggest that the clusters are not clearly distinct in terms of review content.
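A minimal sketch of why the seed changes the outcome, assuming the notebook's DataFrame has the `Cluster` and `combined` columns used in the snippet from the problem description. The toy data and the `sample_reviews` helper are hypothetical, for illustration only. With `random_state` fixed, `Series.sample` returns the same rows on every run; with it omitted, each run draws a different handful of reviews, so the theme the model summarizes can shift from run to run.

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's DataFrame: each row has a
# cluster label and a "combined" text column in "Title: ...\n\nContent: ..." form.
df = pd.DataFrame(
    {
        "Cluster": [0, 0, 0, 0, 1, 1, 1, 1],
        "combined": [
            f"Title: Review {k}\n\nContent: Text of review {k}" for k in range(8)
        ],
    }
)


def sample_reviews(df: pd.DataFrame, cluster: int, n: int, seed=None) -> str:
    """Hypothetical helper mirroring the notebook's sampling step for one cluster.

    seed=None draws a fresh random sample on each call; an integer seed makes
    the draw reproducible (the notebook uses random_state=42).
    """
    return "\n".join(
        df[df.Cluster == cluster]
        .combined.str.replace("Title: ", "")
        .str.replace("\n\nContent: ", ": ")
        .sample(n, random_state=seed)
        .values
    )


# A fixed seed returns the same reviews every time...
assert sample_reviews(df, 0, 2, seed=42) == sample_reviews(df, 0, 2, seed=42)

# ...while different seeds (or no seed) can pick different reviews, so the
# prompt sent to the model, and therefore the generated theme, can differ.
for seed in (0, 1, 2):
    print(seed, sample_reviews(df, 0, 2, seed=seed).splitlines())
```

A fixed seed makes one particular draw reproducible; it does not make that draw more representative of the cluster. Comparing themes across several seeds per cluster would give a better sense of whether the clusters are genuinely distinct than any single sample.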


Jessen-Li · May 23 '23 08:05