openai-cookbook
[PROBLEM] Clustering.ipynb: if I do not set `random_state=42` when sampling, OpenAI seems unable to differentiate the reviews
Identify the file to be fixed: Clustering.ipynb
Describe the problem

The reviews for each cluster are built like this:

```python
# From Clustering.ipynb: join a random sample of cluster i's reviews into one prompt string.
# df, i and rev_per_cluster are defined earlier in the notebook.
reviews = "\n".join(
    df[df.Cluster == i]
    .combined.str.replace("Title: ", "")
    .str.replace("\n\nContent: ", ": ")
    .sample(rev_per_cluster)  # no random_state here, so the sample changes on every run
    .values
)
```

If I do not pass `random_state=42` to `.sample()` when building the reviews, the themes returned for different clusters can be very similar to each other. Here are the results from my test:

Cluster 0 Theme: All of the reviews are positive and the customers are satisfied with their purchase.

Cluster 1 Theme: All of the reviews are positive and discuss the quality of the product and how it works for the customer's pet.

Cluster 2 Theme: All of the reviews are positive and express satisfaction with the product.

Cluster 3 Theme: All of the reviews are positive and express satisfaction with the product.

As you can see, the themes for Cluster 2 and Cluster 3 are identical, and both are very close to the Cluster 0 theme. I'm not sure whether this counts as an issue, but it may suggest that the clusters are not clearly distinct from one another, at least as far as the content of the reviews is concerned.
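To separate "the clusters are not distinct" from "these particular reviews happened to be drawn", it can help to make the sample controllable. Below is a minimal sketch, assuming a DataFrame with the notebook's `Cluster` and `combined` columns; `build_reviews` and the toy data are hypothetical and do not call the OpenAI API. It only shows that a fixed `random_state` reproduces the exact same prompt text, while varying the seed draws different reviews from the same cluster.

```python
import pandas as pd


def build_reviews(df: pd.DataFrame, cluster: int, rev_per_cluster: int, random_state=None) -> str:
    """Hypothetical helper mirroring the notebook snippet.

    random_state=None draws a new sample on each call; an int makes it reproducible.
    """
    return "\n".join(
        df[df.Cluster == cluster]
        .combined.str.replace("Title: ", "")
        .str.replace("\n\nContent: ", ": ")
        .sample(rev_per_cluster, random_state=random_state)
        .values
    )


# Hypothetical toy data standing in for the notebook's review DataFrame.
df = pd.DataFrame(
    {
        "Cluster": [0, 0, 0, 0, 1, 1, 1, 1],
        "combined": [f"Title: Review {n}\n\nContent: Text {n}" for n in range(8)],
    }
)

# Same seed: identical sample, so the prompt sent to the model is identical too.
a = build_reviews(df, 0, rev_per_cluster=2, random_state=42)
b = build_reviews(df, 0, rev_per_cluster=2, random_state=42)
print(a == b)  # True

# Different seeds can draw different reviews from the same cluster; generating a
# theme for each of these would show whether the theme depends on the sample.
samples = {seed: build_reviews(df, 0, 2, random_state=seed) for seed in range(5)}
for seed, text in samples.items():
    print(seed, text.replace("\n", " | "))
```

Fixing the seed only makes the sampled reviews reproducible; it does not change the clustering itself. Comparing themes across several seeds for the same cluster is one way to check whether Cluster 2 and Cluster 3 stay indistinguishable regardless of which reviews are drawn.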