
Duplicates of new user's unrated movies passed to predict

Open snowch opened this issue 7 years ago • 0 comments

new_user_unrated_movies_RDD = (complete_movies_data.filter(lambda x: x[0] not in new_user_ratings_ids).map(lambda x: (new_user_ID, x[0])))

The list of unrated movies contains duplicates:

print(new_user_unrated_movies_RDD.take(10))
[(0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1)]

Should a distinct() call be added?

new_user_unrated_movies_RDD = (complete_movies_data.filter(lambda x: x[0] not in new_user_ratings_ids).map(lambda x: (new_user_ID, x[0]))).distinct()
print(new_user_unrated_movies_RDD.take(10))
[(0, 378), (0, 1934), (0, 3282), (0, 5606), (0, 862), (0, 2146), (0, 3766), (0, 1330), (0, 2630), (0, 4970)]
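
To illustrate the effect of the proposed change without a Spark cluster, here is a minimal plain-Python sketch of the same filter / map / distinct steps. The helper name `unrated_pairs` and the sample rows are hypothetical, and the sketch assumes the source data holds more than one row per movie ID, which the duplicated output above suggests but the issue does not confirm.

```python
def unrated_pairs(rows, rated_ids, user_id):
    """Return distinct (user_id, movie_id) pairs for movies the user has not rated.

    Hypothetical stand-in for the RDD pipeline in this issue; `rows` plays the
    role of complete_movies_data, with the movie ID in position 0.
    """
    rated = set(rated_ids)  # a set keeps the "not in" check cheap
    # Counterpart of .filter(...).map(...): keep unrated rows, emit (user, movie)
    pairs = [(user_id, row[0]) for row in rows if row[0] not in rated]
    # Counterpart of .distinct(): drop repeated pairs, keeping first-seen order
    return list(dict.fromkeys(pairs))


# Assumed sample data: movie IDs 1 and 3 appear on more than one row
rows = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e"), (4, "f")]
result = unrated_pairs(rows, rated_ids=[2], user_id=0)
print(result)  # [(0, 1), (0, 3), (0, 4)]
```

Unlike this sketch, Spark's distinct() does not preserve input order, which is consistent with the shuffled movie IDs in the output above.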

This is the predict call that receives new_user_unrated_movies_RDD:

# Use the input RDD, new_user_unrated_movies_RDD, with new_ratings_model.predictAll() to predict new ratings for the movies
new_user_recommendations_RDD = new_ratings_model.predictAll(new_user_unrated_movies_RDD)

snowch, Oct 05 '16 12:10