ValueError: The user feature matrix specifies more features than there are estimated feature embeddings: 19400 vs 81728.
I have the following datasets:
Users: 10,000 rows. Features: User-Id, name, age, los, ou, gender, skills, language, grade, career interests
Trainings: Training-Id, training name, main skill
Trainings Taken: User-Id, Training-Id, TrainingTaken. TrainingTaken is 10 when the user took the training; otherwise the row does not appear in the dataset.
The idea is to build a recommender for trainings :)
I used this helper class for the matrices. https://github.com/Med-ELOMARI/LightFM-Dataset-Helper
from lightfm_dataset_helper.lightfm_dataset_helper import DatasetHelper
I defined the feature columns for user and trainings.
items_column = "Training-Id"
user_column = "User-Id"
ratings_column = "TrainingTaken"
items_feature_columns = [
    "training name",
    "main skill",
]
user_features_columns = ["name", "age", "los", "ou", "gender", "skills", "language", "grade", "career interests"]
Then I build the matrices:
dataset_helper_instance = DatasetHelper(
    users_dataframe=usersdf,
    items_dataframe=trainingsdf,
    interactions_dataframe=trainingstakendf,
    item_id_column=items_column,
    items_feature_columns=items_feature_columns,
    user_id_column=user_column,
    user_features_columns=user_features_columns,
    interaction_column=ratings_column,
    clean_unknown_interactions=True,
)
dataset_helper_instance.routine()
Then I train:
from lightfm import LightFM
from lightfm.cross_validation import random_train_test_split
(train, test) = random_train_test_split(interactions=dataset_helper_instance.interactions, test_percentage=0.2)
model = LightFM(loss='warp')
model.fit(
    interactions=dataset_helper_instance.interactions,
    sample_weight=dataset_helper_instance.weights,
    item_features=dataset_helper_instance.item_features_list,
    user_features=dataset_helper_instance.user_features_list,
    verbose=True,
    epochs=20,
    num_threads=20,
)
Then I try to predict:
import numpy as np
from lightfm.data import Dataset
# predict for an existing user
scores = model.predict(user_ids=81727, item_ids=[1])
print(scores)
However I am getting this error:
ValueError: The user feature matrix specifies more features than there are estimated feature embeddings: 19400 vs 81728.
What could be wrong?
Hey,
I'd recommend building the data with LightFM's built-in methods instead. I cannot speak to the correctness of the helper package (be aware that its last commit was two years ago).
FYI, you already created a related issue here: https://github.com/lyst/lightfm/issues/638#issuecomment-1072432868