How to add more features to the item_model
I added new features to the user model following the method in the tutorial, but using the same method to add new features to the item model raises errors. What is the reason, and how can I add more item features to the item model? The code is as follows:

user_model:
```python
class UserModel(tf.keras.Model):

    def __init__(self):
        super().__init__()
        # User id embedding
        self.user_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=unique_user_ids, mask_token=None),
            tf.keras.layers.Embedding(len(unique_user_ids) + 1, 32),
        ])
        # Timestamp feature
        self.timestamp_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.Discretization(timestamp_buckets.tolist()),
            tf.keras.layers.Embedding(len(timestamp_buckets) + 1, 32),
        ])
        self.normalized_timestamp = tf.keras.layers.experimental.preprocessing.Normalization()
        self.normalized_timestamp.adapt(bhv_time)
        # Purchasing-power feature
        self.normalized_age = tf.keras.layers.experimental.preprocessing.Normalization()
        self.normalized_age.adapt(bhv_value)

    def call(self, inputs):
        # Inputs arrive as a dict of features
        return tf.concat([
            self.user_embedding(inputs['user_id']),
            self.timestamp_embedding(inputs['bhv_time']),
            self.normalized_timestamp(inputs['bhv_time']),
            self.normalized_age(inputs['bhv_value']),
        ], axis=1)
```
item_model:
```python
class ItemModel(tf.keras.Model):

    def __init__(self):
        super().__init__()
        max_tokens = 1000  # maximum number of tokens
        self.title_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=unique_item_titles, mask_token=None),
            tf.keras.layers.Embedding(len(unique_item_titles) + 1, 32),
        ])
        # Text-to-vector conversion
        self.title_vectorizer = tf.keras.layers.experimental.preprocessing.TextVectorization(
            max_tokens=max_tokens)
        self.title_text_embedding = tf.keras.Sequential([
            self.title_vectorizer,
            tf.keras.layers.Embedding(max_tokens, 32, mask_zero=True),
            tf.keras.layers.GlobalAveragePooling1D(),  # global average pooling
        ])
        # self.title_vectorizer.adapt(titles)

    def call(self, inputs):
        # Inputs arrive as a dict of features
        return tf.concat([
            self.title_embedding(inputs['item_id']),
            self.title_text_embedding(inputs['title']),
        ], axis=1)
```
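One note on this tower: the commented-out `adapt` call matters. The `TextVectorization` layer has to see the title corpus before it can map words to token ids; until then it has no vocabulary to look tokens up in. A minimal sketch of adapting it after construction, assuming `unique_item_titles` is defined and `titles` is a `tf.data.Dataset` of raw title strings (it is defined that way further down the thread):

```python
item_model = ItemModel()
# Build the title vectorizer's vocabulary from the raw title corpus:
item_model.title_vectorizer.adapt(titles.batch(128))
```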
```python
class ItemlensModel(tfrs.models.Model):

    def __init__(self):
        super().__init__()
        # Query (user) tower
        self.query_model = tf.keras.Sequential([
            UserModel(),
            tf.keras.layers.Dense(32)
        ], name='query_name')
        # Candidate (item) tower
        self.candidate_model = tf.keras.Sequential([
            ItemModel(),
            tf.keras.layers.Dense(32)
        ])
        # Retrieval task
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=items.batch(128).map(self.candidate_model),
            )
        )

    def compute_loss(self, features, training=False):
        query_embeddings = self.query_model({
            'user_id': features['user_id'],
            'bhv_time': features['bhv_time'],
            'bhv_value': features['bhv_value'],
        })
        item_embeddings = self.candidate_model({
            'item_id': features['item_id'],
            'title': features['title'],
        })
        return self.task(query_embeddings, item_embeddings)
```
There is no problem with the user model part, but an error occurs in the item model part.
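For context, the model is compiled and fitted in the usual TFRS way (a hedged sketch; the optimizer settings and the `train` dataset name are assumptions, not from the original post):

```python
model = ItemlensModel()
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
# `train` is assumed to be a tf.data.Dataset of feature dicts carrying the
# user_id / bhv_time / bhv_value / item_id / title keys used above.
model.fit(train.batch(4096), epochs=3)
```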
I also ran into this issue, and managed to fix it by making `items` a dict, like:

```python
items = items.map(lambda x: {
    "item_id": x['item_id'],
    "item_title": x['item_title'],
}).cache()
```
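This works because `candidates = items.batch(128).map(self.candidate_model)` calls the candidate tower on each batch of `items`, and `ItemModel.call` indexes `inputs['item_id']` and `inputs['title']`; so each element of `items` must be a dict carrying exactly the keys your own tower's `call` reads.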
I've solved this problem. The main problem lies in the upstream data format: the item data needs to be converted into a dictionary first, and then into a dataset (if your data is a DataFrame):

```python
data_item = pd.read_csv('../data/item_title_new.csv', nrows=10000, encoding='utf-8')
# Item attribute features; convert DataFrame -> dataset
items = tf.data.Dataset.from_tensor_slices(dict(data_item))
titles = tf.data.Dataset.from_tensor_slices(data_item['title'])
```
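After that conversion, each dataset element is a dict of named features, which is exactly what the candidate tower's `call(inputs)` indexes into. A quick sanity check (hypothetical, not from the original thread):

```python
# Each element should now be a dict of named feature tensors:
print(items.element_spec)
# e.g. {'item_id': TensorSpec(shape=(), dtype=tf.string, ...),
#       'title': TensorSpec(shape=(), dtype=tf.string, ...)}
```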
@markharding
How did you solve it? I have the same problem with the user model: I tried to add a location feature and I get an error.
@deeplearningnrs Could you give me some guidance? After I added features on the item side, the evaluation metrics stopped working entirely. Have you run into anything similar?
Yes, it didn't work either.
> I also ran into this issue, and managed to fix it by making `items` a dict like: `items = items.map(lambda x: {"item_id": x['item_id'], "item_title": x['item_title']}).cache()`
Yes, I solved it that way too. `items` needs to have the same variables as the item tower.
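Concretely, the keys in that mapped dict have to line up with whatever the item tower's `call` reads. A minimal sketch using the feature names from the original post (`candidate_model` is assumed to be the item tower built above, and `items` is assumed to carry these columns):

```python
import tensorflow_recommenders as tfrs

# The candidate tower reads inputs['item_id'] and inputs['title'],
# so the candidates dataset must expose exactly those keys:
items = items.map(lambda x: {
    'item_id': x['item_id'],
    'title': x['title'],
}).cache()

task = tfrs.tasks.Retrieval(
    metrics=tfrs.metrics.FactorizedTopK(
        candidates=items.batch(128).map(candidate_model),
    )
)
```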
Hi @hugoferrero, I tried your suggestion but it didn't work for me; maybe my understanding is wrong. Could you please elaborate on your suggestion and explain in more detail with code?