How to add more features to the item_model

Open • Jobo-RS opened this issue 3 years ago • 7 comments

I added new features to the user model following the method in the tutorial, but I get errors when I use the same method to add new features to the item model. What is the reason, and how can I add more item features to the item model? The code is as follows.

user_model:

`import tensorflow as tf

class UserModel(tf.keras.Model):

    def __init__(self):
        super().__init__()

        # user_embedding: user ID layer
        self.user_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=unique_user_ids, mask_token=None),
            tf.keras.layers.Embedding(len(unique_user_ids) + 1, 32),
        ])
        # Timestamp feature layers
        self.timestamp_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.Discretization(timestamp_buckets.tolist()),
            tf.keras.layers.Embedding(len(timestamp_buckets) + 1, 32),
        ])

        self.normalized_timestamp = tf.keras.layers.experimental.preprocessing.Normalization()
        self.normalized_timestamp.adapt(bhv_time)

        # Purchasing-power feature
        self.normalized_age = tf.keras.layers.experimental.preprocessing.Normalization()
        self.normalized_age.adapt(bhv_value)

    def call(self, inputs):
        # inputs is a dict of feature tensors
        return tf.concat([
            self.user_embedding(inputs['user_id']),
            self.timestamp_embedding(inputs['bhv_time']),
            self.normalized_timestamp(inputs['bhv_time']),
            self.normalized_age(inputs['bhv_value']),
        ], axis=1)`

item_model:

`class ItemModel(tf.keras.Model):

    def __init__(self):
        super().__init__()

        max_tokens = 1000  # maximum number of tokens

        self.title_embedding = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.StringLookup(
                vocabulary=unique_item_titles, mask_token=None),
            tf.keras.layers.Embedding(len(unique_item_titles) + 1, 32),
        ])

        self.title_vectorizer = tf.keras.layers.experimental.preprocessing.TextVectorization(
            max_tokens=max_tokens)  # converts text to token vectors

        self.title_text_embedding = tf.keras.Sequential([
            self.title_vectorizer,
            tf.keras.layers.Embedding(max_tokens, 32, mask_zero=True),
            tf.keras.layers.GlobalAveragePooling1D(),  # global average pooling
        ])
        # self.title_vectorizer.adapt(titles)

    def call(self, inputs):
        # inputs is a dict of feature tensors
        return tf.concat([
            self.title_embedding(inputs['item_id']),
            self.title_text_embedding(inputs['title']),
        ], axis=1)`

`import tensorflow_recommenders as tfrs

class ItemlensModel(tfrs.models.Model):

    def __init__(self):
        super().__init__()
        # Query model
        # self.query_model = UserModel()
        self.query_model = tf.keras.Sequential([
            UserModel(),
            tf.keras.layers.Dense(32)
        ], name='query_name')

        # Candidate model
        # self.candidate_model = ItemModel()
        self.candidate_model = tf.keras.Sequential([
            ItemModel(),
            tf.keras.layers.Dense(32)
        ])

        # Task
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                # each batched element of items is passed directly to the candidate model
                candidates=items.batch(128).map(self.candidate_model),
            )
        )

    # Compute the loss
    def compute_loss(self, features, training=False):
        query_embeddings = self.query_model({
            'user_id': features['user_id'],
            'bhv_time': features['bhv_time'],
            'bhv_value': features['bhv_value'],
        })

        item_embeddings = self.candidate_model({
            'item_id': features['item_id'],
            'title': features['title'],
        })

        return self.task(query_embeddings, item_embeddings)`

The user model part works without problems, but the item model part raises the following error: (error screenshot)

Jobo-RS • Aug 19 '21 06:08

I also ran into this issue, and managed to fix it by making items a dict like:

items = items.map(lambda x: {
    "item_id": x['item_id'],
    "item_title": x['item_title'],
}).cache()
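
As far as the thread shows, this helps because `items.batch(128).map(self.candidate_model)` hands each dataset element straight to `ItemModel.call`, which indexes its input by feature name. The TensorFlow-free sketch below imitates that behaviour. `item_tower` and `map_candidates` are hypothetical stand-ins, not TFRS functions, the sample values are made up, and the key names follow the `ItemModel` in the original post rather than the snippet above.

```python
def item_tower(inputs):
    """Stand-in for ItemModel.call: reads features by name (hypothetical)."""
    return (inputs["item_id"], inputs["title"])


def map_candidates(elements, tower):
    """Stand-in for items.map(tower): feeds each element to the tower as-is."""
    return [tower(element) for element in elements]


# Elements that are bare titles: there is nothing to look up by key.
bare_items = ["red shoes", "blue hat"]

# Elements that are dicts keyed by the names the tower reads.
dict_items = [
    {"item_id": "i1", "title": "red shoes"},
    {"item_id": "i2", "title": "blue hat"},
]

try:
    map_candidates(bare_items, item_tower)
    bare_ok = True
except (TypeError, KeyError):
    # Indexing a non-dict element by feature name fails.
    bare_ok = False

mapped = map_candidates(dict_items, item_tower)
```

The real error raised by TensorFlow will look different, but the cause is the same kind of mismatch between the element structure and what the tower indexes.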

markharding • Aug 26 '21 13:08

I have solved this problem. The main issue was the data format at an early stage: the item data needs to be converted into a dictionary first, and then into a dataset (if your data is a DataFrame).

data_item = pd.read_csv('../data/item_title_new.csv', nrows = 10000, encoding = 'utf-8')
# Item attribute features: format conversion, DataFrame -> Dataset
items = tf.data.Dataset.from_tensor_slices(dict(data_item))
titles = tf.data.Dataset.from_tensor_slices((data_item['title']))
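
`dict(data_item)` turns the DataFrame into a mapping from column name to column, and `from_tensor_slices` then slices that mapping row by row, so every dataset element is a dict with one entry per column. The plain-Python sketch below imitates that slicing. `slice_columns` is a hypothetical helper and the rows are made up; it is not how TensorFlow implements it.

```python
def slice_columns(columns):
    """Yield one dict per row from a dict of equal-length columns.

    Hypothetical helper imitating how a dict of columns is sliced
    into per-example dicts.
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    n_rows = lengths.pop() if lengths else 0
    for row in range(n_rows):
        yield {name: values[row] for name, values in columns.items()}


# Made-up item table in column form, like dict(data_item).
columns = {
    "item_id": ["i1", "i2", "i3"],
    "title": ["red shoes", "blue hat", "green scarf"],
}
elements = list(slice_columns(columns))
```

Each element keeps the column names as keys, which is what lets the item model look up `inputs['item_id']` and `inputs['title']`.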

@markharding

Jobo-RS • Sep 02 '21 03:09

How did you solve it? I have the same problem with the user model: I tried to add a location feature and got an error.

deeplearningnrs • Oct 09 '21 21:10

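@deeplearningnrs Could you give me some guidance?

After I added features on the item side, the metrics from evaluate stopped working entirely. Have you run into anything similar?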

siyu1992 • Jun 15 '22 03:06

Yes, it didn't work for me either.

shainaraza • Jun 15 '22 11:06

Yes, I solved it that way too, by making items a dict as @markharding suggested. The items dataset needs to have the same feature keys that the item tower reads.
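
One way to check this before training is to compare the keys of an `items` element with the keys the item tower reads. The helper below is a hypothetical sketch, not a TFRS function. With a `tf.data` dataset that yields dicts, the element keys could be taken from `items.element_spec`. The key names come from the posts earlier in the thread, and the DataFrame column names are assumed.

```python
def missing_features(element_keys, required_keys):
    """Return the required feature names absent from a dataset element, sorted."""
    return sorted(set(required_keys) - set(element_keys))


# Keys read in ItemModel.call in the original post.
tower_keys = ["item_id", "title"]

# Keys produced by the items.map(...) snippet earlier in the thread.
mapped_keys = ["item_id", "item_title"]

# Keys of a dataset built from a DataFrame with 'item_id' and 'title'
# columns (assumed column names).
frame_keys = ["item_id", "title"]

gap_mapped = missing_features(mapped_keys, tower_keys)
gap_frame = missing_features(frame_keys, tower_keys)
```

Here `gap_mapped` is non-empty only because that snippet uses its author's own column name, `item_title`. The point is that the names must match whatever your own tower indexes.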

hugoferrero • Jun 29 '22 17:06

Hi @hugoferrero, I tried your suggestion but it didn't work for me; maybe my understanding is wrong. Could you please elaborate on your suggestion and explain it in more detail with code?

rohitverma92outlook • Aug 17 '23 05:08