Merlin
Unable to export graph as ensemble when using multi-hot categorical column
I am using the inference container 22.05. When materializing Feast in cell 6 of the deploying multi-stage RecSys notebook, I get an exception while materializing user_features. Item features worked just fine. Please download the sample data and notebooks used for this from the link here
Using the tensorflow-inference 22.05 container. Attaching the error here
What Feast version are you using?
Hi Karl, I am using Feast==0.19.4. I tried removing the multi-hot categorical feature column (for example search_term, with values such as ['tokkio','merlin','riva documentation','nvbot implementation']) and was able to materialize without it. @rnyak told me that your team hasn't yet been able to test multi-hot categorical feature columns with Feast, and that this is why the error occurs. I would rather not remove this feature column from my dataset, since it improves model performance to some extent. Could you or someone from the team please test it?
Closing this since the materialization issue was solved. Another bug was reported here: https://github.com/NVIDIA-Merlin/systems/issues/115
@karlhigley and @rnyak
I tested it with my correct data today and am attaching all screenshots for visibility. With the list-of-strings feature (for example ['sipl_camera', 'build qnx', 'yuv', 'dl g']), I am unable to materialize the user features using Feast. The error I get is ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). I am also attaching the user features sample and the output of train_tt.to_ddf().compute(), along with a snapshot of what the raw data looks like.
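For readers unfamiliar with this ValueError, here is a minimal sketch of how NumPy raises it when a multi-element array is used where a single true/false value is expected. The array contents and variable names are hypothetical, and this is not a confirmed trace of where Feast hits the error; it only illustrates why a list-valued cell can trip code written for scalar values.

```python
import numpy as np

# Hypothetical encoded multi-hot value for one user (one row of a list column)
search_term = np.array([3, 17, 42])

try:
    # Using a multi-element array as a condition is undefined in NumPy
    if search_term:
        pass
except ValueError as err:
    message = str(err)

# Explicit alternatives that state what "truthy" should mean for a list value
has_terms = search_term.size > 0       # the list is non-empty
any_nonzero = bool(search_term.any())  # at least one element is non-zero
```

Code that handles list-valued features generally needs one of the explicit checks above instead of a bare truth test.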
Going back to our earlier conversation, I used the INT32_LIST dtype for this list-type column instead of INT32, and I was able to materialize as well as use the QueryFeast operator in the subsequent steps. However, this now leads to another error at the step where we create the retrieval object using the PredictTensorflow object.
What does model._saved_model_inputs_spec look like for your retrieval model?
@karlhigley here is the output for the retrieval model
{'search': (TensorSpec(shape=(727237, 1), dtype=tf.int32, name='search_1'), TensorSpec(shape=(16384, 1), dtype=tf.int32, name='search_2')), 'userid': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='userid'), 'contentid': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='contentid'), 'eventtype': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='eventtype'), 'eventStrength': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='eventStrength'), 'ts': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='ts'), 'spaceid': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='spaceid'), 'spacekey': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='spacekey'), 'spacename': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='spacename'), 'parenttitle': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='parenttitle'), 'parentid': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='parentid'), 'mgrntaccount': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='mgrntaccount'), 'businessarea': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='businessarea'), 'jobposition': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='jobposition'), 'hrrolename': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='hrrolename'), 'location': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='location')}
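In the output above, search is the only input described by a tuple of two specs rather than a single spec. A small sketch of how such inputs could be picked out programmatically follows. The Spec class is a hypothetical stand-in for tf.TensorSpec so the example runs without TensorFlow, split_inputs is a hypothetical helper rather than a Merlin function, and the dictionary is trimmed to two entries of the output above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for tf.TensorSpec (shape and name only)."""
    shape: tuple
    name: str

def split_inputs(inputs_spec):
    """Separate inputs with a single spec from inputs described by a tuple of specs."""
    single, multi = {}, {}
    for column, spec in inputs_spec.items():
        if isinstance(spec, tuple):
            # A tuple of specs means the column is fed as several tensors
            multi[column] = [part.name for part in spec]
        else:
            single[column] = spec.name
    return single, multi

# Trimmed copy of the spec posted above
inputs_spec = {
    "search": (Spec((727237, 1), "search_1"), Spec((16384, 1), "search_2")),
    "userid": Spec((16384, 1), "userid"),
}

single, multi = split_inputs(inputs_spec)
```

Here multi contains only the search column with its two tensor names, search_1 and search_2, while userid lands in single.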
I think the issue is happening while trying to get the shape of the search field, which looks like it might be in values/offsets format to handle variable-length lists per example in the batch?
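To make the values/offsets idea concrete, here is a minimal pure-Python sketch of how a batch of variable-length lists can be flattened into one values list plus offsets, and rebuilt afterwards. The function names and the sample batch are hypothetical. Whether the second tensor in this model holds offsets or per-row lengths is an assumption; the thread does not confirm it, so the sketch derives both.

```python
def to_values_offsets(rows):
    """Flatten variable-length rows into values and cumulative offsets."""
    values, offsets = [], [0]
    for row in rows:
        values.extend(row)
        # Each offset marks where the next row starts in `values`
        offsets.append(len(values))
    return values, offsets

def from_values_offsets(values, offsets):
    """Rebuild the original rows from values and offsets."""
    return [values[start:end] for start, end in zip(offsets[:-1], offsets[1:])]

# Hypothetical batch of four users with 2, 1, 0 and 3 encoded search terms
batch = [[4, 9], [7], [], [1, 2, 3]]

values, offsets = to_values_offsets(batch)
# Per-row lengths are the differences between consecutive offsets
row_lengths = [end - start for start, end in zip(offsets[:-1], offsets[1:])]
restored = from_values_offsets(values, offsets)
```

Under this layout the values tensor has one entry per list element across the whole batch, which would be consistent with the first search spec being much longer (727237) than the batch size (16384).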
@mkumari-ed Any chance you could share the model definition code and/or TF saved model file you're using to help me replicate the issue? (Slack would be fine if that works better.)
@karlhigley Hi Karl, I just reached out to you on Slack with file links and follow-up questions. Please let me know.
Currently getting an error on the search column at the step where we export the graph as an ensemble.
@karlhigley Hi Karl, renaming the column from search to search_1 didn't resolve the error I currently get while exporting the graph as an ensemble. It now complains about a missing search_1 column in the operator SubsetColumns.
Here is the screenshot
@karlhigley, the last comment shows that the issue is not resolved, but the status is done. I have moved it to WIP. Please review.
This is no longer the same (original) issue, which is why it was marked as done.
@mkumari-ed, the original issue is done, but we cannot close this because other issues have been added to the thread. Please create separate tickets and let us know, and we will close this ticket. Thank you.
Closing. @mkumari-ed please create additional issues if you're still experiencing this.