
Unable to export graph as ensemble when using multi-hot categorical column

Open mkumari-ed opened this issue 2 years ago • 15 comments

I am using the inference container 22.05. When materializing Feast in cell 6 of the deploying multi-stage recsys notebook, I get an exception while materializing user_features. Item features worked just fine. Please download the sample data and notebooks used for this from the link here

I am using the tensorflow-inference 22.05 container. Attaching the error here: [screenshot: Screen Shot 2022-06-02 at 6 47 45 AM]

mkumari-ed avatar Jun 02 '22 16:06 mkumari-ed

What Feast version are you using?

karlhigley avatar Jun 06 '22 20:06 karlhigley

Hi Karl, I am using Feast==0.19.4. I tried removing the multi-hot categorical feature column (for example search_term, where one value is ['tokkio','merlin','riva documentation','nvbot implementation']) and I was able to materialize without it. @rnyak informed me that your team hasn't been able to test multi-hot categorical feature columns with Feast, and that this is why the error happens. I would prefer not to remove this feature column from my dataset, as it improves the model performance to some extent. Could you or someone from the team please work on testing it out?

mkumari-ed avatar Jun 07 '22 15:06 mkumari-ed

Closing this since materialization issue was solved. And another bug was reported here https://github.com/NVIDIA-Merlin/systems/issues/115

rnyak avatar Jun 15 '22 14:06 rnyak

@karlhigley and @rnyak I tested it out with my correct data today. I am attaching all screenshots for visibility. With the list-of-strings feature (for example ['sipl_camera', 'build qnx', 'yuv', 'dl g']), I am unable to materialize the user features using Feast. The error that I get is ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). I am also attaching the user features sample and the output of train_tt.to_ddf().compute(), along with a snapshot of what the raw data looks like. [screenshots: materialize user features error; raw data sample; train tt sample; user features sample]

mkumari-ed avatar Jul 21 '22 16:07 mkumari-ed
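
Note on the ValueError quoted above: NumPy raises this message whenever an array with more than one element is evaluated in a boolean context, which is a common failure mode when code written for scalar values receives a list-valued (multi-hot) cell. The sketch below only reproduces the generic error; it does not show where Feast triggers it, which this thread does not establish. The encoded ids and the helper name truthiness_error are made up for illustration.

```python
import numpy as np

# Hypothetical encoded ids for one multi-hot cell (assumed values, not from the dataset).
search_terms = np.array([12, 7, 31, 4])


def truthiness_error(value):
    """Return the ValueError message raised by `if value:`, or None if no error occurs."""
    try:
        if value:  # scalar-style check that breaks for multi-element arrays
            pass
    except ValueError as exc:
        return str(exc)
    return None


scalar_msg = truthiness_error(np.int32(5))  # a scalar has an unambiguous truth value
list_msg = truthiness_error(search_terms)   # a multi-element array does not

print(scalar_msg)
print(list_msg)
```

Code that must accept both scalars and arrays typically reduces the array explicitly with .any() or .all(), as the error message suggests.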

Going back to our earlier conversation, I used the INT32_LIST dtype for this list-type column instead of INT32, and I was able to materialize as well as use the queryfeast operator in the subsequent steps. However, it now leads to another error at the step where we create the retrieval object using the PredictTensorflow object. [screenshots: Screen Shot 2022-07-21 at 1 08 36 PM; Screen Shot 2022-07-21 at 1 08 55 PM]

mkumari-ed avatar Jul 21 '22 20:07 mkumari-ed

What does model._saved_model_inputs_spec look like for your retrieval model?

karlhigley avatar Jul 27 '22 15:07 karlhigley

@karlhigley here is the output for the retrieval model:

{
  'search': (
    TensorSpec(shape=(727237, 1), dtype=tf.int32, name='search_1'),
    TensorSpec(shape=(16384, 1), dtype=tf.int32, name='search_2')),
  'userid': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='userid'),
  'contentid': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='contentid'),
  'eventtype': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='eventtype'),
  'eventStrength': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='eventStrength'),
  'ts': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='ts'),
  'spaceid': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='spaceid'),
  'spacekey': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='spacekey'),
  'spacename': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='spacename'),
  'parenttitle': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='parenttitle'),
  'parentid': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='parentid'),
  'mgrntaccount': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='mgrntaccount'),
  'businessarea': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='businessarea'),
  'jobposition': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='jobposition'),
  'hrrolename': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='hrrolename'),
  'location': TensorSpec(shape=(16384, 1), dtype=tf.int32, name='location')
}

mkumari-ed avatar Jul 28 '22 20:07 mkumari-ed

I think the issue is happening while trying to get the shape of the search field, which looks like it might be in values/offsets format to handle variable length lists per example in the batch?

karlhigley avatar Jul 29 '22 13:07 karlhigley
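
Note on the values/offsets format mentioned above: a minimal sketch of how variable-length lists can be flattened into one values array plus an offsets array, and rebuilt from them. This is a generic illustration in plain Python, not the Merlin implementation; the function names and sample rows are hypothetical. If the search input is stored this way, it may explain why one of its two tensors in the input spec has far more rows (727237) than the batch size (16384), but the thread does not confirm the exact layout.

```python
def to_values_offsets(rows):
    """Flatten a list of variable-length lists into (values, offsets).

    Row i is recovered as values[offsets[i]:offsets[i + 1]].
    """
    values = []
    offsets = [0]
    for row in rows:
        values.extend(row)
        offsets.append(len(values))
    return values, offsets


def from_values_offsets(values, offsets):
    """Rebuild the original list of lists from (values, offsets)."""
    return [values[start:end] for start, end in zip(offsets[:-1], offsets[1:])]


# Hypothetical batch of four examples, including one empty list.
rows = [[3, 1, 4], [1], [], [5, 9]]
values, offsets = to_values_offsets(rows)

# Per-row lengths are the differences between consecutive offsets.
row_lengths = [end - start for start, end in zip(offsets[:-1], offsets[1:])]

print(values)       # [3, 1, 4, 1, 5, 9]
print(offsets)      # [0, 3, 4, 4, 6]
print(row_lengths)  # [3, 1, 0, 2]
```

The values array grows with the total number of list elements in the batch, while offsets (or row lengths) grows with the number of examples, so the two arrays generally have different first dimensions.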

@mkumari-ed Any chance you could share the model definition code and/or TF saved model file you're using to help me replicate the issue? (Slack would be fine if that works better.)

karlhigley avatar Jul 29 '22 14:07 karlhigley

@karlhigley Hi Karl, I just reached out to you on slack with file links and follow up questions. Please let me know.

mkumari-ed avatar Jul 29 '22 16:07 mkumari-ed

I am currently getting an error on the search column at the step where we export the graph as an ensemble. [screenshots: Screen Shot 2022-08-01 at 3 38 48 PM; Screen Shot 2022-08-01 at 3 39 11 PM]

mkumari-ed avatar Aug 02 '22 20:08 mkumari-ed

@karlhigley Hi Karl, renaming the column from search to search_1 didn't resolve the error that I currently get while exporting the graph as an ensemble. It now complains that the search_1 column is missing in operatorSubsetColumns.

Here is the screenshot: [screenshot: Screen Shot 2022-08-03 at 9 21 58 AM]

mkumari-ed avatar Aug 03 '22 16:08 mkumari-ed

@karlhigley, the last comment shows that the issue is not resolved, but the status is done. I have moved it to WIP. Please review.

viswa-nvidia avatar Aug 04 '22 18:08 viswa-nvidia

This is no longer the same (original) issue, which is why it was marked as done.

karlhigley avatar Aug 04 '22 22:08 karlhigley

@mkumari-ed, the original issue is done, but we cannot close this because other issues have been added to the thread. Please create separate tickets and let us know, and we will close this ticket. Thank you.

viswa-nvidia avatar Aug 15 '22 16:08 viswa-nvidia

Closing. @mkumari-ed please create additional issues if you're still experiencing this.

EvenOldridge avatar Oct 17 '22 23:10 EvenOldridge