Port sentence embedding example to keras-core
Port the sentence_embeddings_with_sbert example to keras_core and change the title to "Semantic similarity and text clustering using S-RoBERTa with keras_nlp".
Hi @abuelnasr0, are you willing to port it to keras-core?
Hi @shivance, yes, I would love to. I will open a pull request in keras-io as soon as possible.
Thanks @abuelnasr0 !
Hi @mattdangerw, I need some help. I have ported the example to keras-core, but the triplet objective function example fails with the TensorFlow backend, while it works fine with the torch and JAX backends. The error occurs when I try to fit the model. Here is a Colab link: https://colab.research.google.com/gist/abuelnasr0/8aef29478ad1b3204f1c7e2b52af5451/copy-of-sentence_embeddings_with_sbert.ipynb You can jump to the triplet objective function section after running the setup.
The error:
/usr/local/lib/python3.10/dist-packages/keras_core/src/layers/input_spec.py in assert_input_compatibility(input_spec, inputs, layer_name)
179 continue
180
--> 181 shape = backend.standardize_shape(x.shape)
182 ndim = len(shape)
183 # Check ndim.
/usr/local/lib/python3.10/dist-packages/keras_core/src/backend/common/variables.py in standardize_shape(shape, allow_dynamic_batch_size, allow_all_dynamic)
412 if not hasattr(shape, "__iter__"):
413 raise ValueError(f"Cannot convert '{shape}' to a shape.")
--> 414 shape = tuple(shape)
415
416 for i, e in enumerate(shape):
/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/tensor_shape.py in __iter__(self)
929 """Returns `self.dims` if the rank is known, otherwise raises ValueError."""
930 if self._dims is None:
--> 931 raise ValueError("Cannot iterate over a shape with unknown rank.")
932 else:
933 if self._v2_behavior:
ValueError: Cannot iterate over a shape with unknown rank.
Some insight into the error: it occurs while asserting input compatibility between input_spec and the model's inputs. It is raised when reading the shape of the first input, which is the padding mask of the anchor sentence.
I experimented with the keras-core code to get more information about the error, and here is what I got:
input[0]: Tensor("data_2:0", dtype=bool)
input[0].shape:
You can find that code here: https://colab.research.google.com/gist/abuelnasr0/cfd681e3d87a99f357b76965ec0bcb98/sentence_embeddings_with_sbert.ipynb
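For anyone following the traceback, here is a minimal pure-Python sketch of the failure mode, not the actual keras-core or TensorFlow code. `ShapeStandIn` is a hypothetical stand-in for a TensorFlow shape object, `standardize_shape` is reduced to the two lines visible in the traceback, and the `(None, 512)` shape is only an illustrative value. It shows why `tuple(shape)` raises as soon as the shape has an unknown rank.

```python
class ShapeStandIn:
    """Hypothetical stand-in for a TensorFlow shape object (not the real class)."""

    def __init__(self, dims):
        # dims is None when the rank is unknown, otherwise a list of sizes.
        self._dims = dims

    def __iter__(self):
        # Mirrors the check shown in the traceback from tensor_shape.py.
        if self._dims is None:
            raise ValueError("Cannot iterate over a shape with unknown rank.")
        return iter(self._dims)


def standardize_shape(shape):
    """Simplified sketch of the conversion step seen in the traceback."""
    if not hasattr(shape, "__iter__"):
        raise ValueError(f"Cannot convert '{shape}' to a shape.")
    # tuple() calls shape.__iter__(), which is where the error surfaces.
    return tuple(shape)


# A shape with a known rank converts fine (the batch dimension may be None).
known = standardize_shape(ShapeStandIn([None, 512]))
print(known)

# A shape with an unknown rank fails during the tuple() conversion.
try:
    standardize_shape(ShapeStandIn(None))
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```

If this reading is right, the input tensor reaching the model has lost its static shape somewhere in the data pipeline, so the place to look is the loading and preprocessing code rather than the model itself.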
@mattdangerw It works now. I changed the way I load and preprocess the data, and it surprisingly worked :D. I wasn't even trying to fix it; I was just experimenting. I will edit the text and open a pull request soon.