[Question]: Multi-tower model?
Hi @maciejkula
I have a simple model currently which essentially matches queries to documents. My training data looks like this:
[
{'q': 'is AI taking over?', 'doc': 'AI is a very ....'},
...
]
What I would like to do is incorporate other features by which I could query. For example, for a document I may also have features such as topics or entities.
I would like to train my model in such a way that these other features are also placed into the same embedding space, so that I can query my docs not only by free text but also by topic, for example.
One thing to mention is that I am using a BERT-based language model in both the query and candidate towers.
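For context, my current model looks roughly like this (a simplified sketch; `query_encoder` and `doc_encoder` stand in for the BERT-based encoders, and `candidate_docs` is a tf.data.Dataset of document texts):

```python
import tensorflow_recommenders as tfrs


class QueryDocModel(tfrs.Model):
    """Two-tower retrieval model matching free-text queries to documents."""

    def __init__(self, query_encoder, doc_encoder, candidate_docs):
        super().__init__()
        # query_encoder / doc_encoder: any Keras model mapping a batch of
        # strings to a dense embedding (BERT-based in my case).
        self.query_encoder = query_encoder
        self.doc_encoder = doc_encoder
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=candidate_docs.batch(128).map(self.doc_encoder)
            )
        )

    def compute_loss(self, features, training=False):
        query_embeddings = self.query_encoder(features["q"])
        doc_embeddings = self.doc_encoder(features["doc"])
        return self.task(query_embeddings, doc_embeddings)
```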
Thanks in advance!
EDIT: do you think this is a good way to use TFRS?
Yes, this sounds like a very legitimate use case.
One possibility is to train a retrieval model where the training data looks like this:
[{"topic": "AI", "doc": "AI is a very...}]
You could co-train this together with your original model, and this should result in queries and topics being embedded into the same space - one that is appropriate for querying your documents.
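A rough sketch of that co-training setup (just to illustrate the idea - the encoder names are placeholders, and it assumes each training example carries the query, the topic and the document; if your (query, doc) and (topic, doc) pairs come from separate datasets, you can alternate batches between the two losses instead):

```python
import tensorflow_recommenders as tfrs


class MultiQueryModel(tfrs.Model):
    """Co-trains a (query, doc) task and a (topic, doc) task with a shared
    document tower, so queries and topics end up in the same space."""

    def __init__(self, query_encoder, topic_encoder, doc_encoder):
        super().__init__()
        self.query_encoder = query_encoder
        self.topic_encoder = topic_encoder
        self.doc_encoder = doc_encoder
        self.query_task = tfrs.tasks.Retrieval()
        self.topic_task = tfrs.tasks.Retrieval()

    def compute_loss(self, features, training=False):
        # The document tower is shared by both tasks, so query and topic
        # embeddings are trained against the same document embeddings.
        doc_embeddings = self.doc_encoder(features["doc"])
        query_loss = self.query_task(
            self.query_encoder(features["q"]), doc_embeddings)
        topic_loss = self.topic_task(
            self.topic_encoder(features["topic"]), doc_embeddings)
        return query_loss + topic_loss
```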
Thanks @maciejkula, so just to clarify:
The training input would be something like:
[({"topic": "AI", "doc": "AI is a very...}, {"query": "is AI taking over?")]
So topic would be an additional feature to our X, or do you suggest something different?
Yes, this would be the standard approach, where you feed all document features to one tower, and all query features to the other.
However, if you want to query by topic, you may find that you'll need to train on (topic, doc) pairs, with the topic effectively becoming your query.
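To illustrate the standard approach, the candidate tower might combine all document-side features along these lines (a sketch with placeholder names; `doc_text_encoder` stands in for the BERT model). For the query-by-topic variant above, the topic embedding would instead live in the query tower:

```python
import tensorflow as tf


class DocTower(tf.keras.Model):
    """Candidate tower combining all document-side features."""

    def __init__(self, doc_text_encoder, topic_vocab, embedding_dim=64):
        super().__init__()
        self.doc_text_encoder = doc_text_encoder  # stand-in for the BERT encoder
        self.topic_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=topic_vocab),
            # +1 for the out-of-vocabulary bucket added by StringLookup.
            tf.keras.layers.Embedding(len(topic_vocab) + 1, embedding_dim),
        ])
        self.projection = tf.keras.layers.Dense(embedding_dim)

    def call(self, features):
        # Concatenate all document features, then project to the shared space.
        concatenated = tf.concat(
            [self.doc_text_encoder(features["doc"]),
             self.topic_embedding(features["topic"])],
            axis=1)
        return self.projection(concatenated)
```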