
feat: combine embeddings

JohannesMessner opened this issue 3 years ago · 3 comments

Right now it is painful to combine embeddings from different nesting levels and set the result at the top level, especially until https://github.com/jina-ai/docarray/issues/461 is solved.

This method should do the following: it takes an access path (or a list of access paths?), e.g. `@.[image, main_text]`, and combines the embeddings of the Documents at those paths using one of `'sum'`, `'mean'`, `'concat'`, or a user-provided model.
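For illustration, here is a minimal numpy sketch of what the predefined combiners would compute, with the proposed call shown as a comment (the method name `combine_embeddings` and its parameters are placeholders taken from this issue, not an existing docarray API):

```python
import numpy as np

# What the predefined combiners would compute, given the chunk
# embeddings selected by the access path (here two 4-dim vectors):
image_emb = np.array([1.0, 2.0, 3.0, 4.0])
text_emb = np.array([5.0, 6.0, 7.0, 8.0])
embs = [image_emb, text_emb]

summed = np.sum(embs, axis=0)           # -> [6., 8., 10., 12.]
mean = np.mean(embs, axis=0)            # -> [3., 4., 5., 6.]
concat = np.concatenate(embs, axis=-1)  # -> 8-dim vector

# Hypothetical call on a DocumentArray `da` whose docs have `image`
# and `main_text` fields (names as in the issue; the method name is
# still under discussion, see the rename ToDo below):
# da.combine_embeddings('@.[image, main_text]', combiner='mean')
#
# A user-provided callable (or model) should work the same way:
# da.combine_embeddings('@.[image, main_text]',
#                       combiner=lambda e: 0.7 * e[0] + 0.3 * e[1])
```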

It should handle the following cases:

  • [x] numpy embedding / predefined combiner
    • [x] test
  • [x] torch embedding / predefined combiner
    • [x] test
  • [ ] tf embedding / predefined combiner
    • [ ] test
  • [ ] paddle embedding / predefined combiner
    • [ ] test

  • [x] torch embedding model (see the sketch after this list)
    • [ ] test
  • [ ] tf embedding model
    • [ ] test
  • [ ] paddle embedding model
    • [ ] test
  • [ ] onnx embedding model
    • [ ] test

  • [x] numpy embedding / callable
    • [ ] test
  • [x] torch embedding / callable
    • [ ] test
  • [x] tf embedding / callable
    • [ ] test
  • [x] paddle embedding / callable
    • [ ] test

Other ToDos:

  • [ ] docs
  • [ ] rename the method: fuse_embeddings()?
  • [ ] refactor: the current code is quite ugly
  • [ ] refactor examples in the docs

Possible follow-up PRs:

  • [ ] enable a `uniform_nesting` flag which tells us that every doc in the da has the same number of relevant chunks; this would allow us to vectorize the combine operation (see the sketch after this list)
  • [ ] implement to_numpy
  • [ ] implement flag to discard chunk embeddings after root embedding has been set
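A minimal numpy sketch of why uniform nesting enables vectorization (the shapes are assumptions for illustration):

```python
import numpy as np

n_docs, n_chunks, dim = 10, 2, 128  # assumed shapes

# Without uniform nesting, each doc may have a different number of
# relevant chunks, forcing a Python-level loop. With uniform nesting,
# all chunk embeddings stack into one (n_docs, n_chunks, dim) tensor...
chunk_embs = np.random.rand(n_docs, n_chunks, dim)

# ...and each predefined combiner becomes a single vectorized reduction:
mean_embs = chunk_embs.mean(axis=1)           # (n_docs, dim)
sum_embs = chunk_embs.sum(axis=1)             # (n_docs, dim)
concat_embs = chunk_embs.reshape(n_docs, -1)  # (n_docs, n_chunks * dim)
```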

Closes #512

JohannesMessner · Sep 08 '22 07:09