Add a task that creates common associations from a group of chunks
Chunk Associations Task
NOTE: This issue is part of Contribute-to-Win. Please comment first to get assigned. Read the details here
A task for creating semantic associations between document chunks and organizing them into common node sets based on their similarity.
Overview
The Chunk Associations Task analyzes document chunks for semantic similarity and creates weighted edges, labeled Association, between highly associated chunks. It uses an LLM classifier, driven by a prompt, whose output says whether an association should be created for a given pair of chunks.
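To make the classifier contract concrete, here is a minimal sketch of how the prompt and its yes/no output could be handled. Everything in it is an assumption for illustration: the prompt wording, the JSON reply shape, and the names `build_association_prompt`, `parse_association_decision` and `AssociationDecision` are hypothetical and not part of cognee. The actual LLM call is left out on purpose.

```python
import json
import math
from dataclasses import dataclass

# Hypothetical prompt; the real wording is up to the implementer.
PROMPT_TEMPLATE = (
    "You are given two document chunks.\n"
    "Chunk A: {a}\n"
    "Chunk B: {b}\n"
    "Decide whether they discuss the same topic.\n"
    'Reply with JSON only: {{"associate": true|false, "weight": <0.0-1.0>}}'
)


@dataclass
class AssociationDecision:
    associate: bool  # should an Association edge be created?
    weight: float    # edge weight in [0.0, 1.0]; 0.0 when associate is False


def build_association_prompt(chunk_a: str, chunk_b: str) -> str:
    """Fill the template with the text of the two chunks."""
    return PROMPT_TEMPLATE.format(a=chunk_a, b=chunk_b)


def parse_association_decision(raw: str) -> AssociationDecision:
    """Parse the model reply; anything malformed means 'no association'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return AssociationDecision(False, 0.0)
    if not isinstance(data, dict) or data.get("associate") is not True:
        return AssociationDecision(False, 0.0)
    try:
        weight = float(data.get("weight", 0.0))
    except (TypeError, ValueError):
        weight = 0.0
    if not math.isfinite(weight):
        weight = 0.0
    # Clamp so a sloppy reply cannot produce an out-of-range edge weight.
    return AssociationDecision(True, min(max(weight, 0.0), 1.0))
```

Treating malformed replies as "no association" keeps a bad model response from writing spurious edges into the graph.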
Example
Book 1: Chunk talks about river dolphins
Book 2: Chapter about river dolphins
Book 1 - Chunk 1 -> Association weight to the chapter about river dolphins in Book 2
Input
The task should receive a batch of N chunks retrieved from an existing graph.
Output
Datapoints with weighted edges that can be stored in the graph. They can belong to a special Association nodeset, or we can update and delete existing data.
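The input/output contract above could look roughly like the sketch below: a batch of chunks goes in, a list of weighted Association edges comes out. This is only an illustration under stated assumptions. `Chunk`, `AssociationEdge` and `create_chunk_associations` are hypothetical names, not cognee types, and `word_overlap_classifier` is a simple word-overlap stand-in so the example runs without an LLM; the real task would plug the LLM classifier in at that point.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Chunk:
    id: str
    text: str


@dataclass(frozen=True)
class AssociationEdge:
    source_id: str
    target_id: str
    relationship: str
    weight: float


# A classifier takes two chunk texts and returns (associate?, weight).
Classifier = Callable[[str, str], Tuple[bool, float]]


def word_overlap_classifier(a: str, b: str, threshold: float = 0.3) -> Tuple[bool, float]:
    """Stand-in for the LLM classifier: Jaccard overlap of lowercased words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return False, 0.0
    score = len(words_a & words_b) / len(words_a | words_b)
    return score >= threshold, score


def create_chunk_associations(
    chunks: List[Chunk],
    classify: Classifier = word_overlap_classifier,
) -> List[AssociationEdge]:
    """Compare every pair in the batch and emit an edge for each accepted pair."""
    edges = []
    for left, right in combinations(chunks, 2):
        associate, weight = classify(left.text, right.text)
        if associate:
            edges.append(AssociationEdge(left.id, right.id, "Association", weight))
    return edges


if __name__ == "__main__":
    batch = [
        Chunk("book1-chunk1", "river dolphins live in the amazon"),
        Chunk("book2-chapter", "river dolphins of the amazon"),
        Chunk("book3-chunk7", "quarterly tax filing deadlines"),
    ]
    for edge in create_chunk_associations(batch):
        print(edge)
```

Note that comparing all pairs costs N*(N-1)/2 classifier calls per batch, so with an LLM behind `classify` the batch size or a cheap pre-filter would need some thought.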
I would like to solve this issue. Please assign it to me.
thanks for your interest @Sumeet2005 , the issue is assigned to you!
hey @Sumeet2005 , how is the progress? Do you have a question? As this issue is a part of the challenge, we want to have quick iterations :) please update us! the issue will be un-assigned if no PR is opened in the next 24 hrs
hi @Sumeet2005 , sorry to inform you but since we haven't heard back from you, we'll make this issue available for other contributors to pick up.
If you submit your PR before another contributor requests to contribute, you may be reassigned to the issue and we can review your PR.
Hi, I'd love to work on this! Can I get some more context on the task, like the project structure and where I can integrate the logic? Thanks!
Hi, is this task completed, or can I be assigned to this task?
hey @AniLeo-01, the issue is assigned to you, thanks for picking it up :)
This task can be implemented as part of memify. We look forward to your PR!
Thanks @hande-k!
hey @AniLeo-01, wanted to check if you have any questions :) please let us know!
This is actually a good idea. Would it work with the current nodeset implementation? As far as I remember, nodesets have some flaws, for example with deduplication: a person who is already in one nodeset gets deduplicated instead of belonging to multiple nodesets (off the top of my head, I was looking into using nodesets the other day).
I wanna try it, could you assign it to me? @hande-k @Vasilije1990
hey @EricXiao95 good to see you :) done!
@EricXiao95 @hande-k @Vasilije1990 Hello! I am curious if this is being worked on? Otherwise, I would like to get assigned! Thank you.
hi @kckoh thanks for your interest! let's give @EricXiao95 a day to see it then feel free to start cracking :)
@hande-k Would you mind assigning this to me? I’d love to get started on it.
Hi @hande-k @kckoh, sorry for the late reply! I missed the notification. I am actually working on this. I'll have a PR ready by the end of this week. Thanks for checking in!
Hi @EricXiao95 @hande-k! No worries about the delay. I actually started working on this yesterday and I'm nearly finished. @EricXiao95 - would you be open to collaborating? Or we could compare approaches when we're both done? Happy to coordinate to avoid duplicate effort. @hande-k - what do you think is the best path forward here?
@Vasilije1990 @EricXiao95 @hande-k I've created a PR. Could you please take a look and review?
Hi @kckoh, thanks for the update. Since the PR is already up, please go ahead with yours. Thanks @hande-k!
Thanks both @EricXiao95 @kckoh! the pr will be reviewed soon :)
Looking forward to contributing more to cognee :)