
[BUG] LightRAG file selection

Open newbie-Li opened this issue 1 year ago • 2 comments

Description

When "Search All" is selected, or any group is selected under "Search in Files", an empty file_ids list is sent to the retriever pipeline.

libs\ktem\ktem\index\file\graph\graph_index.GraphRAGIndex -> get_retriever_pipelines

Add these imports:

from typing import Any
from sqlalchemy.orm import Session
import json

from ktem.index.file import FileIndex
from ktem.db.models import engine

from ..base import BaseFileIndexIndexing, BaseFileIndexRetriever
from .pipelines import GraphRAGIndexingPipeline, GraphRAGRetrieverPipeline

Replace the line is_all, sel_ids, _ = selected with:

        is_all, sel_ids, _ = selected
        if is_all == "all":
            Index = self._resources.get("Index")
            with Session(engine) as session:
                all_id = session.query(Index.source_id).filter(Index.relation_type == "graph").all()
                file_ids = [i[0] for i in all_id]
        else:
            file_ids = []
            for item in sel_ids:
                if item.startswith("["):
                    group_file_ids = json.loads(item)
                    file_ids.extend(group_file_ids)
                else:
                    file_ids.append(item)
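The group handling in the else branch above can be tried in isolation. This is a minimal sketch, assuming (as the snippet above does) that a group entry arrives as a JSON-encoded list of file IDs and any other entry is a single file ID. The function name expand_selected_ids is hypothetical, and the de-duplication step is an addition that is not part of the proposed fix.

```python
import json
from typing import Iterable, List


def expand_selected_ids(sel_ids: Iterable[str]) -> List[str]:
    """Flatten a UI selection into a plain list of file IDs.

    Assumption: a group entry is a JSON-encoded list of file IDs
    (a string starting with "["); anything else is a single file ID.
    """
    file_ids: List[str] = []
    for item in sel_ids:
        if item.startswith("["):
            # Group entry: decode the JSON list and add every member
            file_ids.extend(json.loads(item))
        else:
            # Plain entry: a single file ID
            file_ids.append(item)
    # Addition beyond the proposed fix: drop duplicates, keep first-seen order
    return list(dict.fromkeys(file_ids))
```

Called with one plain ID and one group, for example expand_selected_ids(["a", '["b", "c"]']), it returns the three IDs as a flat list, which is the shape the retriever pipeline expects in file_ids.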

Reproduction steps

1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

Screenshots

![image](https://github.com/user-attachments/assets/907929ab-b59b-443d-a795-8896418f361a)

Logs

No response

Browsers

No response

OS

No response

Additional information

No response

newbie-Li avatar Dec 12 '24 02:12 newbie-Li

This is more than a simple selection bug. Each time files are uploaded, a unique ID is created, a folder is created for that ID, and the LightRAG data is stored in that folder. When multiple files are selected, the retriever picks the first file ID, looks up the unique ID linked to it, and creates the LightRAG query from that one folder.

For example, I have 10 PDFs and uploaded 5 first, then the other 5. If I select all files, it seems that only the first 5 PDFs are searched.
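The 10-PDF case can be illustrated with a small sketch. It assumes each file ID maps to exactly one graph ID (one per upload batch); the (file_id, graph_id) pairs are a hypothetical stand-in for the index table, and group_files_by_graph is not a function in kotaemon.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_files_by_graph(
    rows: Iterable[Tuple[str, str]], selected: Iterable[str]
) -> Dict[str, List[str]]:
    """Group the selected file IDs by the graph they were indexed into.

    rows: hypothetical (file_id, graph_id) pairs standing in for the index table.
    """
    wanted = set(selected)
    groups: Dict[str, List[str]] = defaultdict(list)
    for file_id, graph_id in rows:
        if file_id in wanted:
            groups[graph_id].append(file_id)
    return dict(groups)


# Two upload batches of 5 PDFs each produce two separate graphs
rows = [(f"pdf{i}", "graph-a" if i < 5 else "graph-b") for i in range(10)]
groups = group_files_by_graph(rows, [file_id for file_id, _ in rows])
```

Here groups has two keys, one per upload batch. A retriever that resolves only the first selected file's graph would query graph-a and never reach the 5 PDFs in graph-b; covering the whole selection would mean querying every graph in groups.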

newbie-Li avatar Dec 14 '24 09:12 newbie-Li

I am seeing the same thing. It seems that LightRAG searches ALL documents in the same index no matter what you selected, and searches only that one index even if you select files from multiple indexes. I ran different test scenarios with two markup resume files:

  1. John_Doe_Resume.txt
  2. Jane_Smith_Resume.txt

Drag & drop each file separately into the LightRAG file collection:

  1. Two separate indexes are created under ktem_app_data\user_data\files\lightrag\
  2. A query searches only one index at a time, even if both files are selected. The "Search All" button doesn't work either: it complains that no documents are selected.
  3. When selecting multiple files in the drop down, only the first one selected will be included in the search. The others will be ignored.

Drag & drop both files at the same time to be indexed:

  1. One index is created under ktem_app_data\user_data\files\lightrag\
  2. Query will search both documents
  3. Query will search both documents EVEN if I only selected one file in the drop down.

I believe LightRAG can do incremental indexing, i.e. adding new documents updates the existing index instead of creating a new one. It would be great if that could be implemented.

eddprogrammer avatar Dec 15 '24 16:12 eddprogrammer