[Feature Request]: human_readable_id for text_units

Open • hyiip opened this issue 1 year ago • 1 comment

Do you need to file an issue?

  • [X] I have searched the existing issues and this feature is not already filed.
  • [X] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • [X] I believe this is a legitimate feature request, not just a question. If this is a question, please use the Discussions area.

Is your feature request related to a problem? Please describe.

I noticed that in the \query\structured_search\local_search\system_prompt.py, there is a prompt to cite the reference:

"Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Sources (15, 16), Reports (1), Entities (5, 7); Relationships (23); Claims (2, 7, 34, 46, 64, +more)]."

where 15, 16, 1, 5, 7, 23, 2, 7, 34, 46, and 64 represent the id (not the index) of the relevant data record.

However, create_final_text_units.parquet has no human_readable_id column, so we cannot run a query like:

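import pandas as pd
text_units_df = pd.read_parquet("create_final_text_units.parquet")
# Desired lookup; not possible today because this column does not exist for text units
text_units_df = text_units_df[text_units_df["human_readable_id"] == source_id]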

This is unintuitive, because the prompt clearly states that the numbers (15, 16) are record ids, not indexes. Moreover, entities, relationships, and claims each have a human_readable_id field for querying, and for reports the community field can serve as the human_readable_id.

Describe the solution you'd like

Add a human_readable_id column to the text_units dataframe.

Additional context

No response

hyiip, Aug 06 '24 10:08

Agree that this seems confusing. The human-readable ID for text_units is assigned from the index when we load the data for query. Text units were likely added to the query context after the others and hadn't had an incremental ID assigned during indexing so we used the default. Will add to backlog so this is more clear.

natoverse, Aug 06 '24 18:08
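
Until such a column exists, the reply above suggests a possible workaround: since the human-readable ID for text units is said to be assigned from the index at query load time, the cited Sources numbers can be treated as row positions. The following is a minimal sketch under that assumption, not GraphRAG's own code. The helper names (add_human_readable_id, lookup_sources) are hypothetical, the "id" and "text" columns are assumed, and a toy dataframe stands in for the real parquet file. It also assumes the rows are in the same order as when GraphRAG loaded them for the query.

```python
import pandas as pd


def add_human_readable_id(text_units_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a human_readable_id column derived from row position.

    Assumption: the citation numbers in "Sources (...)" correspond to the
    default positional index of the text units at load time.
    """
    out = text_units_df.reset_index(drop=True).copy()
    out["human_readable_id"] = out.index
    return out


def lookup_sources(text_units_df: pd.DataFrame, source_ids: list[int]) -> pd.DataFrame:
    """Select the text units whose derived human_readable_id was cited."""
    with_ids = add_human_readable_id(text_units_df)
    return with_ids[with_ids["human_readable_id"].isin(source_ids)]


# Toy stand-in for pd.read_parquet("create_final_text_units.parquet");
# the column names here are illustrative assumptions.
text_units_df = pd.DataFrame(
    {
        "id": ["a1", "b2", "c3"],
        "text": ["first chunk", "second chunk", "third chunk"],
    }
)

# Suppose a response cited "Sources (0, 2)".
cited = lookup_sources(text_units_df, [0, 2])
print(cited[["human_readable_id", "id", "text"]])
```

If the file is filtered or re-sorted before the lookup, the positions will no longer line up with the cited numbers, which is one reason a persisted human_readable_id column would be more robust.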