
More flexible embedding field config

Open natoverse opened this issue 1 year ago • 0 comments

Do you need to file an issue?

  • [X] I have searched the existing issues and this feature is not already filed.
  • [X] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • [X] I believe this is a legitimate feature request, not just a question. If this is a question, please use the Discussions area.

Is your feature request related to a problem? Please describe.

We have two embedding targets: "required" and "all". "required" is the default and embeds only entity.description. To customize which fields are embedded, you must set the target to "all" and then list every field you don't want in the skip parameter. This feels backward from an API point of view, but it exists for legacy reasons. Several users have asked either to add one or two fields to the embedding, or to skip embeddings entirely because they don't plan to use local search.
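The following minimal sketch models today's opt-out behavior to show why it is awkward. Only "required", "all", entity.description and the skip parameter come from this issue. The other field names in ALL_FIELDS are assumed to resemble GraphRAG's embeddable fields and may not match the real list. The function name current_embedded_fields is hypothetical, and the sketch applies skip after it picks the base set.

```python
from typing import Iterable, Optional

# ASSUMPTION: illustrative field names; only "entity.description" is named in the issue.
ALL_FIELDS = frozenset({
    "entity.title",
    "entity.description",
    "relationship.description",
    "text_unit.text",
    "community.title",
    "community.summary",
    "community.full_content",
    "document.raw_content",
})
REQUIRED_FIELDS = frozenset({"entity.description"})


def current_embedded_fields(target: str, skip: Optional[Iterable[str]] = None) -> set:
    """Hypothetical model of the current opt-out config: pick a base set, then remove skip."""
    if target == "required":
        base = REQUIRED_FIELDS
    elif target == "all":
        base = ALL_FIELDS
    else:
        raise ValueError(f"unknown embedding target: {target!r}")
    return set(base) - set(skip or ())


# To add just one field on top of the default, every other field must be listed in skip.
wanted = {"entity.description", "text_unit.text"}
skip = sorted(ALL_FIELDS - wanted)
print(current_embedded_fields("all", skip) == wanted)  # True
print(len(skip))  # 6
```

Under these assumptions, opting in to a single extra field means enumerating six fields to exclude. That list also silently grows whenever a new embeddable field is added.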

Some options:

  • add a "none" embedding target. This should skip all embedding steps and remove the need to add any LLM embedding config.
  • add a "selected" embedding target that is opt-in instead of the current skip opt-out. This would necessitate a new parameter along the lines of GRAPHRAG_EMBEDDING_INCLUDE.

It may be necessary to keep both skip and include to avoid breaking existing configs, but in general the opt-in language is easier to understand if you don't want to use the defaults.
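The two options above could be combined roughly as follows. This is a sketch of one possible design, not GraphRAG's implementation. The "none" and "selected" targets and the include list (mirroring the proposed GRAPHRAG_EMBEDDING_INCLUDE) come from this issue. The function names resolve_embedded_fields and needs_embedding_llm, and every field name other than entity.description, are hypothetical. skip is still applied last so that existing configs keep their meaning.

```python
from typing import Iterable, Optional

# ASSUMPTION: illustrative field names; only "entity.description" is named in the issue.
ALL_FIELDS = frozenset({
    "entity.title",
    "entity.description",
    "relationship.description",
    "text_unit.text",
    "community.title",
    "community.summary",
    "community.full_content",
    "document.raw_content",
})
REQUIRED_FIELDS = frozenset({"entity.description"})


def resolve_embedded_fields(
    target: str,
    include: Optional[Iterable[str]] = None,
    skip: Optional[Iterable[str]] = None,
) -> set:
    """Hypothetical resolver supporting required/all/none/selected plus include and skip."""
    include_set = set(include or ())
    skip_set = set(skip or ())
    unknown = (include_set | skip_set) - ALL_FIELDS
    if unknown:
        raise ValueError(f"unknown embedding fields: {sorted(unknown)}")
    if target == "none":
        return set()  # no embedding steps at all
    if target == "required":
        base = set(REQUIRED_FIELDS)
    elif target == "all":
        base = set(ALL_FIELDS)
    elif target == "selected":
        base = include_set  # opt-in: only what the user lists
    else:
        raise ValueError(f"unknown embedding target: {target!r}")
    return base - skip_set  # legacy opt-out still honored


def needs_embedding_llm(target: str, include=None, skip=None) -> bool:
    """An embedding LLM config is only needed if at least one field gets embedded."""
    return bool(resolve_embedded_fields(target, include, skip))


print(sorted(resolve_embedded_fields("selected", include=["entity.description", "text_unit.text"])))
# ['entity.description', 'text_unit.text']
print(needs_embedding_llm("none"))  # False
```

Validating include and skip against the known fields catches typos early. Returning an empty set for "none" lets config validation drop the embedding LLM requirement, which is the streamlining requested below.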

Describe the solution you'd like

At the minimum, the target=none option would streamline config for a lot of people who don't need embeddings.

Additional context

No response

natoverse, Aug 05 '24 18:08