
Can llama3:8b extract entities and relationships from a chunk as required?

Open JWzhong1 opened this issue 8 months ago • 2 comments

Extracting entities and relationships from a chunk fails when I call the GraphRAG method to build the KG using llama3:8b served through Ollama:

    async def _extract_records_from_chunk(self, chunk_info: TextChunk):
        context = self._build_context_for_entity_extraction(chunk_info.content)
        prompt_template = GraphPrompt.ENTITY_EXTRACTION_KEYWORD if self.config.enable_edge_keywords else GraphPrompt.ENTITY_EXTRACTION
        prompt = prompt_template.format(**context)
        working_memory = Memory()
        working_memory.add(Message(content=prompt, role="user"))
        final_result = await self.llm.aask(prompt)
        working_memory.add(Message(content=final_result, role="assistant"))

According to the prompt, the output format should be:

("entity"<|><entity_name><|><entity_type><|><entity_description>)
("relationship"<|><source_entity><|><target_entity><|><relationship_description><|><relationship_strength>)
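
A minimal sketch of how output in this format could be checked, which makes it easy to see why a chatty reply yields an empty graph. It assumes records are separated by `##` and terminated by `<|COMPLETE|>`, as in the glm-4-plus output quoted in this issue. `parse_records` is a hypothetical helper written for illustration, not the repository's own parser.

```python
import re

TUPLE_DELIM = "<|>"            # separates fields inside one record
RECORD_DELIM = "##"            # assumed separator between records
COMPLETE_MARK = "<|COMPLETE|>"  # assumed end-of-output marker


def parse_records(raw: str):
    """Split raw LLM output into entity and relationship tuples.

    Anything that does not match the expected tuple format is skipped,
    so free-form prose produces two empty lists.
    """
    entities, relationships = [], []
    text = raw.replace(COMPLETE_MARK, "")
    for record in text.split(RECORD_DELIM):
        # Take the content between the outermost parentheses of the record.
        match = re.search(r"\((.*)\)", record, flags=re.DOTALL)
        if match is None:
            continue
        fields = [f.strip().strip('"') for f in match.group(1).split(TUPLE_DELIM)]
        if fields[0] == "entity" and len(fields) == 4:
            entities.append(tuple(fields[1:]))       # (name, type, description)
        elif fields[0] == "relationship" and len(fields) == 5:
            relationships.append(tuple(fields[1:]))  # (source, target, description, strength)
    return entities, relationships
```

Calling `parse_records(final_result)` on a reply that contains only prose or markdown bullets returns `([], [])`, which matches the empty node and edge counts reported here.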

The final_result I get during the run contains no usable entities or relationships in that format:

"It appears that you have provided a text about honey bees, their life cycle, and their different castes (queen, worker, and drone). However, I don't see any specific question or task related to this text.\n\nIf you could provide more context or clarify what you would like me to do with this text, I'll be happy to assist you. For example, you might want me to:\n\n* Summarize the main points of the text\n* Identify key concepts and terminology related to honey bees\n* Extract specific information about a particular caste (e.g., queen, worker, drone)\n* Analyze or discuss the implications of certain aspects of bee biology\n\nPlease let me know how I can help!Here are the additional entities that were missed:\n\n**The Brood**\n\n* **Brood Development**: The process by which a honey bee's life starts as an egg and proceeds through the larval and pupal stages of development before emergence as an adult.\n* **Brood Food**: A secretion from glands in the heads of nurse bees, mixed with honey and pollen, that is fed to developing worker bees.\n\n**The House Bee**\n\n* **House Bee**: The second segment of a worker bee's life, during which she performs tasks within the hive, such as caring for young and building comb.\n* **Nurse Bees**: Worker bees responsible for feeding and caring for developing brood.\n\n**The Field Bee**\n\n* **Field Bee**: The final segment of a worker bee's life, during which she leaves the hive to gather nectar, pollen, and water from outside sources.\n\n**Other Entities**\n\n* **Spermatheca**: An organ in the queen bee's body that contains sperm received from multiple matings earlier in her adult life.\n* **Pollen Baskets**: Specialized structures on a worker bee's legs used to collect and transport pollen.\n* **Wax Glands**: Organs in a worker bee's abdomen responsible for producing wax, which is used to build comb.\n\nThese entities were not extracted in the previous output, but are relevant to the text about honey bees."

No nodes or edges are extracted, so the graph is empty. Other models served by Ollama, such as qwen2.5:14b, give similar results:

Image

With the API service for glm-4-plus, extraction works correctly. I would therefore like to know whether a small model such as llama3:8b or qwen2.5:14b can complete the graph construction task. The glm-4-plus output is:

final_result = '("entity"<|>"North America"<|>"geo"<|>"North America is the geographical region discussed in relation to the population cycle of bee colonies.")##\n("entity"<|>"Colony"<|>"organization"<|>"Colony refers to a group of bees living together, with a specific structure and lifecycle.")##\n("entity"<|>"Winter"<|>"event"<|>"Winter is a seasonal event that significantly impacts the population and activity levels of bee colonies.")##\n("entity"<|>"Summer Solstice"<|>"event"<|>"Summer Solstice is a key event marking the peak of bee population and activity levels.")##\n("entity"<|>"Swarming"<|>"event"<|>"Swarming is a natural event in the lifecycle of bee colonies, involving the division and reproduction of the colony.")##\n("relationship"<|>"Colony"<|>"North America"<|>"The population cycle of the Colony is discussed in the context of North America, indicating a geographical relationship."<|>7)##\n("relationship"<|>"Colony"<|>"Winter"<|>"The Colony\'s population and activity levels are significantly affected by the Winter season."<|>8)##\n("relationship"<|>"Colony"<|>"Summer Solstice"<|>"The Colony reaches its peak population and activity levels around the time of the Summer Solstice."<|>9)##\n("relationship"<|>"Colony"<|>"Swarming"<|>"Swarming is a critical event in the lifecycle of the Colony, affecting its structure and reproduction."<|>10)<|COMPLETE|>'

My config2.yaml is as follows:

llm:
  api_type: "open_llm" # or openai
  base_url: 'http://localhost:11434/v1' # or forward url / other llm url
  model: "llama3.1:8b"
  api_key: "abcdefghijklmnopqrstuvwxyz"

embedding:
  api_type: "ollama"  # or  ollama / openai.
  base_url: "http://localhost:11434/api/embed"  # or forward url / other llm url
  api_key: "abfffsfedsfgs"
  model: "bge-m3:latest"
  cache_dir: ""
  dimensions: 1024
  max_token_size: 8102
  embed_batch_size: 128
  embedding_func_max_async: 16
 
data_root:  "./Data/datasets" # Root directory for data
working_dir: './Result' # Result directory for the experiment
exp_name: 'agriculture' # Experiment name
# 

Apr 24 '25 04:04 JWzhong1

I ran into the same problem. I tried various small models, but they do not reliably extract entities and relationships, and they sometimes make errors. I wonder whether the small model used in the paper was fine-tuned, though that is only my guess.

Apr 24 '25 09:04 hsbdkdn

Indeed, small models may produce format errors or extraction errors. One option is to retry the LLM calls. Also, in our paper we did not fine-tune the model.

May 08 '25 08:05 JayLZhou
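
The retry suggested in this thread could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the repository: `ask_with_retry` and `looks_like_records` are hypothetical names, `ask` stands for any async callable that takes a prompt and returns a string (for example `self.llm.aask` from the snippet in the issue), and the format check only tests for the presence of one entity tuple.

```python
import asyncio
from typing import Awaitable, Callable, Optional


def looks_like_records(raw: str) -> bool:
    """Cheap format check: at least one entity tuple opener is present."""
    return '("entity"<|>' in raw


async def ask_with_retry(
    ask: Callable[[str], Awaitable[str]],  # assumed: async prompt -> completion
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call the model until its output passes the format check.

    Returns the first well-formed completion, or None if every attempt
    fails, so the caller can skip the chunk instead of adding garbage.
    """
    for _ in range(max_attempts):
        result = await ask(prompt)
        if looks_like_records(result):
            return result
    return None
```

Retrying only helps when failures are intermittent. If a model returns free-form prose on every attempt, the helper returns `None` after `max_attempts` calls and the chunk still contributes nothing to the graph.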