
Ollama example needed, please~ ☕️

Open adamwuyu opened this issue 11 months ago • 3 comments

I really like the flexibility and high level of integration of this project, so I have been trying to integrate it with Ollama today. It is now 00:30 here and I still have not succeeded. Can you help me?

Question 1: This error will appear several times during the parsing process:

LLM response has improper format {'nodes': [{'name': 'Configuring the Prompt', 'type': 'document_section', 'content': '
...
chunk_index=2

If this error occurs, the corresponding chunk nodes and relationships will not be generated, right?

Question 2: Why does parsing slightly more content, such as a 50-page PDF, result in a 500 error?

Question 3: Why does this error sometimes happen:

  File "/Users/adam/miniconda3/envs/lightrag/lib/python3.11/site-packages/neo4j/_sync/io/_common.py", line 254, in on_failure
    raise self._hydrate_error(metadata)
neo4j.exceptions.CypherTypeError: {code: Neo.ClientError.Statement.TypeError} {message: Property values can only be of primitive types or arrays thereof. Encountered: Map{title -> String("${movieTitle}"), score -> String("${score}")}.}
run_id='7ec2d19e-4c70-47c8-93d9-5b978228c242' result={'resolver': {'number_of_nodes_to_resolve': 0, 'number_of_created_nodes': None}}
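For context on that CypherTypeError: Neo4j property values must be primitives (strings, numbers, booleans) or arrays of primitives, so a nested map like the one in the message is rejected. A minimal sketch of flattening such a map before writing it (the helper is hypothetical, not part of neo4j-graphrag):

```python
# Neo4j property values must be primitives (str, int, float, bool) or
# arrays of primitives; a nested map such as
# {title -> "${movieTitle}", score -> "${score}"} triggers the
# CypherTypeError above. This hypothetical helper (not a library
# function) flattens nested dicts into primitive-valued keys.

def flatten_properties(props: dict, sep: str = "_") -> dict:
    """Flatten {"data": {"title": "x"}} into {"data_title": "x"}."""
    flat = {}
    for key, value in props.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten_properties(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat

nested = {"name": "rec", "data": {"title": "${movieTitle}", "score": "${score}"}}
print(flatten_properties(nested))
# {'name': 'rec', 'data_title': '${movieTitle}', 'data_score': '${score}'}
```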

Question 4: In the latest version, what is the best practice for using a local Ollama model? Is it:

from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.llm.openai_llm import OpenAILLM

or:

from neo4j_graphrag.embeddings.ollama import OllamaEmbeddings
from neo4j_graphrag.llm.ollama_llm import OllamaLLM

With the former, parsing starts but runs into problems 1, 2, and 3 above; with the latter, parsing does not start at all and this error is raised:

  File "/Users/adam/miniconda3/envs/lightrag/lib/python3.11/site-packages/pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for Neo4jNode
embedding_properties.embedding.0
  Input should be a valid number [type=float_type, input_value=[0.0035258962, 0.00050194...047494516, -0.006978964], input_type=list]
    For further information visit https://errors.pydantic.dev/2.10/v/float_type
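For what it's worth, the validation error reports a list where a single float was expected, which suggests the embedding reached Neo4jNode wrapped in an extra list layer ([[0.003, ...]] instead of [0.003, ...]). A small sketch of the shape mismatch (the unwrap helper is hypothetical, not a library function):

```python
# The pydantic error "Input should be a valid number ... input_type=list"
# means element 0 of the embedding was itself a list, i.e. the vector
# was nested one level too deep. A hypothetical unwrapping step shows
# the shape the model actually expects (a flat list of floats).

def unwrap_embedding(embedding: list) -> list:
    """Remove one accidental nesting layer, if present."""
    if len(embedding) == 1 and isinstance(embedding[0], list):
        return embedding[0]
    return embedding

nested = [[0.0035258962, 0.00050194, -0.006978964]]
flat = unwrap_embedding(nested)
assert all(isinstance(x, float) for x in flat)
print(flat)  # [0.0035258962, 0.00050194, -0.006978964]
```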

adamwuyu commented Jan 07 '25 16:01

This is my full source code; hopefully it explains everything. Or could you please provide a working Ollama example?

import asyncio
from pathlib import Path

import neo4j
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.llm.openai_llm import OpenAILLM
# from neo4j_graphrag.embeddings.ollama import OllamaEmbeddings
# from neo4j_graphrag.llm.ollama_llm import OllamaLLM
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.experimental.pipeline.pipeline import PipelineResult
from neo4j_graphrag.llm import LLMInterface

# Neo4j db infos
NEO4J_URI = "neo4j://localhost:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "neo4j"
OLLAMA_URL = "http://127.0.0.1:11434/v1"
LLM_MODEL = "qwen2.5:14b"
EMBEDDING_MODEL = "nomic-embed-text:latest"

# Neo4j db infos
URI = NEO4J_URI
AUTH = (NEO4J_USERNAME, NEO4J_PASSWORD)
DATABASE = "neo4j"
# Get the parent directory of the current file
root_dir = Path().resolve().parents[1]
print(root_dir)
file_path = root_dir / "data" / "neo4j-graphrag-python pages 1-36.pdf"
print(file_path)

embedder = OpenAIEmbeddings(
    base_url=OLLAMA_URL, 
    api_key="None", 
    model=EMBEDDING_MODEL
)

llm = OpenAILLM(
    base_url=OLLAMA_URL,
    api_key="None",
    model_name=LLM_MODEL,
    model_params={
        "max_tokens": 2000,
        "response_format": {"type": "json_object"},
    },
)

# Connect to the Neo4j database
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD), connection_timeout=200, max_connection_pool_size=50)


# Instantiate Entity and Relation objects. This defines the
# entities and relations the LLM will be looking for in the text.
NODES_FROM_PDF = [
    'Chunk', 'Class', 'Component', 'Document', 'Person', 'Package', 'Organization', 
    'Parameter', 'Node', 'Date', 'Configuration', 'House', 'PythonClient', 'Method', 
    'Argument', 'Property', 'Entity', 'Concept', 'Interface', 'Function', 'System', 
    'Submodule', 'LLM', 'Service', 'Authentification', 'DriverInstance', 'FilePath', 
    'KGBuilder', 'Planet', 'EntityAndRelationExtractor', 'SimpleKGBuilder', 
    'Attribute', 'PythonPackage', 'Variable', 'Algorithm', 
    'Driver', 'Retriever', 'Schema', 'Project', 'Version', 'Field', 
    'SchemaDefinition', 'API Key', 'URI', 'AUTH', 'INDEX_NAME', 'Label', 
    'Neo4jConfig', 'LLMConfig', 'EmbedderConfig'
]
# Prompt: You are a developer of neo4j and neo4j-graphrag. To organize these
# two areas of knowledge into a knowledge graph, please add nodes beyond the
# ones listed above and output them separately as a Python list.
NODES_FROM_CHATGPT = [
    'Graph', 'Relationship', 'Query', 'Cypher', 'Index', 'Transaction', 
    'GraphAlgorithm', 'DataModel', 'GraphVisualization', 'DataIngestion', 
    'ETL', 'API', 'Driver', 'Deployment', 'Scalability', 'Performance', 
    'Security', 'Backup', 'Restore', 'Monitoring', 'Analytics', 
    'MachineLearning', 'Recommendation', 'DataScience', 'GraphTheory', 
]
ENTITIES = NODES_FROM_PDF + NODES_FROM_CHATGPT

RELATIONS = ["INHERITED_FROM", "REQUIRE", "HAS", "IS_INSTANCE_OF", "USES", "BELONGS_TO", "CONNECTED_TO", "PROVIDES", "DEPENDS_ON", "CONTAINS"]
POTENTIAL_SCHEMA = [
    ("Class", "HAS", "Property"),
    ("Person", "BELONGS_TO", "Organization"),
    ("API", "REQUIRE", "API Key"),
    ("Document", "HAS", "Component"),
    ("Chunk", "CONTAINS", "Node"),
    ("Method", "IS_INSTANCE_OF", "Function"),
    ("Package", "HAS", "Submodule"),
    ("Configuration", "REQUIRE", "Schema"),
    ("Driver", "USES", "API"),
    ("Graph", "HAS", "Node"),
    ("Entity", "HAS", "Attribute"),
    ("Project", "BELONGS_TO", "Organization"),
    ("GraphAlgorithm", "DEPENDS_ON", "DataModel"),
]

async def define_and_run_pipeline(
    neo4j_driver: neo4j.Driver,
    llm: LLMInterface,
    embedder: OpenAIEmbeddings,
) -> PipelineResult:
    kg_builder = SimpleKGPipeline(
        llm=llm,
        driver=neo4j_driver,
        embedder=embedder,
        entities=ENTITIES,
        relations=RELATIONS,
        potential_schema=POTENTIAL_SCHEMA,
        neo4j_database=DATABASE,
    )
    return await kg_builder.run_async(file_path=str(file_path))


async def main() -> PipelineResult:
    with neo4j.GraphDatabase.driver(URI, auth=AUTH) as driver:
        res = await define_and_run_pipeline(driver, llm, embedder)
    await llm.async_client.close()
    return res


if __name__ == "__main__":
    res = asyncio.run(main())
    print(res)

adamwuyu commented Jan 07 '25 16:01

Hi @adamwuyu ,

Thank you for reaching out and for sharing your code! Trying to answer your questions below:

  1. Yes, sometimes the LLM fails to produce valid JSON, or produces JSON that is not in the format the library expects. In that case, only the 'Chunk' node is created (in the "lexical graph"), but no entities are created for this chunk.

  2. I can't say a lot about that; we've not experienced such errors. Two ideas:

    • check if there is something in the Ollama configuration that can help
    • also maybe try a larger chunk size to reduce the number of calls
  3. Thanks for reporting this one, we will investigate.

  4. Both methods should work the same. The error you're seeing in the second case is likely a bug; we will also take a look.

stellasia commented Jan 08 '25 09:01

Hi,

Regarding the error you reported with Ollama classes, it was a bug that has been fixed in version 1.4.2, released just now.

stellasia commented Jan 15 '25 09:01

I tried the provided code with the newest version and it didn't work with a longer (66-page) PDF, stating the LLM response was in the wrong format. Using just one page, I got it to work. Using the OpenAILLM class instead of the OllamaLLM class also fixed/circumvented #376. I didn't know it was possible to use the OpenAI classes with Ollama endpoints before finding this issue (probably a me problem), but that could also be mentioned somewhere a person trying to get it running with Ollama would look.

Powerkrieger commented Aug 19 '25 16:08