
Error when running make ingest.

Open sati-bodhi opened this issue 6 months ago • 2 comments

I'm not sure where the array shape mismatch (384 vs. 1024) is coming from.

make ingest ~/Documents/privategpt_ingest -- --watch --log-file ~/Documents/privategpt_ingest.log
12:46:33.280 [INFO    ] private_gpt.settings.settings_loader - Starting application with profiles=['default', 'mock']
12:46:35.789 [INFO    ] private_gpt.components.llm.llm_component - Initializing the LLM in mode=mock
12:46:36.450 [INFO    ] private_gpt.components.embedding.embedding_component - Initializing the embedding model in mode=mock
12:46:36.777 [INFO    ] llama_index.indices.loading - Loading all indices.
12:46:36.836 [INFO    ]                  __main__ - Ingesting files=['America’s War for the Greater Middle East A Military History (Andrew J. Bacevich) (Z-Library).epub', 'Gaza in Crisis Reflections on Israels War Against the Palestinians (Noam Chomsky and Ilan Pappé) (Z-Library).epub']
12:46:36.837 [INFO    ] private_gpt.server.ingest.ingest_service - Ingesting file_names=['America’s War for the Greater Middle East A Military History (Andrew J. Bacevich) (Z-Library).epub', 'Gaza in Crisis Reflections on Israels War Against the Palestinians (Noam Chomsky and Ilan Pappé) (Z-Library).epub']
12:46:36.837 [INFO    ] private_gpt.components.ingest.ingest_component - Ingesting file_name=America’s War for the Greater Middle East A Military History (Andrew J. Bacevich) (Z-Library).epub
12:46:36.838 [INFO    ] private_gpt.components.ingest.ingest_component - Ingesting file_name=Gaza in Crisis Reflections on Israels War Against the Palestinians (Noam Chomsky and Ilan Pappé) (Z-Library).epub
12:46:37.049 [INFO    ] private_gpt.components.ingest.ingest_component - Transformed file=Gaza in Crisis Reflections on Israels War Against the Palestinians (Noam Chomsky and Ilan Pappé) (Z-Library).epub into count=1 documents
Parsing nodes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.91it/s]
Generating embeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 2877/2877 [00:00<00:00, 156292.26it/s]
12:46:37.481 [INFO    ] private_gpt.components.ingest.ingest_component - Inserting count=2877 nodes in the index
Generating embeddings: 0it [00:00, ?it/s]
12:46:37.544 [INFO    ] private_gpt.components.ingest.ingest_component - Transformed file=America’s War for the Greater Middle East A Military History (Andrew J. Bacevich) (Z-Library).epub into count=1 documents
Parsing nodes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.49it/s]
Generating embeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 9764/9764 [00:00<00:00, 117162.75it/s]
12:46:38.512 [INFO    ] private_gpt.components.ingest.ingest_component - Inserting count=9764 nodes in the index
Generating embeddings: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/home/sati/privateGPT/scripts/ingest_folder.py", line 102, in <module>
    worker.ingest_folder(root_path, args.ignored)
  File "/home/sati/privateGPT/scripts/ingest_folder.py", line 38, in ingest_folder
    self._ingest_all(self._files_under_root_folder)
  File "/home/sati/privateGPT/scripts/ingest_folder.py", line 42, in _ingest_all
    self.ingest_service.bulk_ingest([(str(p.name), p) for p in files_to_ingest])
  File "/home/sati/privateGPT/private_gpt/server/ingest/ingest_service.py", line 92, in bulk_ingest
    documents = self.ingest_component.bulk_ingest(files)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sati/privateGPT/private_gpt/components/ingest/ingest_component.py", line 272, in bulk_ingest
    self._ingest_work_pool.starmap(self.ingest, files)
  File "/home/sati/.pyenv/versions/3.11.3/lib/python3.11/multiprocessing/pool.py", line 375, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sati/.pyenv/versions/3.11.3/lib/python3.11/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/home/sati/.pyenv/versions/3.11.3/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/sati/.pyenv/versions/3.11.3/lib/python3.11/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sati/privateGPT/private_gpt/components/ingest/ingest_component.py", line 264, in ingest
    return self._save_docs(documents)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sati/privateGPT/private_gpt/components/ingest/ingest_component.py", line 287, in _save_docs
    self._index.insert_nodes(nodes, show_progress=True)
  File "/home/sati/.pyenv/versions/3.11.3/envs/privategpt/lib/python3.11/site-packages/llama_index/indices/vector_store/base.py", line 267, in insert_nodes
    self._insert(nodes, **insert_kwargs)
  File "/home/sati/.pyenv/versions/3.11.3/envs/privategpt/lib/python3.11/site-packages/llama_index/indices/vector_store/base.py", line 258, in _insert
    self._add_nodes_to_index(self._index_struct, nodes, **insert_kwargs)
  File "/home/sati/.pyenv/versions/3.11.3/envs/privategpt/lib/python3.11/site-packages/llama_index/indices/vector_store/base.py", line 189, in _add_nodes_to_index
    new_ids = self._vector_store.add(nodes, **insert_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sati/.pyenv/versions/3.11.3/envs/privategpt/lib/python3.11/site-packages/llama_index/vector_stores/qdrant.py", line 127, in add
    self._client.upsert(
  File "/home/sati/.pyenv/versions/3.11.3/envs/privategpt/lib/python3.11/site-packages/qdrant_client/qdrant_client.py", line 987, in upsert
    return self._client.upsert(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/sati/.pyenv/versions/3.11.3/envs/privategpt/lib/python3.11/site-packages/qdrant_client/local/qdrant_local.py", line 421, in upsert
    collection.upsert(points)
  File "/home/sati/.pyenv/versions/3.11.3/envs/privategpt/lib/python3.11/site-packages/qdrant_client/local/local_collection.py", line 1090, in upsert
    self._upsert_point(
  File "/home/sati/.pyenv/versions/3.11.3/envs/privategpt/lib/python3.11/site-packages/qdrant_client/local/local_collection.py", line 1067, in _upsert_point
    self._add_point(point)
  File "/home/sati/.pyenv/versions/3.11.3/envs/privategpt/lib/python3.11/site-packages/qdrant_client/local/local_collection.py", line 1017, in _add_point
    named_vectors[idx] = vector_np
    ~~~~~~~~~~~~~^^^^^
ValueError: could not broadcast input array from shape (384,) into shape (1024,)
make: *** [Makefile:52: ingest] Error 1
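
Reading the last frames of the traceback: the local Qdrant collection is trying to write a 384-element vector into storage that is 1024 wide, so the embeddings produced in this run (mode=mock) do not have the size the existing collection expects. One possible explanation, which is an assumption and not something the log confirms, is that the collection was created earlier with a different embedding model. A minimal sketch of a pre-insert check follows; `find_dimension_mismatches`, `collection_dim`, and `new_embeddings` are hypothetical names for illustration and are not part of private-gpt, llama_index, or qdrant_client.

```python
from typing import Sequence


def find_dimension_mismatches(
    vectors: Sequence[Sequence[float]], expected_dim: int
) -> list[tuple[int, int]]:
    """Return (position, actual_dim) for each vector whose length differs from expected_dim."""
    return [(i, len(v)) for i, v in enumerate(vectors) if len(v) != expected_dim]


# Hypothetical values taken from the error message above: the stored
# collection is 1024 wide, while the current embedding model emits 384.
collection_dim = 1024
new_embeddings = [[0.0] * 384, [0.0] * 384]

mismatches = find_dimension_mismatches(new_embeddings, collection_dim)
if mismatches:
    # Fail early with a readable message instead of a NumPy broadcast error.
    print(
        f"{len(mismatches)} vector(s) do not match collection size "
        f"{collection_dim}: {mismatches}"
    )
```

A check like this only makes the failure easier to read. The two sizes still have to be made to agree, for example by ingesting with the same embedding model that was used when the stored collection was created.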

sati-bodhi · Feb 19 '24 04:02