deeplake.util.exceptions.TransformError
I'm attempting to load some Documents and get a TransformError - could someone please point me in the right direction? I'm afraid the traceback doesn't mean much to me. Thanks!
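The snippet below uses embeddings and texts without showing how they're built. For reference, a minimal sketch of the kind of setup they'd typically come from - assuming OpenAI embeddings, a text file loader, and a character splitter, none of which appear in the original report:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader

# Assumed setup (not from the original report): load a local file,
# split it into Document chunks, and embed them with OpenAI.
embeddings = OpenAIEmbeddings()
docs = TextLoader("some_file.txt").load()
texts = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(docs)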
from langchain.vectorstores import DeepLake

db = DeepLake(dataset_path=deeplake_path, embedding_function=embeddings)
db.add_documents(texts)
tensor      htype     shape    dtype     compression
-------     -------   -------  -------   -------
embedding   generic   (0,)     float32   None
ids         text      (0,)     str       None
metadata    json      (0,)     str       None
text        text      (0,)     str       None
Evaluating ingest: 0%| | 0/1 [00:10<?
Traceback (most recent call last):
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk_engine.py", line 1065, in extend
self._extend(samples, progressbar, pg_callback=pg_callback)
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk_engine.py", line 1001, in _extend
self._samples_to_chunks(
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk_engine.py", line 824, in _samples_to_chunks
num_samples_added = current_chunk.extend_if_has_space(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk\chunk_compressed_chunk.py", line 50, in extend_if_has_space
return self.extend_if_has_space_byte_compression(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk\chunk_compressed_chunk.py", line 233, in extend_if_has_space_byte_compression
serialized_sample, shape = self.serialize_sample(
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk\base_chunk.py", line 342, in serialize_sample
incoming_sample, shape = serialize_text(
^^^^^^^^^^^^^^^
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\serialize.py", line 505, in serialize_text
incoming_sample, shape = text_to_bytes(incoming_sample, dtype, htype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\serialize.py", line 458, in text_to_bytes
byts = json.dumps(sample, cls=HubJsonEncoder).encode()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\charles\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 238, in dumps
**kw).encode(obj)
^^^^^^^^^^^
File "C:\Users\charles\AppData\Local\Programs\Python\Python311\Lib\json\encoder.py", line 200, in encode
chunks = self.iterencode(o, _one_shot=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\charles\AppData\Local\Programs\Python\Python311\Lib\json\encoder.py", line 258, in iterencode
return _iterencode(o, 0)
^^^^^^^^^^^^^^^^^
ValueError: Circular reference detected
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\util\transform.py", line 220, in _transform_and_append_data_slice
transform_dataset.flush()
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\transform\transform_dataset.py", line 154, in flush
raise SampleAppendError(name) from e
deeplake.util.exceptions.SampleAppendError: Failed to append a sample to the tensor 'metadata'. See more details in the traceback.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk_engine.py", line 1065, in extend
self._extend(samples, progressbar, pg_callback=pg_callback)
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk_engine.py", line 1001, in _extend
self._samples_to_chunks(
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk_engine.py", line 824, in _samples_to_chunks
num_samples_added = current_chunk.extend_if_has_space(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk\chunk_compressed_chunk.py", line 50, in extend_if_has_space
return self.extend_if_has_space_byte_compression(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk\chunk_compressed_chunk.py", line 233, in extend_if_has_space_byte_compression
serialized_sample, shape = self.serialize_sample(
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk\base_chunk.py", line 342, in serialize_sample
incoming_sample, shape = serialize_text(
^^^^^^^^^^^^^^^
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\serialize.py", line 505, in serialize_text
incoming_sample, shape = text_to_bytes(incoming_sample, dtype, htype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\serialize.py", line 458, in text_to_bytes
byts = json.dumps(sample, cls=HubJsonEncoder).encode()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\charles\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 238, in dumps
**kw).encode(obj)
^^^^^^^^^^^
File "C:\Users\charles\AppData\Local\Programs\Python\Python311\Lib\json\encoder.py", line 200, in encode
chunks = self.iterencode(o, _one_shot=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\charles\AppData\Local\Programs\Python\Python311\Lib\json\encoder.py", line 258, in iterencode
return _iterencode(o, 0)
^^^^^^^^^^^^^^^^^
ValueError: Circular reference detected
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\util\transform.py", line 177, in _handle_transform_error
transform_dataset.flush()
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\transform\transform_dataset.py", line 154, in flush
raise SampleAppendError(name) from e
deeplake.util.exceptions.SampleAppendError: Failed to append a sample to the tensor 'metadata'. See more details in the traceback.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\transform\transform.py", line 298, in eval
raise TransformError(
deeplake.util.exceptions.TransformError: Transform failed at index 0 of the input data. See traceback for more details.
I'm also getting the same error. Did you get a fix?
For me the issue was resolved after I created a new API key for OpenAI.
Hi @CharlesFr, thanks for raising this issue on our (Deeplake's) community channel. Marking it closed for now. Please let us know if you have any further questions.
I'm running into this issue too.

It also doesn't look like the example code has been tested properly; there are a number of other issues with it.

I'm happy to contribute fixes, but I can't get it running...

IMHO, creating a new OpenAI API key (if that's actually the issue?) is not an acceptable resolution for this.
I'm running into this issue too. Did anyone solve it?
Hi, @CharlesFr! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue you reported is related to a TransformError that occurs when loading documents using the DeepLake library. There have been multiple comments from users experiencing the same issue. One user suggested that creating a new API key for OpenAI resolved the issue for them. However, another user pointed out additional issues with the example code and expressed frustration with the suggested solution of creating a new API key.
Currently, the issue has been marked as closed, indicating that it has been resolved. However, we wanted to check with you if the issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project. If you have any further questions or concerns, please don't hesitate to reach out.
Finally found the issue after wasting a day on this. It happens when the metadata for the langchain docs is not valid JSON. Print the metadata of each doc and debug why it isn't valid JSON. Mine had some extra single quotes. The issue was resolved once I fixed the metadata JSONs.
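To make that concrete, here's a minimal pre-flight check along those lines. It assumes texts is the list of langchain Document objects passed to db.add_documents() above; the loop itself is illustrative, not part of either library. The traceback bottoms out in json.dumps() raising ValueError: Circular reference detected, so calling json.dumps() on each document's metadata directly should flag the offending document before DeepLake ever sees it:

import json

# Illustrative check (assumption): try to serialize each Document's
# metadata the same way deeplake's serialize.py ultimately does, and
# report any document whose metadata won't round-trip through JSON.
for i, doc in enumerate(texts):
    try:
        json.dumps(doc.metadata)
    except (TypeError, ValueError) as err:
        print(f"Document {i} has unserializable metadata: {err}")
        print(doc.metadata)

One caveat: plain json.dumps() is a bit stricter than deeplake's HubJsonEncoder, so treat a failure here as a lead rather than proof. Also note that ValueError: Circular reference detected specifically means a metadata dict or list contains a reference back to itself; ordinary non-JSON values (custom objects, etc.) raise a TypeError instead.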