
deeplake.util.exceptions.TransformError

CharlesFr opened this issue on Apr 27, 2023

I'm attempting to load some Documents and I'm getting a TransformError - could someone please point me in the right direction? Thanks! I'm afraid the traceback doesn't mean much to me.

# `deeplake_path`, `embeddings`, and `texts` are defined earlier in the script
from langchain.vectorstores import DeepLake

db = DeepLake(dataset_path=deeplake_path, embedding_function=embeddings)
db.add_documents(texts)
  tensor     htype    shape    dtype  compression
  -------   -------  -------  -------  ------- 
 embedding  generic   (0,)    float32   None   
    ids      text     (0,)      str     None   
 metadata    json     (0,)      str     None   
   text      text     (0,)      str     None   
Evaluating ingest: 0%|          | 0/1 [00:10<?
Traceback (most recent call last):
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk_engine.py", line 1065, in extend
    self._extend(samples, progressbar, pg_callback=pg_callback)
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk_engine.py", line 1001, in _extend
    self._samples_to_chunks(
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk_engine.py", line 824, in _samples_to_chunks
    num_samples_added = current_chunk.extend_if_has_space(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk\chunk_compressed_chunk.py", line 50, in extend_if_has_space
    return self.extend_if_has_space_byte_compression(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk\chunk_compressed_chunk.py", line 233, in extend_if_has_space_byte_compression
    serialized_sample, shape = self.serialize_sample(
                               ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk\base_chunk.py", line 342, in serialize_sample
    incoming_sample, shape = serialize_text(
                             ^^^^^^^^^^^^^^^
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\serialize.py", line 505, in serialize_text
    incoming_sample, shape = text_to_bytes(incoming_sample, dtype, htype)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\serialize.py", line 458, in text_to_bytes
    byts = json.dumps(sample, cls=HubJsonEncoder).encode()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\charles\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 238, in dumps
    **kw).encode(obj)
          ^^^^^^^^^^^
  File "C:\Users\charles\AppData\Local\Programs\Python\Python311\Lib\json\encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\charles\AppData\Local\Programs\Python\Python311\Lib\json\encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
ValueError: Circular reference detected

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\util\transform.py", line 220, in _transform_and_append_data_slice
    transform_dataset.flush()
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\transform\transform_dataset.py", line 154, in flush
    raise SampleAppendError(name) from e
deeplake.util.exceptions.SampleAppendError: Failed to append a sample to the tensor 'metadata'. See more details in the traceback.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk_engine.py", line 1065, in extend
    self._extend(samples, progressbar, pg_callback=pg_callback)
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk_engine.py", line 1001, in _extend
    self._samples_to_chunks(
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk_engine.py", line 824, in _samples_to_chunks
    num_samples_added = current_chunk.extend_if_has_space(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk\chunk_compressed_chunk.py", line 50, in extend_if_has_space
    return self.extend_if_has_space_byte_compression(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk\chunk_compressed_chunk.py", line 233, in extend_if_has_space_byte_compression
    serialized_sample, shape = self.serialize_sample(
                               ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\chunk\base_chunk.py", line 342, in serialize_sample
    incoming_sample, shape = serialize_text(
                             ^^^^^^^^^^^^^^^
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\serialize.py", line 505, in serialize_text
    incoming_sample, shape = text_to_bytes(incoming_sample, dtype, htype)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\serialize.py", line 458, in text_to_bytes
    byts = json.dumps(sample, cls=HubJsonEncoder).encode()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\charles\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 238, in dumps
    **kw).encode(obj)
          ^^^^^^^^^^^
  File "C:\Users\charles\AppData\Local\Programs\Python\Python311\Lib\json\encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\charles\AppData\Local\Programs\Python\Python311\Lib\json\encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
ValueError: Circular reference detected

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\util\transform.py", line 177, in _handle_transform_error
    transform_dataset.flush()
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\transform\transform_dataset.py", line 154, in flush
    raise SampleAppendError(name) from e
deeplake.util.exceptions.SampleAppendError: Failed to append a sample to the tensor 'metadata'. See more details in the traceback.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\charles\Documents\GitHub\Chat-with-Github-Repo\venv\Lib\site-packages\deeplake\core\transform\transform.py", line 298, in eval
    raise TransformError(
deeplake.util.exceptions.TransformError: Transform failed at index 0 of the input data. See traceback for more details.

CharlesFr avatar Apr 27 '23 09:04 CharlesFr

I'm also getting the same error. Did you find a fix?

koleshjr avatar May 21 '23 07:05 koleshjr

For me the issue was resolved after I created a new API key for OpenAI.

Aman0807 avatar Jun 03 '23 15:06 Aman0807

Hi @CharlesFr, thanks for raising this issue on our (Deeplake's) community channel. Marking it closed for now. Please let us know if you have any further questions.

nalingupta avatar Jun 09 '23 19:06 nalingupta

I'm running into this issue too.

  • It also doesn't look like the example code has been tested properly. There are many issues (screenshot omitted).

  • I'm happy to contribute fixes, but I can't get it running...

  • IMHO creating a new OpenAI API key (if that's actually the issue?) is not an acceptable resolution for this.

nicdesousa avatar Jun 30 '23 10:06 nicdesousa

I'm running into this issue too. Did anyone solve it?

acehinnnqru avatar Jul 03 '23 05:07 acehinnnqru

Hi, @CharlesFr! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue you reported is related to a TransformError that occurs when loading documents using the DeepLake library. There have been multiple comments from users experiencing the same issue. One user suggested that creating a new API key for OpenAI resolved the issue for them. However, another user pointed out additional issues with the example code and expressed frustration with the suggested solution of creating a new API key.

Currently, the issue has been marked as closed, indicating that it has been resolved. However, we wanted to check with you if the issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project. If you have any further questions or concerns, please don't hesitate to reach out.

dosubot[bot] avatar Oct 02 '23 16:10 dosubot[bot]

Finally found the issue after wasting a day on this. It happens when the metadata of the LangChain docs is not valid JSON. Print the metadata of any doc and debug why it's not valid JSON; mine had some extra single quotes. The issue was resolved once I fixed the metadata JSONs. A quick check like the sketch below can help pinpoint the offending documents.
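
For reference, here is a minimal sketch of that check (assuming `texts` is the list of Document objects passed to add_documents; `find_bad_metadata` is just an illustrative name, and json.dumps is the same call that raised the ValueError in the traceback above):

import json

def find_bad_metadata(docs):
    # Try to JSON-encode each document's metadata, mirroring the
    # json.dumps call inside deeplake's text_to_bytes that failed above.
    bad = []
    for i, doc in enumerate(docs):
        try:
            json.dumps(doc.metadata)
        except (TypeError, ValueError) as err:
            print(f"doc {i}: metadata is not JSON-serializable: {err}")
            bad.append(i)
    return bad

find_bad_metadata(texts)

Any index it reports is a document whose metadata needs cleaning before ingestion.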

asuag avatar Nov 27 '23 04:11 asuag