
TypeError: expected string or buffer

Open · LeonMing30 opened this issue 8 months ago · 3 comments

I tried to run the demo code for testing, but got the following error:

```python
from raptor import RetrievalAugmentation

RA = RetrievalAugmentation()

with open('demo/sample.txt', 'r') as file:
    text = file.read()
RA.add_documents(text)
question = "How did Cinderella reach her happy ending?"
answer = RA.answer_question(question=question)
print("Answer: ", answer)
```
```
Traceback (most recent call last):
  File "D:\Code\Python\20240531\RAPTOR\raptor\demotest.py", line 13, in <module>
    RA.add_documents(text)
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\RetrievalAugmentation.py", line 219, in add_documents
    self.tree = self.tree_builder.build_from_text(text=docs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\tree_builder.py", line 291, in build_from_text
    root_nodes = self.construct_tree(all_nodes, all_nodes, layer_to_nodes)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\cluster_tree_builder.py", line 130, in construct_tree
    process_cluster(
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\cluster_tree_builder.py", line 77, in process_cluster
    f"Node Texts Length: {len(self.tokenizer.encode(node_texts))}, Summarized Text Length: {len(self.tokenizer.encode(summarized_text))}"
                                                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Code\Python\20240531\RAPTOR\venv\Lib\site-packages\tiktoken\core.py", line 116, in encode
    if match := _special_token_regex(disallowed_special).search(text):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or buffer
```

How can I fix it?
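The last frames show tiktoken's `encode` being called on `summarized_text`, and the regex `.search(text)` inside it raising `TypeError`, which means the value passed in was not a string. One plausible but unconfirmed cause is that the summarization step returned something other than text (for example `None` or an exception object). The sketch below reproduces that failure mode using Python's standard `re` module in place of tiktoken, and shows a guard that reports the unexpected type. `encode_like` and `ensure_text` are hypothetical helpers for illustration, not part of RAPTOR or tiktoken.

```python
import re

# Toy stand-in for tiktoken's special-token check (assumption: simplified).
# Like the call in the traceback, it runs a regex search over the input,
# so a value that is not str raises TypeError before any tokenization.
_SPECIAL_TOKENS = re.compile(r"<\|endoftext\|>")


def encode_like(text):
    """Hypothetical toy encoder: regex check first, then whitespace 'tokens'."""
    _SPECIAL_TOKENS.search(text)  # TypeError for None, exception objects, etc.
    return text.split()


def ensure_text(summary):
    """Hypothetical guard: reject non-string summaries with a clear message."""
    if isinstance(summary, str):
        return summary
    raise ValueError(
        f"summarizer returned {type(summary).__name__}, not str: {summary!r}"
    )


if __name__ == "__main__":
    try:
        encode_like(None)
    except TypeError as exc:
        print("TypeError:", exc)  # same failure mode as the traceback
    try:
        encode_like(ensure_text(RuntimeError("example API failure")))
    except ValueError as exc:
        print("ValueError:", exc)  # names the real type and value instead
```

Checking the summarizer's return value this way, before it reaches the tokenizer, would surface the underlying problem instead of the generic `TypeError`.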

LeonMing30, May 31 '24 09:05