repeng
truncated_output_suffixes &
with open("data/all_truncated_outputs.json") as f:
    output_suffixes = json.load(f)
truncated_output_suffixes = [
    tokenizer.convert_tokens_to_string(tokens[:i])
    for tokens in (tokenizer.tokenize(s) for s in output_suffixes)
    for i in range(1, len(tokens))
]
truncated_output_suffixes_512 = [
    tokenizer.convert_tokens_to_string(tokens[:i])
    for tokens in (tokenizer.tokenize(s) for s in output_suffixes[:512])
    for i in range(1, len(tokens))
]
The code above references files that do not exist in the repo, so I can't run the minimal viable example (MVE).
Another example is true_facts.json (I did not find an example in the paper that mentioned facts or a .json file).
I created a script that I think mimics what you were showcasing:
https://gist.github.com/thistleknot/b936477ee82ce608b3c7f47381f6b15d
Make sure you're running the notebook with the cwd set to the notebooks folder; the data folder is notebooks/data. Alternatively, you can just copy the data folder to wherever you need it (you can find the current cwd with import os; print(os.getcwd()) and copy the data folder there). It's pretty small.
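
If it helps, here is a rough sketch of that check. It prints the cwd and looks for the JSON file in either ./data or ./notebooks/data. The helper name find_data_file is made up for illustration and is not part of repeng; the two locations it tries are just the ones mentioned in this thread.

```python
import os
from pathlib import Path


def find_data_file(name, start=None):
    """Hypothetical helper (not part of repeng): locate a data file by name.

    Checks <start>/data/<name> first, then <start>/notebooks/data/<name>.
    Returns a Path if the file exists, otherwise None.
    """
    base = Path(start) if start is not None else Path(os.getcwd())
    candidates = (
        base / "data" / name,                # cwd is the notebooks folder
        base / "notebooks" / "data" / name,  # cwd is the repo root
    )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


# Show where the notebook is actually running from.
print("cwd:", os.getcwd())

path = find_data_file("all_truncated_outputs.json")
if path is None:
    print("data file not found; cd into notebooks/ or copy the data folder here")
else:
    print("found:", path)
```

If the file is found, the returned path can be passed straight to open() in place of the hardcoded "data/all_truncated_outputs.json".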