truncated_output_suffixes &

thistleknot opened this issue 1 year ago • 1 comment

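import json

# `tokenizer` is the Hugging Face tokenizer loaded earlier in the notebook.
with open("data/all_truncated_outputs.json") as f:
    output_suffixes = json.load(f)
# For each suffix, keep every proper token prefix (1 .. len(tokens) - 1 tokens).
truncated_output_suffixes = [
    tokenizer.convert_tokens_to_string(tokens[:i])
    for tokens in (tokenizer.tokenize(s) for s in output_suffixes)
    for i in range(1, len(tokens))
]
# Same expansion, restricted to the first 512 suffixes.
truncated_output_suffixes_512 = [
    tokenizer.convert_tokens_to_string(tokens[:i])
    for tokens in (tokenizer.tokenize(s) for s in output_suffixes[:512])
    for i in range(1, len(tokens))
]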

The files referenced here don't exist in the repo for the MVE.

Another example is true_facts.json (I didn't find anything in the paper that mentions facts or a .json file).

Apr 29 '24 00:04 thistleknot
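To make it clearer what the truncated_output_suffixes comprehension quoted above produces, here is a minimal sketch that swaps in a hypothetical whitespace tokenizer (WhitespaceTokenizer is not part of the repo or of transformers; a real subword tokenizer would split differently) and made-up sample strings in place of data/all_truncated_outputs.json:

```python
import json


class WhitespaceTokenizer:
    """Hypothetical stand-in for a Hugging Face tokenizer: splits on spaces only."""

    def tokenize(self, text: str) -> list[str]:
        return text.split(" ")

    def convert_tokens_to_string(self, tokens: list[str]) -> str:
        return " ".join(tokens)


def truncated_prefixes(tokenizer, suffixes):
    """Every non-empty, proper token prefix of each suffix (the full string is excluded)."""
    return [
        tokenizer.convert_tokens_to_string(tokens[:i])
        for tokens in (tokenizer.tokenize(s) for s in suffixes)
        for i in range(1, len(tokens))
    ]


# Made-up sample data standing in for data/all_truncated_outputs.json.
sample = json.loads('["I am happy", "Hello"]')
print(truncated_prefixes(WhitespaceTokenizer(), sample))
# ['I', 'I am']
```

Because range(1, len(tokens)) stops one short of the full length, a single-token suffix like "Hello" contributes nothing, and the complete suffix never appears in the output.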

I created a script that I think mimics what you were showcasing:

https://gist.github.com/thistleknot/b936477ee82ce608b3c7f47381f6b15d

Apr 29 '24 03:04 thistleknot

Make sure you're running the notebook with the cwd set to the notebooks folder; the data folder is notebooks/data. Alternatively, you can copy the data folder to wherever you need it: find the current cwd with import os; print(os.getcwd()) and copy the data folder there. It's pretty small.

May 24 '24 23:05 vgel
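A minimal sketch of that check, assuming the data folder sits at either data/ (cwd is the notebooks folder) or notebooks/data/ (cwd is the repo root); find_data_file is a hypothetical helper, not something in the repo:

```python
import os
from pathlib import Path


def find_data_file(name: str, candidates=("data", "notebooks/data")) -> Path:
    """Return the first existing path to `name` under the candidate data folders.

    The candidate folders are an assumption about where the cwd usually is:
    the notebooks folder itself, or the repo root.
    """
    cwd = Path.cwd()
    for folder in candidates:
        path = cwd / folder / name
        if path.is_file():
            return path
    raise FileNotFoundError(
        f"{name} not found under {[str(cwd / c) for c in candidates]}; "
        "run from the notebooks folder or copy the data folder into the cwd"
    )


# Shows which folder the data directory needs to sit in.
print(os.getcwd())
```

It could then replace the hard-coded path, e.g. open(find_data_file("all_truncated_outputs.json")).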