InterGPS
Poor performance of theorem predictor
Hello, Pan. Thank you for open-sourcing your code.
I downloaded the checkpoint model from https://acl2021-intergps.s3.us-west-1.amazonaws.com/tp_model_best.pt, but the evaluation results are empty. How can I get normal results? Thanks.
Hi, Thank you for your interest in our work!
This evaluation result is not normal. Would you mind sharing the script you were running and the log it printed? It could help me narrow down the reasons.
Thanks!
Best, Pan
Thank you for your reply!
The script is the same as yours; I only changed the output file name.
```python
#!/usr/bin/env python
# coding: utf-8

import json
import ast
from tqdm import tqdm

import torch
from transformers import BartForConditionalGeneration, BartTokenizerFast


def evaluate(diagram_logic_file, text_logic_file, tokenizer_name, model_name, check_point, seq_num):
    test_lst = range(2401, 3002)

    ## read logic form files
    with open(diagram_logic_file) as f:
        diagram_logic_forms = json.load(f)
    with open(text_logic_file) as f:
        text_logic_forms = json.load(f)

    combined_logic_forms = {}
    for pid in test_lst:
        combined_logic_forms[pid] = diagram_logic_forms[str(pid)]['diagram_logic_forms'] + \
                                    text_logic_forms[str(pid)]['text_logic_forms']

    ## build tokenizer and model
    tokenizer = BartTokenizerFast.from_pretrained(tokenizer_name)  # 'facebook/bart-base'
    model = BartForConditionalGeneration.from_pretrained(model_name).to(device)  # 'facebook/bart-base'
    model.load_state_dict(torch.load(check_point))

    final = dict()
    for pid in tqdm(test_lst):
        input = str(combined_logic_forms[pid])
        tmp = tokenizer.encode(input)
        if len(tmp) > 1024:
            tmp = tmp[:1024]
        input = torch.LongTensor(tmp).unsqueeze(0).to(device)

        output = model.generate(input, bos_token_id=0, eos_token_id=2,
                                max_length=20, num_beams=10, num_return_sequences=seq_num)
        # print(out.size())

        ## refine output sequence
        seq = []
        for j in range(seq_num):
            res = tokenizer.decode(output[j].tolist())
            res = res.replace("</s>", "").replace("<s>", "").replace("<pad>", "")
            # print(res)
            try:
                res = ast.literal_eval(res)  # string class to list class
            except Exception as e:
                res = []
            seq.append(res)

        final[str(pid)] = {"id": str(pid), "num_seqs": seq_num, "seq": seq}

    return final


if __name__ == '__main__':
    diagram_logic_file = '../data/geometry3k/logic_forms/diagram_logic_forms_annot.json'
    text_logic_file = '../data/geometry3k/logic_forms/text_logic_forms_annot_dissolved.json'
    check_point = 'models/tp_model_best.pt'
    output_file = 'results/test/pred_seqs_test_debugging.json'
    tokenizer_name = 'facebook/bart-base'
    model_name = 'facebook/bart-base'
    SEQ_NUM = 5
    device = torch.device('cuda:0')

    result = evaluate(diagram_logic_file, text_logic_file, tokenizer_name, model_name, check_point, SEQ_NUM)
    with open(output_file, 'w') as f:
        json.dump(result, f)
```
The log:
```
D:\Anaconda\envs\intergps\python.exe D:/WorkSpace/InterGPS-main/theorem_predict/eval_transformer.py
  0%|          | 0/601 [00:00<?, ?it/s]
D:\Anaconda\envs\intergps\lib\site-packages\transformers\generation_utils.py:1839: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  next_indices = next_tokens // vocab_size
 22%|██▏       | 135/601 [00:23<01:25, 5.43it/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (1569 > 1024). Running this sequence through the model will result in indexing errors
100%|██████████| 601/601 [01:42<00:00, 5.88it/s]

Process finished with exit code 0
```
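Note that the run finishes with exit code 0, so the only silent failure path in the script is the `except` branch, where any decoded sequence that fails `ast.literal_eval` is replaced by `[]`. A minimal debugging sketch to check what the model actually decodes (it assumes the same checkpoint path and model names as the script above; the sample input `text` is a hypothetical logic-form string):

```python
# Debugging sketch: print the raw decoded sequences before the
# literal_eval fallback silently turns a failed parse into [].
import ast
import torch
from transformers import BartForConditionalGeneration, BartTokenizerFast

device = torch.device('cuda:0')
tokenizer = BartTokenizerFast.from_pretrained('facebook/bart-base')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-base').to(device)
model.load_state_dict(torch.load('models/tp_model_best.pt'))
model.eval()

# Hypothetical input: one problem's combined logic forms, stringified
# the same way as in the evaluation script.
text = "['Equals(LengthOf(Line(A, B)), 5)', 'Find(AreaOf(Triangle(A, B, C)))']"

# Letting the tokenizer truncate to the 1024-token limit also avoids the
# "longer than the specified maximum sequence length" warning in the log.
ids = tokenizer(text, truncation=True, max_length=1024,
                return_tensors='pt').input_ids.to(device)
output = model.generate(ids, bos_token_id=0, eos_token_id=2,
                        max_length=20, num_beams=10, num_return_sequences=5)

for j in range(5):
    res = tokenizer.decode(output[j], skip_special_tokens=True)
    print(repr(res))  # the script expects these to be Python list literals
    try:
        ast.literal_eval(res)
    except (ValueError, SyntaxError):
        print('-> this sequence would become [] in the evaluation loop')
```

If none of the decoded sequences parses as a list literal, the loaded checkpoint weights are the first thing to suspect, rather than the evaluation loop itself.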
Thanks!
Hi,
Below is my script:
```sh
cd symbolic_solver
python test.py --label final --strategy final
```
And the running log is here: https://github.com/lupantech/InterGPS/blob/main/symbolic_solver/logs/final/log-1612098244-predict_low-first_1.log.
The executed result is here: https://github.com/lupantech/InterGPS/blob/main/symbolic_solver/pred_results/final/logic_1612098244-predict_low-first_1.json.
Thank you Pan!
I can run your script and get the corresponding results. But my focus is on the theorem predictor: I wonder how to generate ../theorem_predict/results/pred_seq_result_bart_epoch19_seq5.json. I also found that many geometry problems can be solved by rules over the formal language, without any theorems. Does that mean theorem prediction is not so important in this paper?
Thanks!
Best, Fucheng
Hi Fucheng,
For the theorem predictor, you can follow the instructions at https://github.com/lupantech/InterGPS#theorem-predictor.
For the second question, yes. As we discussed in the paper, one of the main functions of the theorem predictor is to improve search efficiency, and thus the final accuracy, as verified in Table 7 and Figure 5.
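As a toy illustration of that point (this is not InterGPS's actual solver; `apply` and `solved` are hypothetical placeholders), a predicted theorem order simply lets the search try promising theorems first instead of exhausting the whole pool:

```python
# Toy sketch only: why a predicted theorem order improves search efficiency.
# `apply` and `solved` are hypothetical stand-ins for the symbolic solver.

def search(problem, theorem_pool, predicted_order, max_steps=100):
    # Try the predicted theorems first, then fall back to the rest.
    ordered = predicted_order + [t for t in theorem_pool
                                 if t not in predicted_order]
    for steps, theorem in enumerate(ordered[:max_steps], start=1):
        problem = theorem.apply(problem)  # hypothetical: apply one theorem
        if problem.solved():              # hypothetical: goal test
            return problem, steps
    return None, max_steps
```

With a good predicted order the goal is reached in far fewer steps than with an arbitrary order, which is the kind of gain the paper's Table 7 and Figure 5 report.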
Best, Pan
Thanks, Pan!
I followed the instructions at https://github.com/lupantech/InterGPS#theorem-predictor and downloaded the pre-trained model in step 4, but the evaluation results are empty in step 5.
Thanks!
Best, Fucheng
Hi Fucheng,
I see. Would you mind if I checked your issue a few days later? I am working on some urgent deadlines and need more time to figure out your problem. For now, I think it is fine to ignore the theorem predictor if you just want to reproduce our results.
I appreciate your understanding!
Best, Pan
Thanks, Pan.
Sure. Thank you for your work, and I look forward to your future results. Your paper and code have inspired me a lot.
Best, Fucheng
Hi Fucheng,
Thanks! I am happy to help with your project as well!
Yours sincerely, Pan