ChatKBQA
Inference Performance Issue
Hello!! Firstly, thank you for sharing your work. I greatly appreciate it. While running your code, I encountered some issues during the inference stage. Specifically, when I ran the following command:
The result was as follows:
Namespace(data_file_name='Reading/LLaMA2-7b/WebQSP_Freebase_NQ_lora_epoch100/evaluation_beam/generated_predictions.jsonl')
Loading data from: Reading/LLaMA2-7b/WebQSP_Freebase_NQ_lora_epoch100/evaluation_beam/generated_predictions.jsonl
Dataset len: 1639
Start predicting
total:1639,
ex_cnt:0,
ex_rate:0.0,
real_ex_rate:0.0,
contains_ex_cnt:0,
contains_ex_rate:0.0
real_contains_ex_rate:0.0
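For reference, the generated logical forms can be checked directly from the JSONL to see whether they are empty or malformed. A minimal sketch, assuming LLaMA-Factory-style `label`/`predict` fields (in the beam setting `predict` may be a list of candidates; adjust the key names if your file differs):

```python
import json

path = "Reading/LLaMA2-7b/WebQSP_Freebase_NQ_lora_epoch100/evaluation_beam/generated_predictions.jsonl"

total, empty, exact = 0, 0, 0
with open(path, "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        pred = rec.get("predict", "")
        label = rec.get("label", "")
        # In the beam setting, "predict" may be a list of candidates;
        # fall back to the top-1 candidate in that case.
        if isinstance(pred, list):
            pred = pred[0] if pred else ""
        if isinstance(label, list):
            label = label[0] if label else ""
        total += 1
        if not str(pred).strip():
            empty += 1
        if str(pred).strip() == str(label).strip():
            exact += 1

print(f"total={total}, empty_predictions={empty}, top1_exact_matches={exact}")
```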
As shown above, the evaluation results are all zeros. Furthermore, in the next step, after running the command CUDA_VISIBLE_DEVICES=0 python -u eval_final.py --dataset WebQSP --pred_file Reading/LLaMA2-7b/WebQSP_Freebase_NQ_lora_epoch100/evaluation_beam/beam_test_top_k_predictions.json, the result was as follows:
The problem WebQTest-2025 is not in the prediction set
Continue to evaluate the other entries
The problem WebQTest-2027 is not in the prediction set
Continue to evaluate the other entries
The problem WebQTest-2028 is not in the prediction set
Continue to evaluate the other entries
The problem WebQTest-2029 is not in the prediction set
Continue to evaluate the other entries
The problem WebQTest-2030 is not in the prediction set
Continue to evaluate the other entries
The problem WebQTest-2031 is not in the prediction set
Continue to evaluate the other entries
Number of questions: 1639
Average precision over questions: 0.000
Average recall over questions: 0.002
Average f1 over questions (accuracy): 0.000
0.001830628431970714
F1 of average recall and average precision: 0.000
True accuracy (ratio of questions answered exactly correctly): 0.000
Hits@1 over questions: 0.002
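The "not in the prediction set" messages above can likewise be narrowed down by listing which question IDs actually made it into beam_test_top_k_predictions.json. A minimal sketch that only assumes the file is plain JSON (either a dict keyed by question ID or a list of per-question records):

```python
import json

path = "Reading/LLaMA2-7b/WebQSP_Freebase_NQ_lora_epoch100/evaluation_beam/beam_test_top_k_predictions.json"

with open(path, "r", encoding="utf-8") as f:
    data = json.load(f)

# Handle both layouts without assuming a particular schema.
if isinstance(data, dict):
    qids = [str(k) for k in data.keys()]
elif isinstance(data, list):
    qids = [str(rec.get("qid", "<no qid field>")) for rec in data if isinstance(rec, dict)]
else:
    qids = []

print(f"entries: {len(qids)}")
print("sample ids:", qids[:5])
# If IDs such as WebQTest-2025 are missing here, they were dropped in the
# top-k prediction step rather than by eval_final.py itself.
print("WebQTest-2025 present:", any("WebQTest-2025" in q for q in qids))
```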
I don't know what the cause is. Could you suggest a solution?
Hi, have you solved it?
Hi @ZepengDu.
I solved the issue with the Virtuoso service. When running the command python3 virtuoso.py start 3001 -d virtuoso_db, the Virtuoso service starts and remains active during training. There's no need to use the stop command (python3 virtuoso.py stop 3001) subsequently. Simply running the start command is sufficient to keep the service operational.
The problem in my case was that I had used both commands, starting and then stopping the Virtuoso service.
Did you use the stop command to shut down the service during testing? In my case, the service has always been running; I never used the stop command. Strangely, I was able to get results successfully on the first attempt, but it didn't work on subsequent tries.
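To rule out connectivity problems, it also helps to query the endpoint once before launching training or inference. A minimal sketch, assuming the SPARQL service ends up at http://localhost:3001/sparql (i.e. the port passed to virtuoso.py); adjust the URL if your setup maps it differently:

```python
import requests

ENDPOINT = "http://localhost:3001/sparql"  # assumption: port 3001 serves SPARQL over HTTP

def virtuoso_is_up(endpoint: str = ENDPOINT) -> bool:
    """Return True if the endpoint answers a trivial ASK query with a non-empty store."""
    try:
        resp = requests.get(
            endpoint,
            params={"query": "ASK { ?s ?p ?o }",
                    "format": "application/sparql-results+json"},
            timeout=10,
        )
        resp.raise_for_status()
        return bool(resp.json().get("boolean", False))
    except (requests.RequestException, ValueError):
        return False

if __name__ == "__main__":
    print("Virtuoso reachable and non-empty:", virtuoso_is_up())
```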
@ZepengDu
No, I stopped the service before training, so it seems that training did not proceed properly because the service was shut down. However, once the service has been installed and run at least once, the database remains stored on the server, so it can still be accessed even after the service is shut down.
When the inference went wrong, I got results (logical forms) similar to what you showed me in the picture.
I still don't quite understand. Do you mean that the service should remain active during the training period, and it should not be stopped until the inference is completed? Is this the solution?
Yes, that's correct.
Sure, thank you! I am currently trying it out.
Hi, @meaningful96 . I followed your method by first starting the Virtuoso database, then proceeding with training and inference, but I found that the aforementioned issues still persist.
Hi @meaningful96, it looks like you’ve resolved this issue. Were you able to reproduce the results afterward? In my case, the results weren’t reproducible (as described in my issue above). Thank you for your response!
@ZepengDu,
Sorry for the late reply. To be honest, I don’t know the exact reason, but after rebooting the server and rebuilding the virtual environment, I succeeded in reproducing the results.
As a hypothesis, it seems that the training process was not conducted properly even with the Virtuoso service running. There might have been issues with processing the training data or connectivity problems between the database and the model.
@Leesangoh
Hi, I successfully reproduced the results. However, the performance differed slightly from the values reported in the paper. Without using entity linking annotation, the paper's performance on WebQSP was reported as Hits@1 = 83.2 and F1 = 79.8. In my experiments, both metrics were approximately 3.0 points lower, resulting in Hits@1 = 80.1 and F1 = 76.5.
Thank you for your reply! I think I found the issue.
@meaningful96 Hi, I have successfully solved the problem. Were the metrics in your reproduction also about 3 percentage points lower?
@ZepengDu Hi,
Yeah, all of the metrics were at least 3 percentage points lower in my experiments.