
Inference Performance Issue

Open meaningful96 opened this issue 1 year ago • 14 comments

Hello!! Firstly, thank you for sharing your work. I greatly appreciate it. While running your code, I encountered some issues during the inference stage. Specifically, when I ran the following command:

The result was as follows:

Namespace(data_file_name='Reading/LLaMA2-7b/WebQSP_Freebase_NQ_lora_epoch100/evaluation_beam/generated_predictions.jsonl')
Loading data from: Reading/LLaMA2-7b/WebQSP_Freebase_NQ_lora_epoch100/evaluation_beam/generated_predictions.jsonl
Dataset len: 1639

Start predicting 
total:1639, 
                    ex_cnt:0, 
                    ex_rate:0.0, 
                    real_ex_rate:0.0, 
                    contains_ex_cnt:0, 
                    contains_ex_rate:0.0
                    real_contains_ex_rate:0.0

As you can see, the evaluation result is all zeros. Furthermore, in the next step, after running the command CUDA_VISIBLE_DEVICES=0 python -u eval_final.py --dataset WebQSP --pred_file Reading/LLaMA2-7b/WebQSP_Freebase_NQ_lora_epoch100/evaluation_beam/beam_test_top_k_predictions.json, the result was as follows:

The problem WebQTest-2025 is not in the prediction set
Continue to evaluate the other entries
The problem WebQTest-2027 is not in the prediction set
Continue to evaluate the other entries
The problem WebQTest-2028 is not in the prediction set
Continue to evaluate the other entries
The problem WebQTest-2029 is not in the prediction set
Continue to evaluate the other entries
The problem WebQTest-2030 is not in the prediction set
Continue to evaluate the other entries
The problem WebQTest-2031 is not in the prediction set
Continue to evaluate the other entries
Number of questions: 1639
Average precision over questions: 0.000
Average recall over questions: 0.002
Average f1 over questions (accuracy): 0.000
0.001830628431970714
F1 of average recall and average precision: 0.000
True accuracy (ratio of questions answered exactly correctly): 0.000
Hits@1 over questions: 0.002
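Since `eval_final.py` reports many questions as "not in the prediction set", a quick sanity check is to diff the dataset's question IDs against the keys in the prediction file. This is only a sketch: it assumes the prediction JSON is an object keyed by question ID, which may not match the exact layout of `beam_test_top_k_predictions.json`.

```python
import json

def missing_qids(expected_ids, pred_path):
    """Return dataset question IDs that have no entry in the prediction file.

    Assumes the prediction file is a JSON object keyed by question ID;
    the real ChatKBQA output layout may differ.
    """
    with open(pred_path) as f:
        preds = json.load(f)
    return sorted(set(expected_ids) - set(preds))

# Usage (hypothetical IDs and path):
# missing = missing_qids(["WebQTest-2025", "WebQTest-2026"],
#                        "beam_test_top_k_predictions.json")
# print(f"{len(missing)} questions missing from predictions")
```

If most IDs come back missing, the inference step likely failed outright rather than producing wrong answers.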

I don't know what the cause is. Could you suggest a solution?

meaningful96 avatar Sep 14 '24 08:09 meaningful96

Hi, have you solved it?

ZepengDu avatar Jan 22 '25 04:01 ZepengDu

Hi @ZepengDu.

I solved the issue with the Virtuoso service. When running the command python3 virtuoso.py start 3001 -d virtuoso_db, the Virtuoso service starts and remains active during training. There's no need to use the stop command (python3 virtuoso.py stop 3001) subsequently. Simply running the start command is sufficient to keep the service operational.
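To avoid silently training or evaluating against a stopped endpoint, one can check that the SPARQL service responds before launching a run. A minimal standard-library sketch; the port 3001 and the `/sparql` path follow the start command above, but adjust them to your own setup:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def endpoint_alive(url, timeout=3):
    """Return True if an HTTP request to the endpoint gets any response."""
    try:
        urlopen(url, timeout=timeout)
        return True
    except HTTPError:
        # The server answered, even if with an error status, so it is up.
        return True
    except (URLError, OSError):
        return False

# Usage (assumed local Virtuoso endpoint):
# if not endpoint_alive("http://localhost:3001/sparql"):
#     raise SystemExit("Virtuoso endpoint is not responding; start it first.")
```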

The problem, in my case, was that I had run both commands, starting and then stopping the Virtuoso service.

meaningful96 avatar Jan 24 '25 01:01 meaningful96

Did you shut the service down with the stop command during testing? In my case, the service was running the whole time; I never used the stop command. Strangely, I got correct results on the first attempt, but not on subsequent tries.

[image attached]

ZepengDu avatar Jan 24 '25 03:01 ZepengDu

@ZepengDu
No, I stopped the service before training, so it seems that the training did not proceed properly due to the service being shut down. However, if the service was previously installed and executed even once, the database remains stored on the server. This means that even if the service is shut down, access to the database is still possible.

When the inference was incorrect, I received results (logical form) similar to what you showed me in the picture.

meaningful96 avatar Jan 24 '25 04:01 meaningful96

I still don't quite understand. Do you mean that the service should remain active during the training period, and it should not be stopped until the inference is completed? Is this the solution?

ZepengDu avatar Jan 24 '25 05:01 ZepengDu

Yes, that's correct.

meaningful96 avatar Jan 24 '25 05:01 meaningful96

Sure, thank you! I am currently trying it out.

ZepengDu avatar Jan 24 '25 05:01 ZepengDu

Hi, @meaningful96 . I followed your method by first starting the Virtuoso database, then proceeding with training and inference, but I found that the aforementioned issues still persist.

ZepengDu avatar Jan 25 '25 04:01 ZepengDu

Hi @meaningful96, it looks like you’ve resolved this issue. Were you able to reproduce the results afterward? In my case, the results weren’t reproducible (as I described in my issue). Thank you for your response!

Leesangoh avatar Jan 27 '25 07:01 Leesangoh

@ZepengDu,

Sorry for the late reply. To be honest, I don’t know the exact cause, but after rebooting the server and rebuilding the virtual environment, I succeeded in reproducing the results.

As a hypothesis, it seems that the training process was not conducted properly even with the Virtuoso service running. There might have been issues with processing the training data or connectivity problems between the database and the model.

meaningful96 avatar Jan 27 '25 07:01 meaningful96

@Leesangoh

Hi, I successfully reproduced the results. However, the performance differed slightly from the values reported in the paper. Without using entity linking annotation, the paper's performance on WebQSP was reported as Hits@1 = 83.2 and F1 = 79.8. In my experiments, both metrics were approximately 3.0 points lower, resulting in Hits@1 = 80.1 and F1 = 76.5.

meaningful96 avatar Jan 27 '25 07:01 meaningful96

Thank you for your reply! I think I found the issue.

Leesangoh avatar Jan 27 '25 10:01 Leesangoh

@meaningful96 Hi, I have successfully solved the problem. Were your reproduced metrics also about 3 percentage points lower?

ZepengDu avatar Mar 03 '25 03:03 ZepengDu

@ZepengDu Hi,

Yeah, all of the metrics were at least 3 percentage points lower in my experiments.

meaningful96 avatar Mar 03 '25 05:03 meaningful96