SmartEdit
About TrainStage1_inference.py: generated images are mostly noise or black
Hello, and thanks for your excellent work. While reproducing it, I completed Stage 1 training with the settings you provided. However, when I test the trained model with TrainStage1_inference.py, most of the generated images are noise or plain black. I traced the inference process and found that the LLM output looks normal, but in the cross-attention part of the Qformer, every row of the attention matrix has a value of 1 at the same position. As a result, all 77 of the 768-dimensional vectors in the Qformer output (77×768) are identical. Do you have any suggestions?
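To make the symptom concrete, here is a minimal pure-Python sketch of what I think I am seeing. It is not code from this repository: `softmax`, `attend`, `is_collapsed`, and the toy numbers are all hypothetical names and values chosen for illustration. It only shows that when every query row puts its whole weight on the same key, every output row becomes a copy of that key's value vector, which would explain the identical 768-dimensional vectors. It does not say why the attention saturates in the first place.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(scores, values):
    """Toy single-head attention.

    scores: [n_query][n_key] logits, values: [n_key][dim].
    Returns (weights, outputs).
    """
    weights = [softmax(r) for r in scores]
    dim = len(values[0])
    outputs = [
        [sum(w * v[d] for w, v in zip(w_row, values)) for d in range(dim)]
        for w_row in weights
    ]
    return weights, outputs


def is_collapsed(weights, tol=1e-3):
    """True when every row puts almost all its mass on the same key index."""
    peaks = [max(range(len(r)), key=r.__getitem__) for r in weights]
    same_peak = len(set(peaks)) == 1
    saturated = all(r[p] > 1 - tol for r, p in zip(weights, peaks))
    return same_peak and saturated


# Hypothetical toy data: 2 queries, 3 keys, 2-dimensional values.
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
healthy_scores = [[0.1, 0.2, 0.0], [0.3, 0.0, 0.1]]
saturated_scores = [[0.1, 50.0, 0.0], [0.3, 60.0, 0.1]]  # one key dominates

healthy_w, healthy_out = attend(healthy_scores, values)
collapsed_w, collapsed_out = attend(saturated_scores, values)

print("healthy collapsed?  ", is_collapsed(healthy_w))
print("saturated collapsed?", is_collapsed(collapsed_w))
print("collapsed outputs:  ", collapsed_out)
```

The same check could be applied to a real attention tensor after converting it to nested lists, for example with `attn.detach().float().cpu().tolist()` in PyTorch. Where exactly to capture that tensor in the Qformer depends on the model code, so I leave that part out.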