Weihan Wang
> Hi! Thanks for your great work! I tried to pretrain the model on multiple nodes with multiple GPUs (8 × 8 GPUs, as ViLT did) and observed a performance drop when...
> Hello, we tried ITC in our follow-up work FIBER (https://arxiv.org/abs/2206.07643), and you can see the ablation study in the appendix. @zdou0830 Thank you for your kind help, could you...
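For readers who just need the idea behind ITC, below is a minimal sketch of a CLIP-style image-text contrastive loss. The function name, pooled-feature shapes, and temperature value are illustrative assumptions, not FIBER's actual implementation.

```python
import torch
import torch.nn.functional as F

def itc_loss(image_feats, text_feats, temperature=0.07):
    """image_feats, text_feats: (batch, dim) pooled embeddings (hypothetical names)."""
    # Normalize so dot products are cosine similarities.
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)

    # (batch, batch) similarity logits in both directions.
    logits_per_image = image_feats @ text_feats.t() / temperature
    logits_per_text = logits_per_image.t()

    # Matched image-text pairs lie on the diagonal.
    targets = torch.arange(image_feats.size(0), device=image_feats.device)
    loss_i2t = F.cross_entropy(logits_per_image, targets)
    loss_t2i = F.cross_entropy(logits_per_text, targets)
    return (loss_i2t + loss_t2i) / 2
```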
> @pldlgb have you fixed it, bro?
> In [this line](https://github.com/dandelin/ViLT/blob/762fd3975c180db6fc88f577cf39549983fa373a/vilt/modules/objectives.py?fbclid=IwAR1YnT-PjjklnNLX-WSDmNCUW3ZQNz2kcmtoQtHGMqG65ecpM62cUJIljrU#L428) the answer targets are initialized to zeros and never changed. I am not able to understand how this helps with both positive and negative examples. ...
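For context, a zero-initialized target tensor of this kind is typically filled with each example's soft answer scores a few lines further down, so the annotated answers carry positive mass while every other answer index stays at zero and acts as a negative. A minimal sketch of that common VQAv2 pattern, with hypothetical names rather than ViLT's exact code:

```python
import torch

def build_vqa_targets(vqa_labels, vqa_scores, num_answers):
    """vqa_labels[i]: answer indices for example i; vqa_scores[i]: matching soft scores."""
    # Start from zeros: every candidate answer is a negative by default.
    targets = torch.zeros(len(vqa_labels), num_answers)
    # Fill in the soft score for each annotated (positive) answer.
    for i, (labels, scores) in enumerate(zip(vqa_labels, vqa_scores)):
        for label, score in zip(labels, scores):
            targets[i, label] = score
    return targets

# Example: one sample with two annotated answers out of 3129 candidates (VQAv2 convention).
targets = build_vqa_targets([[5, 42]], [[1.0, 0.3]], num_answers=3129)
```

Training the answer logits against such a target with binary cross-entropy then pushes the scored answers up while the zeroed answers are pushed down.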
> Hi again, is there any plan to release the Visual Entailment code or the processed datasets? I still cannot reproduce the performance. Everything goes well except the VE tasks. It...
> > Ranking-style QA is not yet supported in BLIP-2.
>
> Hi! I have a question about the architecture of the Q-Former in BLIP-2. In the paper, I see...
@YuanLiuuuuuu Can the flan-t5 model achieve normal results?
> > Are you running this on Windows? Why would there be a \r?
>
> The original script, VisualGLM-6B/finetune/finetune_visualglm.sh:
>
> #! /bin/bash
> NUM_WORKERS=1
> NUM_GPUS_PER_WORKER=1
> MP_SIZE=1
>
> script_path=$(realpath $0)
> script_dir=$(dirname $script_path)
> main_dir=$(dirname $script_dir)
> MODEL_TYPE="visualglm-6b"
> MODEL_ARGS="--max_source_length 64 --max_target_length 256 --lora_rank 10 --pre_seq_len 4"
> ...
> > For ITM, samples that are similar to positives will be considered as hard negatives.
>
> @LiJunnan1992 Is there a high probability that such a sample is actually a false negative?
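For reference, ALBEF/BLIP-style ITM heads usually mine in-batch hard negatives by sampling in proportion to the contrastive similarity while masking out the true pair, which is also why a very similar caption can occasionally be a false negative. A minimal sketch of that sampling step; the tensor names are assumptions, not the repository's exact code.

```python
import torch

def sample_hard_negative_indices(sim_i2t):
    """sim_i2t: (batch, batch) image-to-text similarity scores from the ITC head."""
    with torch.no_grad():
        # Turn similarities into sampling weights; add a small floor for numerical safety.
        weights = torch.softmax(sim_i2t, dim=1) + 1e-4
        # Mask the diagonal so the matched (positive) text is never drawn as a negative.
        weights.fill_diagonal_(0)
        # Texts most similar to the image are the most likely "hard" negatives;
        # a caption that is nearly identical to the positive may in fact be a false negative.
        return torch.multinomial(weights, num_samples=1).squeeze(1)
```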