SEED
Official implementation of SEED-LLaMA (ICLR 2024).
Hi, while trying to run step 3 of Training SEED Tokenization, I am observing the following error: ``` Traceback (most recent call last): File "/SEED/SEED_Tokenizer/train.py", line 16, in import lavis.tasks as...
Step 3 of Multimodal LLM Pre-training was failing. The fix is in this PR.
Hi, in tokenizer training you apply blocks for reconstructing the causal embedding and apply blocks_for_image (in 'blip2_qformer_codebook_all_image.py'). But you apply only blocks in get_codebook_indices (in 'qformer_quantizer.py'). Why the difference here?
Thanks for your great work. In the paper, you say the tokenizer's training data is 'CC3M, Unsplash, LAION-COCO, MS-COCO'. Did you use the entirety of those datasets? Or did you...
Hello, thank you for open-sourcing this outstanding work! In SEED/MultiModalLLM/configs/data/caption_torchdata_preprocess.yaml there is data_dir: - ${oc.env:PROJECT_ROOT}/data/unsplash_resize/webdataset - CC3M/webdataset/gcc3m_shards. Where can these datasets be downloaded? I noticed the paper says "We filtered the samples in these datasets based on image resolution, aspect ratio, and visual-textual similarity. We randomly place images or text at the...
Hello! Thank you very much for this outstanding open-source work! Could the following training data be released? data_dir: - dataset/seed_v2_0828/caption/unsplash_cc3m - dataset/seed_v2_0828/caption/coco data_dir: /dataset/seed_v2_0828/caption/laion-coco data_dir: dataset/seed_v2_0828/image_interleaved/mmc4 data_dir: dataset/seed_v2_0828/image_interleaved/obelisc data_dir: dataset/seed_v2_0828/caption/WebVid-10m data_dir: dataset/wikipedia_20220301.en Alternatively, could you provide the datasets as they were before preprocessing with src/tools/extract_image_ids_to_torchdata_parallel.py? Thank you very much!
Hi! Thank you for the wonderful work. I wonder if you can provide detailed information on training the SEED Tokenizer. I cannot find the hyperparameters for training the SEED Tokenizer in your...
Thank you for this amazing project. Can you provide the training code?
Hi, I am wondering if at any point the training code for SEED-LLaMA will be made available?
I checked the inference code and the paper. It seems you added the quantized image tokens to the pretrained language tokenizer. In other papers, some people separate the tokenizers for language and...
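The scheme the question refers to, extending a pretrained text tokenizer's vocabulary with the quantizer's discrete image codes so that text and image tokens share one embedding table, can be sketched as below. This is a minimal illustrative mock-up, not the repository's actual API; the `<img_k>` token format, `extend_vocab` helper, and codebook size of 8192 are assumptions.

```python
# Hypothetical sketch: appending discrete image-code tokens after an
# existing text vocabulary, so one LLM embedding table covers both.
# Names and the 8192 codebook size are illustrative assumptions.

def extend_vocab(text_vocab, num_image_codes):
    """Return a new vocab where image code k maps to token
    id len(text_vocab) + k, via tokens <img_0> ... <img_{N-1}>."""
    vocab = dict(text_vocab)
    base = len(vocab)
    for k in range(num_image_codes):
        vocab[f"<img_{k}>"] = base + k
    return vocab

# Toy text vocabulary standing in for the pretrained tokenizer.
text_vocab = {"<s>": 0, "</s>": 1, "hello": 2}
vocab = extend_vocab(text_vocab, num_image_codes=8192)

# A quantized image becomes a run of <img_k> tokens that the LLM
# predicts autoregressively, just like text tokens.
print(vocab["<img_0>"], vocab["<img_8191>"])  # 3 8194
```

The appeal of this shared-vocabulary design is that no separate image head is needed: generating an image is just next-token prediction over the extended vocabulary, with the decoder mapping the `<img_k>` run back to codebook entries.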