ColossalAI

KeyError: 'txt'

Open pokameng opened this issue 2 years ago • 16 comments

Hello, thanks for your interesting work! I want to use ColossalAI to train a model on the LSUN bedroom dataset, and my config is as follows: image When I run train.sh, something goes wrong:

KeyError: 'caption'

When I replace caption with txt, the problem becomes KeyError: 'txt'.


@ryanrussell @xcnick @feifeibear @junxu @jimmieliu

pokameng avatar Nov 20 '22 19:11 pokameng

The bug is: image

pokameng avatar Nov 20 '22 19:11 pokameng

image

Fazziekey avatar Nov 25 '22 03:11 Fazziekey

If you want to use your own dataset, please refer to https://github.com/hpcaitech/ColossalAI/blob/main/examples/images/diffusion/ldm/data/base.py. The data format should match your YAML config.

Fazziekey avatar Nov 25 '22 03:11 Fazziekey

image

Fazziekey avatar Nov 25 '22 03:11 Fazziekey

For example, if the YAML file sets first_stage_key: image and cond_stage_key: caption, your dataset should also return a dict with the keys {caption, image}.
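
A minimal sketch of a dataset whose items carry the same keys as the config, assuming first_stage_key: image and cond_stage_key: caption. The class name `CaptionedImageDataset`, the tiny nested-list "images", and the captions are placeholders for illustration, not code from the repository. It only implements `__len__` and `__getitem__`, which is the map-style interface a torch `DataLoader` expects.

```python
class CaptionedImageDataset:
    """Hypothetical map-style dataset: each item is a dict keyed by the names in the YAML config."""

    def __init__(self, images, captions, image_key="image", caption_key="caption"):
        if len(images) != len(captions):
            raise ValueError("images and captions must have the same length")
        self.images = images
        self.captions = captions
        # These names must match first_stage_key and cond_stage_key in the config.
        self.image_key = image_key
        self.caption_key = caption_key

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        return {
            self.image_key: self.images[index],
            self.caption_key: self.captions[index],
        }


# Placeholder data: two 2x2 RGB "images" as nested lists, one caption each.
images = [
    [[[0.0, 0.0, 0.0]] * 2] * 2,
    [[[1.0, 1.0, 1.0]] * 2] * 2,
]
captions = ["a bedroom with a bed", "a bedroom with a window"]

dataset = CaptionedImageDataset(images, captions)
print(sorted(dataset[0].keys()))
```

If the config used cond_stage_key: txt instead, the same class could be built with `caption_key="txt"` so the lookup in the training code finds the key it asks for.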

Fazziekey avatar Nov 25 '22 03:11 Fazziekey

If you want to use your own dataset, please refer to https://github.com/hpcaitech/ColossalAI/blob/main/examples/images/diffusion/ldm/data/base.py. The data format should match your YAML config.

I know, but the lsun.py you provide is suitable for LSUN bedroom, right? The data config is shown as follows: image

Should I change the target: ldm.data.lsun.LSUNBedroomsTrain to ldm.data.lsun.LSUNBedroomsTrain?

But I want to use ldm.data.lsun.LSUNBedroomsTrain to train on the LSUN bedroom dataset.

pokameng avatar Nov 25 '22 03:11 pokameng

Yes, the target in the YAML creates a Python object with the given params. You should make it behave like a common torch data loader.
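
As a rough illustration of how a target string plus params can become a Python object, here is a sketch using only the standard library. The helper name `build_from_config` is hypothetical and this is not the repository's own loader. The target `fractions.Fraction` is a stand-in so the snippet runs anywhere; in the training YAML the target would be a dataset class such as ldm.data.lsun.LSUNBedroomsTrain.

```python
import importlib


def build_from_config(config):
    """Import the class named by config["target"] and call it with config["params"]."""
    module_name, class_name = config["target"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**config.get("params", {}))


# Stand-in target from the standard library (an assumption for illustration only).
config = {
    "target": "fractions.Fraction",
    "params": {"numerator": 3, "denominator": 4},
}
obj = build_from_config(config)
print(type(obj).__name__, obj)
```

The practical consequence is that whatever class the target names decides which keys each sample contains, so the dataset class and the model keys in the YAML have to agree.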

Fazziekey avatar Nov 25 '22 03:11 Fazziekey

But I want to use ldm.data.lsun.LSUNBedroomsTrain to train lsun bedroom dataset.

But I want to use ldm.data.lsun.LSUNBedroomsTrain to train on the LSUN bedroom dataset. If I use ldm.data.lsun.LSUNBedroomsTrain, it raises an error: text input must of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).

The lsun.py is shown as follows: image

pokameng avatar Nov 25 '22 03:11 pokameng

LSUN is an unconditional example, which means each sample is only an image without a text prompt, so you should also change the model config in your training YAML. It is not recommended for training your own model.
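
To see the mismatch before launching training, one could check a single sample against the keys in the model config. This is a sketch under assumptions: the function `check_sample` is hypothetical, the sample dicts are placeholders, and the string check simply mirrors the tokenizer error quoted above rather than any check in the repository.

```python
def check_sample(sample, first_stage_key, cond_stage_key=None):
    """Return a list of problems; an empty list means the sample matches the keys."""
    problems = []
    if first_stage_key not in sample:
        problems.append(f"missing image key {first_stage_key!r}")
    if cond_stage_key is not None:
        if cond_stage_key not in sample:
            problems.append(f"missing conditioning key {cond_stage_key!r}")
        elif not isinstance(sample[cond_stage_key], str):
            kind = type(sample[cond_stage_key]).__name__
            problems.append(f"{cond_stage_key!r} should be a str, got {kind}")
    return problems


# Placeholder samples: an unconditional one (image only) and a captioned one.
unconditional = {"image": [[0.0]]}
captioned = {"image": [[0.0]], "caption": "a bedroom"}

print(check_sample(unconditional, "image", "caption"))
print(check_sample(captioned, "image", "caption"))
```

An image-only sample paired with a text-conditioned model config fails the first check, which corresponds to the KeyError reported at the top of this issue.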

Fazziekey avatar Nov 25 '22 03:11 Fazziekey

image

Fazziekey avatar Nov 25 '22 03:11 Fazziekey

I think base.py is a good example for training on images with text captions.

Fazziekey avatar Nov 25 '22 03:11 Fazziekey

I think base.py is a good example for training on images with text captions.

Can you give me an example for the text? I want to know the content of txt.

pokameng avatar Nov 25 '22 04:11 pokameng

I think base.py is a good example for training on images with text captions.

You mean cifar10 is the conditional example?

pokameng avatar Nov 25 '22 07:11 pokameng

You can try teyvat.py; the dataset is here: https://huggingface.co/datasets/Fazzie/Teyvat

Fazziekey avatar Nov 25 '22 07:11 Fazziekey

image

Fazziekey avatar Nov 25 '22 07:11 Fazziekey

An image paired with text is the right format.

Fazziekey avatar Nov 25 '22 07:11 Fazziekey

We have updated a lot. This issue was closed due to inactivity. Thanks.

binmakeswell avatar Apr 13 '23 10:04 binmakeswell