Open-Sora
Why does each epoch finish almost instantly during training? It's as if the data isn't being loaded correctly.
[2024-06-29 05:36:29] Beginning epoch 0...
Epoch 0: 0it [00:00, ?it/s]
INFO: Pandarallel will run on 16 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
INFO: Pandarallel will run on 16 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
INFO: Pandarallel will run on 16 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
[2024-06-29 05:36:30] Building buckets...
INFO: Pandarallel will run on 16 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
[2024-06-29 05:36:31] Bucket Info:
[2024-06-29 05:36:31] Bucket [#sample, #batch] by aspect ratio: {'0.56': [160, 3]}
[2024-06-29 05:36:31] Image Bucket [#sample, #batch] by HxWxT: {}
[2024-06-29 05:36:31] Video Bucket [#sample, #batch] by HxWxT: {('144p', 51): [160, 3]}
[2024-06-29 05:36:31] #training batch: 3, #training sample: 160, #non empty bucket: 1
[2024-06-29 05:36:31] Beginning epoch 1...
Epoch 1: 0it [00:00, ?it/s]
INFO: Pandarallel will run on 16 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
Epoch 1: 0it [00:00, ?it/s]
INFO: Pandarallel will run on 16 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
INFO: Pandarallel will run on 16 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
INFO: Pandarallel will run on 16 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
This is the CSV file (I created a few hundred video clips for fine-tuning):

path,text,id,relpath,num_frames,height,width,aspect_ratio,fps,resolution
/home/yy/Open-Sora/clips/sample_0_scene-0.mp4,a dog is running,sample_0_scene-0,sample_0_scene-0.mp4,96.0,144.0,256.0,0.5625,24.0,36864.0
/home/yy/Open-Sora/clips/sample_1_scene-0.mp4,a dog is running,sample_1_scene-0,sample_1_scene-0.mp4,96.0,144.0,256.0,0.5625,24.0,36864.0
/home/yy/Open-Sora/clips/sample_2_scene-0.mp4,a dog is running,sample_2_scene-0,sample_2_scene-0.mp4,96.0,144.0,256.0,0.5625,24.0,36864.0
/home/yy/Open-Sora/clips/sample_3_scene-0.mp4,a dog is running,sample_3_scene-0,sample_3_scene-0.mp4,96.0,144.0,256.0,0.5625,24.0,36864.0
Did I miss something somewhere? It looks like the training didn't actually run.
I have the same issue.
Same issue here.
Your batch isn't filled: the last incomplete batch is dropped because drop_last defaults to True. Setting drop_last to False fixes it.
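For what it's worth, here is a minimal sketch of the drop_last effect in plain PyTorch. This is an illustration only, not the exact Open-Sora dataloader/sampler code; where drop_last is actually passed depends on your dataloader setup.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 3 samples but batch_size=4: the only batch is an incomplete one.
dataset = TensorDataset(torch.arange(3).float())

loader_drop = DataLoader(dataset, batch_size=4, drop_last=True)
loader_keep = DataLoader(dataset, batch_size=4, drop_last=False)

print(len(loader_drop))  # 0 batches -> matches the "0it [00:00, ?it/s]" symptom
print(len(loader_keep))  # 1 batch   -> the short final batch is kept
```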
Thanks, but I have two hundred samples and batch_size=4. My current suspicion is that the bucket config doesn't match the video resolution.
This issue is stale because it has been open for 7 days with no activity.
@xbyym You could insert print(batch) right below this line, https://github.com/hpcaitech/Open-Sora/blob/main/scripts/train.py#L264, and see what it shows.
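Something along these lines, for example; the helper and the variable names are illustrative, not copied from the repo:

```python
# Hedged sketch: a small helper you could drop below scripts/train.py#L264
# to see whether batches actually contain data.
def dump_batch(batch):
    if isinstance(batch, dict):
        for key, value in batch.items():
            shape = getattr(value, "shape", None)
            print(f"{key}: {shape if shape is not None else value}")
    else:
        print(type(batch), batch)

# Inside the training loop, right after a batch is fetched:
# dump_batch(batch)
```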
@xbyym
The data-loading stage filters samples, so you need to set bucket_config according to the distribution of your own data. Relevant code:
https://github.com/hpcaitech/Open-Sora/blob/main/opensora/datasets/sampler.py#L200-L207
Helper function: https://github.com/hpcaitech/Open-Sora/blob/main/opensora/datasets/bucket.py#L74-L120
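As a starting point, here is a hedged sketch of a bucket_config matched to the data above (144p clips, 96 frames at 24 fps). The {resolution: {num_frames: (keep_probability, batch_size)}} layout is assumed from the stage config files, so double-check it against bucket.py before using it:

```python
# Hedged sketch, not a verified config: assumes bucket_config uses the
# {resolution: {num_frames: (keep_probability, batch_size)}} layout.
bucket_config = {
    "144p": {
        1: (1.0, 64),   # optional single-frame/image bucket
        51: (1.0, 4),   # 51-frame clips at 144p; keep_probability 1.0 keeps them all
    },
}
# Clips whose resolution/frame count fall into no bucket (or whose bucket has
# keep_probability < 1.0) are filtered out, which is why an epoch can shrink
# or end up empty.
```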
Thanks~ Is there any documentation on configuring the buckets? The problem I ran into is that videos with fewer than 51 frames raise an error, and I don't know how to skip them or how to configure this.
This issue is stale because it has been open for 7 days with no activity.
I ran into the same problem: in most epochs print(batch) contains no data at all, and only a very few epochs train normally. Why is that?
Why does my outputs directory only contain two log files and a TensorBoard folder? Where is the model saved? It doesn't seem to have written to the ckpts path I specified when launching either.
Is my --ckpt-path configuration wrong?
torchrun --standalone --nproc_per_node 4 scripts/train.py configs/opensora-v1-2/train/stage1.py --data-path /data02/Open-Sora/datasets0/webvid-10M/data_train_partitions_0000_100/meta/meta_clips_caption1.csv --ckpt-path /data02/Open-Sora/ckpts/PixArt-Sigma-XL-2-2K-MS.pth
Hi, could you share a Docker image for the data-processing pipeline? I've tried several times and still can't get it to run.
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.