pytorch-video-recognition

Train from scratch

Open aoluming opened this issue 3 years ago • 35 comments

Has anyone tried training C3D from scratch on UCF101? The accuracy stays at 1%. I also tried other models that I implemented myself, and their accuracy is 1% as well. The learning rate is 1e-5. Does anyone have any ideas?

aoluming avatar Aug 14 '20 01:08 aoluming
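
For reference, UCF101 has 101 classes, so an accuracy of about 1% is what uniform guessing would give. A minimal sketch of the two chance-level numbers worth comparing against (nothing here comes from the repository's code):

```python
import math

NUM_CLASSES = 101  # UCF101 has 101 action classes

# Top-1 accuracy of a classifier that guesses uniformly at random.
chance_top1 = 1.0 / NUM_CLASSES

# Cross-entropy of a uniform prediction; a freshly initialised network
# typically starts near this value.
chance_loss = math.log(NUM_CLASSES)

print(f"chance top-1 accuracy: {chance_top1:.4f}")   # about 0.0099
print(f"uniform cross-entropy: {chance_loss:.3f}")   # about 4.615
```

An accuracy stuck at this level suggests the network's predictions carry no information about the labels, rather than that it is merely learning slowly.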

I am also training from scratch, and after 13 or so epochs the train and test accuracies are about 43% and 28%. I'm also using a custom architecture, so the bug may be in your code. It's not possible to suggest anything more without knowing the specifics of your code.

Farabi-shafkat avatar Aug 14 '20 12:08 Farabi-shafkat

@Farabi-shafkat
Thank you for your kind reply! Have you ever tried training C3D from scratch? My custom architecture is at https://github.com/aoluming/Cost_model ; it follows the CVPR 2019 paper 'Collaborative Spatiotemporal Feature Learning for Video Action Recognition'. The paper's code is not open source. I would really appreciate it if you could check my code.

aoluming avatar Aug 14 '20 13:08 aoluming

Hello, no, I have not trained C3D from scratch. I am by no means an expert; I looked at your code but could not find any bug in your custom network implementation. However, there is one thing that might be wrong. Check out this thread: https://github.com/jfzhang95/pytorch-video-recognition/issues/30#issue-465325711

Farabi-shafkat avatar Aug 16 '20 11:08 Farabi-shafkat

Hello, you are very modest, and thank you for doing so much for me. I will try the suggestion from that thread in my code. @Farabi-shafkat

aoluming avatar Aug 16 '20 13:08 aoluming

I get the same 1% accuracy when training from scratch with this code.

libb999 avatar Aug 18 '20 09:08 libb999

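@libb999 Are you using C3D too? Have you tried other models? With this repo's C3D plus the pretrained weights it provides, accuracy reaches 97% within a few iterations, but training from scratch gives 1%. That seems absurd to me. Where do you think the problem might be?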

aoluming avatar Aug 18 '20 11:08 aoluming

Yes, I am also using C3D, and my situation is exactly the same as yours.

libb999 avatar Aug 19 '20 00:08 libb999

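@libb999 I also get 1% with other models, but when I printed the outputs during training I noticed something. Without dropout in the network, the predicted class index is basically one or two fixed values, no matter how many epochs I run. I don't know whether the problem is in the network, the training code, or the data. But accuracy is high with the pretrained weights, which suggests the data is probably fine.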

aoluming avatar Aug 19 '20 01:08 aoluming
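
The collapse described above (predictions stuck on one or two class indices) can be made visible by counting predicted indices over an epoch. A small sketch, assuming the predicted indices have been collected into a list; the helper name and the sample values are made up for illustration:

```python
from collections import Counter

def prediction_histogram(pred_indices, top=5):
    """Return the `top` most frequently predicted classes as (class, share) pairs."""
    counts = Counter(int(p) for p in pred_indices)
    total = sum(counts.values())
    return [(cls, n / total) for cls, n in counts.most_common(top)]

# Hypothetical predictions gathered during one epoch (made-up values).
preds = [37, 37, 37, 12, 37, 37, 12, 37]
hist = prediction_histogram(preds)
print(hist)  # [(37, 0.75), (12, 0.25)]
```

With PyTorch tensors, something like `preds.tolist()` can be passed in. If one or two classes take nearly the whole share across 101 classes, the network is outputting an almost input-independent prediction.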

I think there is a problem with the code.

libb999 avatar Aug 19 '20 01:08 libb999

Agreed, but I checked the loss and printed the gradients, and they look fine: gradients are being backpropagated, yet the loss just will not go down. @libb999

aoluming avatar Aug 19 '20 01:08 aoluming
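
One common sanity check for the situation above (gradients flow but the loss does not fall) is to try to overfit a single small batch. The sketch below uses a tiny stand-in MLP and random data, not C3D or UCF101; the layer sizes, learning rate, and step count are arbitrary assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in model and one fixed batch (not the repository's C3D).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 4))
inputs = torch.randn(16, 10)
labels = torch.randint(0, 4, (16,))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)  # raw logits go into the criterion
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Global gradient norm from the last backward pass.
grad_norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, grad norm {grad_norm.item():.4f}")
```

If the same loop with the real model and a single real batch cannot drive the loss down, the problem is more likely in the training step or the model than in the dataset size or the number of epochs.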

Has anyone solved this? My training accuracy stays very low.

shanchao0906 avatar Sep 29 '20 07:09 shanchao0906

image

shanchao0906 avatar Oct 17 '20 02:10 shanchao0906

I ran into the same issue. If you reduce the number of classes, the model will converge. For instance, if you reduce UCF101 to 7 classes and then train from scratch, the model converges to 95% validation accuracy. Training from scratch is known to take a very long time.

BryceWayne avatar Oct 26 '20 19:10 BryceWayne
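
A sketch of how a class subset like the one described above could be built from a list of (path, class name) pairs. The helper name and the sample file list are hypothetical, and this is not how the repository's dataset class is organised:

```python
def select_classes(samples, keep_classes):
    """Keep samples from `keep_classes` and relabel them to 0..K-1."""
    class_to_idx = {name: i for i, name in enumerate(sorted(keep_classes))}
    subset = [(path, class_to_idx[name]) for path, name in samples if name in class_to_idx]
    return subset, class_to_idx

# Illustrative entries only; a real list would come from the dataset folder.
samples = [
    ("v_Archery_g01_c01.avi", "Archery"),
    ("v_Biking_g01_c01.avi", "Biking"),
    ("v_Basketball_g01_c01.avi", "Basketball"),
    ("v_Archery_g02_c01.avi", "Archery"),
]
subset, class_to_idx = select_classes(samples, {"Archery", "Biking"})
print(class_to_idx)  # {'Archery': 0, 'Biking': 1}
print(subset)
```

The labels must be remapped to a contiguous range, and the model's final layer needs K outputs to match; otherwise the loss is computed against the wrong number of classes.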

Has anyone solved this problem? I ran into the same thing. The loss converges quickly, but the accuracy is only 1% top-1 and 5% top-10.

HuangZuShu avatar Nov 19 '20 09:11 HuangZuShu
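
For comparison with the top-1 and top-10 figures above, a uniformly random ranking over 101 classes would give about 1/101 (roughly 1%) top-1 and 10/101 (roughly 9.9%) top-10 accuracy. A minimal sketch of computing top-k accuracy from logits; the function name and the toy tensors are illustrative:

```python
import torch

def topk_accuracy(logits, labels, ks=(1, 10)):
    """Return {k: fraction of samples whose label is among the k highest scores}."""
    max_k = max(ks)
    # Indices of the max_k highest-scoring classes, shape (batch, max_k).
    _, top_idx = logits.topk(max_k, dim=1)
    hits = top_idx.eq(labels.unsqueeze(1))  # boolean, shape (batch, max_k)
    return {k: hits[:, :k].any(dim=1).float().mean().item() for k in ks}

# Toy example with 4 classes so the result can be checked by hand.
logits = torch.tensor([[0.1, 0.9, 0.0, 0.3],
                       [0.8, 0.1, 0.6, 0.2],
                       [0.2, 0.3, 0.4, 0.1]])
labels = torch.tensor([1, 2, 3])
acc = topk_accuracy(logits, labels, ks=(1, 2))
print(acc)  # top-1 is 1/3, top-2 is 2/3
```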

The loss should be computed with the outputs. I have good training now.

BryceWayne avatar Nov 19 '20 16:11 BryceWayne

Make sure to check that the loss is computed with the outputs. image

BryceWayne avatar Nov 19 '20 21:11 BryceWayne

Thank you for your reply! My loss function is computed with the outputs, as you can see in the following picture, and I couldn't find any problem. image

HuangZuShu avatar Nov 20 '20 01:11 HuangZuShu

@jfzhang95 I hit the same problem when training from scratch on UCF101. The accuracy is very low (about 0.001). Do you have any suggestions? Thanks.

skyqwe123 avatar Mar 08 '21 09:03 skyqwe123

@aoluming Have you solved this?

skyqwe123 avatar Mar 18 '21 12:03 skyqwe123

In the original code it says loss = criterion(outputs, labels), but shouldn't it actually be probs? Would the accuracy get better if this part were changed?

```python
if phase == 'train':
    outputs = model(inputs)
else:
    with torch.no_grad():
        outputs = model(inputs)

probs = nn.Softmax(dim=1)(outputs)
preds = torch.max(probs.data, 1)[1]
labels = labels.long()
loss = criterion(probs, labels)  # changed from criterion(outputs, labels)
```

alonelysnake avatar Apr 05 '21 16:04 alonelysnake
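
A note on the change proposed above: PyTorch's `nn.CrossEntropyLoss` applies log-softmax internally and expects raw, unnormalised logits. Passing softmax probabilities applies softmax twice and confines the loss to a narrow band, which for 101 classes is roughly 3.63 to 4.63. That would be consistent with the loss of about 4 reported later in this thread for this variant. The sketch below uses random stand-in logits, not real model outputs:

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes = 101                               # UCF101
logits = torch.randn(8, num_classes) * 5.0      # stand-in for model outputs
labels = torch.randint(0, num_classes, (8,))

criterion = nn.CrossEntropyLoss()

# Intended usage: raw logits go straight into the criterion.
loss_from_logits = criterion(logits, labels)

# Variant from the comment above: softmax first, then the criterion.
probs = torch.softmax(logits, dim=1)
loss_from_probs = criterion(probs, labels)

# With inputs restricted to probabilities, the loss cannot leave this band:
lower = math.log(math.e + num_classes - 1) - 1.0  # all mass on the target class
upper = math.log(math.e + num_classes - 1)        # all mass on one wrong class

print(f"loss from logits: {loss_from_logits.item():.3f}")
print(f"loss from probs:  {loss_from_probs.item():.3f} (band {lower:.3f} to {upper:.3f})")
```

Because the loss in the second variant can never approach zero and its gradients are squashed, it may stop NaNs from appearing without actually helping the network learn.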

Hello, did this solve it for you? Did the accuracy improve?

Krystal0606 avatar Apr 07 '21 01:04 Krystal0606

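Hello, how do I load the pretrained model? Training from scratch, my accuracy stops improving at around 20 epochs: training accuracy stays between 0.22 and 0.24, and validation accuracy oscillates between 0.25 and 0.27. I don't see the 1% situation mentioned above. What could be going on? @aoluming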

Krystal0606 avatar Apr 07 '21 01:04 Krystal0606

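I tried it and it still doesn't work. You said your accuracy is around 0.22-0.24; did you make any changes to the code, or did you just set the paths and hyperparameters and run it directly?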

alonelysnake avatar Apr 07 '21 03:04 alonelysnake

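I made no changes. I processed UCF101 using his data processing method and trained from scratch; accuracy stops improving at around 20 epochs. Have you tried running with the pretrained model? @alonelysnake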

Krystal0606 avatar Apr 07 '21 06:04 Krystal0606

I just remembered: I changed the learning rate from 1e-5 to 1e-3, but it made little difference. @alonelysnake

Krystal0606 avatar Apr 07 '21 06:04 Krystal0606

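@Krystal0606 I have tried both with and without the pretrained model, and the result is about 1% either way. Without the change I mentioned earlier, the loss becomes NaN at a learning rate of 1e-3 and stays around 9 at 1e-4 and below. With that change, training also runs at a learning rate of 1e-3 and the loss drops to about 4, but the accuracy still does not change.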

alonelysnake avatar Apr 07 '21 07:04 alonelysnake
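
For the NaN loss at a learning rate of 1e-3 mentioned above, one option that keeps raw logits in the criterion is to guard each step against non-finite losses and clip gradients. This is only a sketch: the helper name, the `max_grad_norm=1.0` default, and the tiny linear model are assumptions, and it does not explain why the NaN appears in the first place.

```python
import torch
import torch.nn as nn

def train_step(model, criterion, optimizer, inputs, labels, max_grad_norm=1.0):
    """One optimisation step that skips non-finite losses and clips gradients."""
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    if not torch.isfinite(loss):
        return None  # caller can log the batch and lower the learning rate
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()

torch.manual_seed(0)
model = nn.Linear(10, 4)  # stand-in model, not C3D
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

labels = torch.randint(0, 4, (8,))
ok = train_step(model, criterion, optimizer, torch.randn(8, 10), labels)
bad = train_step(model, criterion, optimizer, torch.full((8, 10), float("nan")), labels)
print(ok, bad)  # a finite float, then None for the NaN batch
```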

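@Krystal0606 My loss and accuracy kept fluctuating over the first few epochs, so I only trained each configuration for five to ten epochs. Did your training start out like mine and then suddenly improve from some epoch onward, or did it improve steadily from the start?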

alonelysnake avatar Apr 07 '21 08:04 alonelysnake

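My loss has been decreasing and my accuracy increasing throughout, but the accuracy starts oscillating at around 22%. I did not get a NaN loss at a learning rate of 1e-3; my loss started at about 4 and ended at about 3. I don't know what is going on. @alonelysnake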

Krystal0606 avatar Apr 07 '21 09:04 Krystal0606

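@Krystal0606 On Zhihu I saw someone who used the pretrained model without changing the code and got over 90% accuracy after 20 epochs. So the results seem to differ a lot between machines. Could it be a random seed issue? I have not looked into this.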

alonelysnake avatar Apr 07 '21 10:04 alonelysnake
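
On the random seed question above: fixing the seeds makes runs on the same setup repeatable, so the seed can at least be ruled in or out. A minimal sketch, assuming only Python's `random` and PyTorch are in play (a pipeline that uses NumPy would also need `np.random.seed`); the helper name is illustrative:

```python
import random
import torch

def set_seed(seed):
    """Seed Python and PyTorch RNGs and ask cuDNN for deterministic kernels."""
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)           # silently ignored without CUDA
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(0)
a = torch.rand(3)
set_seed(0)
b = torch.rand(3)
print(torch.equal(a, b))  # True: same seed, same random numbers
```

Running the same configuration with a few different seeds shows how much of the spread the seed alone accounts for.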

After training for 10 epochs, my C3D accuracy is also 1%. I also changed the loss to loss = criterion(probs, labels); otherwise the accuracy becomes NaN.

Taylor-X76 avatar May 25 '21 05:05 Taylor-X76