PLELog

Training process error

Open superzeroT opened this issue 1 year ago • 17 comments

The training process ran into the following problem (Image 1). With bidirectional set to False, another problem arose (Image 2). After removing the latter part of the multiplication shown in the image below, training runs but the result is wrong (Image 3). Below are the contents of my data file (Image 4). I can't solve the problem yet and would appreciate any help. Thank you very much!

superzeroT avatar Jul 02 '23 08:07 superzeroT

Can you check which one is 200d and which one is 100d in hiddens * sent_probs? This could help clarify the issue.

LeonYang95 avatar Jul 02 '23 12:07 LeonYang95

Can you check which one is 200d and which one is 100d in hiddens * sent_probs? This could help clarify the issue.

hiddens is 200d, sent_probs is 100d.

superzeroT avatar Jul 02 '23 13:07 superzeroT

Sorry, I failed to reproduce this error. However, here's a tip that may help:

The shape of sent_prob should be batch_size * seq_len after the attention mechanism, and it becomes batch_size * seq_len * 1 after the view operation. Therefore, I am not sure which part goes wrong. As shown in your output, 100 could be the batch_size.

Here I attach a screenshot of my runtime outputs with shapes; I hope this can help you with debugging. (screenshot)
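For anyone hitting the same shape confusion, here is a minimal PyTorch sketch of the shapes described above; the sizes (100, 38, 200) are the ones quoted later in this thread, and the variable names are only illustrative, not the actual PLELog code:

```python
import torch

# Illustrative sizes from this thread: batch_size=100, seq_len=38, hidden_dim=200.
batch_size, seq_len, hidden_dim = 100, 38, 200

hiddens = torch.randn(batch_size, seq_len, hidden_dim)  # encoder outputs, [100, 38, 200]
sent_probs = torch.rand(batch_size, seq_len)            # attention scores, [100, 38]

# After the view operation the scores gain a trailing singleton dimension,
# so they can broadcast against the hidden states.
sent_probs = sent_probs.view(batch_size, seq_len, 1)    # [100, 38, 1]

weighted = hiddens * sent_probs                         # [100, 38, 200]
print(weighted.shape)
```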

LeonYang95 avatar Jul 03 '23 03:07 LeonYang95

(Image 1)

superzeroT avatar Jul 03 '23 03:07 superzeroT

(Image 1)

What are the shapes after the view operation?

LeonYang95 avatar Jul 03 '23 03:07 LeonYang95

(Image 2)

superzeroT avatar Jul 03 '23 03:07 superzeroT

So it seems fine?

sent_probs can be regarded as the attention scores of the hidden states for each log event in the log sequence. The multiplication between hidden_states and sent_probs is effectively an attention-weighted summation of the hidden states, which gives a final representation for each log sequence.
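As a purely illustrative sketch (not the repository code), the attention-weighted pooling described above looks roughly like this:

```python
import torch

batch_size, seq_len, hidden_dim = 100, 38, 200
hiddens = torch.randn(batch_size, seq_len, hidden_dim)               # one hidden state per log event
sent_probs = torch.softmax(torch.randn(batch_size, seq_len), dim=1)  # attention weight per log event

# Scale each event's hidden state by its attention weight, then sum over the
# sequence dimension to get a single vector per log sequence.
weighted = hiddens * sent_probs.unsqueeze(-1)   # [100, 38, 200]
sequence_repr = weighted.sum(dim=1)             # [100, 200]
```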

LeonYang95 avatar Jul 03 '23 03:07 LeonYang95

So it seems fine?

sent_probs can be regarded as the attention scores of the hidden states for each log event in the log sequence. The multiplication between hidden_states and sent_probs is effectively an attention-weighted summation of the hidden states, which gives a final representation for each log sequence.

Thank you very much for your help. I'll keep looking for a solution.

superzeroT avatar Jul 03 '23 03:07 superzeroT

Hi @superzeroT, may I ask whether the issue is still unresolved? Also, your screenshot mentions "此处有相应的修改" ("corresponding modifications were made here"); what were those modifications exactly?

LeonYang95 avatar Jul 05 '23 09:07 LeonYang95

Hi @LeonYang95, I haven't solved the problem yet. I tried to unify the dimensions, but it didn't work. Don't worry about the note I added. Since your code runs successfully, I guess it has something to do with the environment configuration, etc.

superzeroT avatar Jul 05 '23 13:07 superzeroT

Hi @LeonYang95, could I see the shapes of your sent_probs and hiddens values? (Screenshot 2023-07-05 222313)

superzeroT avatar Jul 05 '23 14:07 superzeroT

Mine were:

hiddens: [100, 38, 200], sent_probs: [100, 38, 1]

Your first two shapes seem fine, but the sequence length of the last two shapes is only 1?

LeonYang95 avatar Jul 05 '23 14:07 LeonYang95

Hi, have you solved this problem? I get the same error when evaluating the test set. It is thrown at the last batch of the test set.

At the last batch, or whenever the sequence length equals 1, sent_probs has the size batch_size * batch_size rather than batch_size * seq_len, so the resize operation throws this error.

Has anyone solved this error, please?

dino-chiio avatar Aug 16 '23 16:08 dino-chiio

Hi @LeonYang95, I think I have found an error in module/Attention.py, class LinearAttention, at https://github.com/LeonYang95/PLELog/blob/c8bb56b08fe6368f3b3c71ff88de8a87c48c7607/module/Attention.py#L275

combined_tensors.squeeze(1) removes dimension 1 whenever it has size 1. So when the input has sequence length 1, the sequence dimension itself is removed. When I remove the squeeze call, the code works completely.
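A small sketch of that failure mode (illustrative only, not the PLELog code):

```python
import torch

hidden_dim = 200
normal_batch = torch.randn(100, 38, hidden_dim)   # sequence length 38
edge_batch = torch.randn(100, 1, hidden_dim)      # sequence length 1

# squeeze(1) only removes dimension 1 when its size is 1, so the normal batch
# is untouched while the length-1 batch silently loses its sequence dimension.
print(normal_batch.squeeze(1).shape)  # torch.Size([100, 38, 200])
print(edge_batch.squeeze(1).shape)    # torch.Size([100, 200])
```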

Do I misunderstand anything in this situation?
I run the code on a CUDA machine. Another question: what exactly are the advantages of the CPUEmbedding class when the code runs on CUDA?

dino-chiio avatar Aug 17 '23 02:08 dino-chiio

Hi @dino-chiio ,

Your observation about the shape issue is correct: the squeeze produces this error when the sequence length is one, but in other situations the code works fine. Please consider this error an "anomaly" prediction.

The CPUEmbedding class keeps the embedding weights on the CPU instead of the GPU to reduce GPU memory usage; our GPU resources were limited while we were doing this research. If you have more advanced GPUs, you can try training the embedding weights along with the other parameters; hopefully, you will get a considerable improvement.
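For readers wondering what that looks like in practice, here is a rough sketch of the idea (the class and method names are hypothetical, not the actual CPUEmbedding implementation):

```python
import torch
import torch.nn as nn

class CpuSideEmbedding(nn.Module):  # hypothetical name, for illustration only
    """Keep a large, frozen embedding table in CPU memory and move only the
    looked-up vectors to the GPU, trading lookup speed for GPU memory."""

    def __init__(self, num_embeddings, embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        self.embedding.weight.requires_grad = False  # frozen, as described above

    def forward(self, token_ids, device):
        vectors = self.embedding(token_ids.cpu())    # lookup happens on the CPU
        return vectors.to(device)                    # only the results occupy GPU memory
```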

LeonYang95 avatar Aug 17 '23 05:08 LeonYang95

Hi @dino-chiio ,

Your observation about the shape issue is correct: the squeeze produces this error when the sequence length is one, but in other situations the code works fine. Please consider this error an "anomaly" prediction.

The CPUEmbedding class keeps the embedding weights on the CPU instead of the GPU to reduce GPU memory usage; our GPU resources were limited while we were doing this research. If you have more advanced GPUs, you can try training the embedding weights along with the other parameters; hopefully, you will get a considerable improvement.

Hi @LeonYang95 ,

You mentioned that this error is an anomaly prediction. Do you mean that any sequence of length one can be flagged as an anomaly? If I do not use the squeeze function in this case and keep all sequences of length 1, will the workflow still run normally?

dino-chiio avatar Aug 17 '23 06:08 dino-chiio

I do not recommend removing the squeeze operation. The attention mechanism I used was adapted from other projects, and I am not sure what the results would be after removing it.

I believe treating log sequences of length one as anomalies is an acceptable solution, since it is possible that those log sequences are actual anomalies or are irrelevant to the system's running status.
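If you want to follow that suggestion in code, one possible guard (illustrative only; `model.classify` is a placeholder, not an actual PLELog API) is to short-circuit length-1 sequences before they reach the attention layer:

```python
def predict_label(model, log_sequence):
    # Treat log sequences of length one as anomalies instead of letting them
    # trigger the shape error discussed in this thread.
    if len(log_sequence) <= 1:
        return "anomaly"
    return model.classify(log_sequence)  # placeholder for the real prediction call
```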

LeonYang95 avatar Aug 17 '23 07:08 LeonYang95