PLELog

Training process error

Open superzeroT opened this issue 1 year ago • 17 comments

The training process ran into the following problem (Image 1). With bidirectional set to False, another problem arose (Image 2). After removing the latter part of the multiplication shown in the image below, training runs but the result is wrong (Image 3). Below are the contents of my data file (Image 4). I can't solve the problem yet and would appreciate any help. Thank you very much!

superzeroT avatar Jul 02 '23 08:07 superzeroT

Can you check which one is 200d and which one is 100d in hiddens * sent_probs? This could help clarify the issue.

LeonYang95 avatar Jul 02 '23 12:07 LeonYang95

Can you check which one is 200d and which one is 100d in hiddens * sent_probs? This could help clarify the issue.

hiddens is 200d, sent_probs is 100d.

superzeroT avatar Jul 02 '23 13:07 superzeroT

Sorry, I failed to reproduce this error. However, here's a tip that may help:

The shape of sent_prob should be batch_size * seq_len after the attention mechanism, and it becomes batch_size * seq_len * 1 after the view operation. Therefore, I am not sure which part goes wrong. As shown in your output, 100 could be the batch_size.

Here I attach a screenshot of my runtime outputs with shapes; I hope this can help you with debugging. (screenshot)
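For anyone hitting the same shape confusion, here is a minimal PyTorch sketch of the shapes described above; the sizes (100, 38, 200) are the ones quoted later in this thread, and the variable names are only illustrative, not the actual PLELog code:

```python
import torch

# Illustrative sizes from this thread: batch_size=100, seq_len=38, hidden_dim=200.
batch_size, seq_len, hidden_dim = 100, 38, 200

hiddens = torch.randn(batch_size, seq_len, hidden_dim)  # encoder outputs, [100, 38, 200]
sent_probs = torch.rand(batch_size, seq_len)            # attention scores, [100, 38]

# After the view operation the scores gain a trailing singleton dimension,
# so they can broadcast against the hidden states.
sent_probs = sent_probs.view(batch_size, seq_len, 1)    # [100, 38, 1]

weighted = hiddens * sent_probs                         # [100, 38, 200]
print(weighted.shape)
```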

LeonYang95 avatar Jul 03 '23 03:07 LeonYang95

(Image 1)

superzeroT avatar Jul 03 '23 03:07 superzeroT

(Image 1)

What are the shapes after the view operation?

LeonYang95 avatar Jul 03 '23 03:07 LeonYang95

(Image 2)

superzeroT avatar Jul 03 '23 03:07 superzeroT

So it seems fine?

sent_probs can be regarded as the attention scores of the hidden states for each log event in the log sequence. The multiplication between hidden_states and sent_probs is effectively an attention-weighted summation of the hidden states, which gives a final representation for each log sequence.
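As a purely illustrative sketch (not the repository code), the attention-weighted pooling described above looks roughly like this:

```python
import torch

batch_size, seq_len, hidden_dim = 100, 38, 200
hiddens = torch.randn(batch_size, seq_len, hidden_dim)               # one hidden state per log event
sent_probs = torch.softmax(torch.randn(batch_size, seq_len), dim=1)  # attention weight per log event

# Scale each event's hidden state by its attention weight, then sum over the
# sequence dimension to get a single vector per log sequence.
weighted = hiddens * sent_probs.unsqueeze(-1)   # [100, 38, 200]
sequence_repr = weighted.sum(dim=1)             # [100, 200]
```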

LeonYang95 avatar Jul 03 '23 03:07 LeonYang95

So it seems fine?

sent_probs can be regarded as the attention scores of the hidden states for each log event in the log sequence. The multiplication between hidden_states and sent_probs is effectively an attention-weighted summation of the hidden states, which gives a final representation for each log sequence.

Thank you very much for your help. I'll keep looking for a solution.

superzeroT avatar Jul 03 '23 03:07 superzeroT

Hi @superzeroT, may I ask whether the issue is still unresolved? Also, your screenshot mentions "此处有相应的修改" ("corresponding modifications were made here"); what were those modifications exactly?

LeonYang95 avatar Jul 05 '23 09:07 LeonYang95

Hi @LeonYang95, I haven't solved the problem yet. I tried to unify the dimensions, but it didn't work. Don't worry about the note I added. Since your code runs successfully, I guess it has something to do with the environment configuration, etc.

superzeroT avatar Jul 05 '23 13:07 superzeroT

Hi @LeonYang95, could I see the shapes of your sent_probs and hiddens values? (Screenshot 2023-07-05 222313)

superzeroT avatar Jul 05 '23 14:07 superzeroT

Mine were:

hiddens: [100, 38, 200], sent_probs: [100, 38, 1]

Your first two shapes seem fine, but the sequence length of the last two shapes is only 1?

LeonYang95 avatar Jul 05 '23 14:07 LeonYang95

Hi, have you solved this problem? I get the same error when evaluating the test set. It is thrown at the last batch of the test set.

At the last batch, or whenever the sequence length equals 1, sent_probs has the size batch_size * batch_size rather than batch_size * seq_len, so the resize operation throws this error.

Has anyone solved this error, please?

dino-chiio avatar Aug 16 '23 16:08 dino-chiio

Hi @LeonYang95, I think I have found an error in module/Attention.py, class LinearAttention, at https://github.com/LeonYang95/PLELog/blob/c8bb56b08fe6368f3b3c71ff88de8a87c48c7607/module/Attention.py#L275

combined_tensors.squeeze(1) removes dimension 1 whenever it has size 1. So when the input has sequence length 1, the sequence dimension itself is removed. When I remove the squeeze call, the code works completely.
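A small sketch of that failure mode (illustrative only, not the PLELog code):

```python
import torch

hidden_dim = 200
normal_batch = torch.randn(100, 38, hidden_dim)   # sequence length 38
edge_batch = torch.randn(100, 1, hidden_dim)      # sequence length 1

# squeeze(1) only removes dimension 1 when its size is 1, so the normal batch
# is untouched while the length-1 batch silently loses its sequence dimension.
print(normal_batch.squeeze(1).shape)  # torch.Size([100, 38, 200])
print(edge_batch.squeeze(1).shape)    # torch.Size([100, 200])
```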

Do I misunderstand anything in this situation?
I run the code on a CUDA machine. Another question: what exactly are the advantages of the CPUEmbedding class when the code runs on CUDA?

dino-chiio avatar Aug 17 '23 02:08 dino-chiio

Hi @dino-chiio ,

Your observation about the shape issue is correct: the squeeze produces this error when the sequence length is one, but in other situations the code works fine. Please consider this error an "anomaly" prediction.

The CPUEmbedding class keeps the embedding weights on the CPU instead of the GPU to reduce GPU memory usage; our GPU resources were limited while we were doing this research. If you have more advanced GPUs, you can try training the embedding weights along with the other parameters; hopefully, you will get a considerable improvement.
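For readers wondering what that looks like in practice, here is a rough sketch of the idea (the class and method names are hypothetical, not the actual CPUEmbedding implementation):

```python
import torch
import torch.nn as nn

class CpuSideEmbedding(nn.Module):  # hypothetical name, for illustration only
    """Keep a large, frozen embedding table in CPU memory and move only the
    looked-up vectors to the GPU, trading lookup speed for GPU memory."""

    def __init__(self, num_embeddings, embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        self.embedding.weight.requires_grad = False  # frozen, as described above

    def forward(self, token_ids, device):
        vectors = self.embedding(token_ids.cpu())    # lookup happens on the CPU
        return vectors.to(device)                    # only the results occupy GPU memory
```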

LeonYang95 avatar Aug 17 '23 05:08 LeonYang95

Hi @dino-chiio ,

Your observation about the shape issue is correct: the squeeze produces this error when the sequence length is one, but in other situations the code works fine. Please consider this error an "anomaly" prediction.

The CPUEmbedding class keeps the embedding weights on the CPU instead of the GPU to reduce GPU memory usage; our GPU resources were limited while we were doing this research. If you have more advanced GPUs, you can try training the embedding weights along with the other parameters; hopefully, you will get a considerable improvement.

Hi @LeonYang95 ,

You mentioned that this error is an anomaly prediction. Do you mean that any sequence of length one can be flagged as an anomaly? If I do not use the squeeze function in this case and keep all sequences of length 1, will the workflow still run normally?

dino-chiio avatar Aug 17 '23 06:08 dino-chiio

I do not recommend removing the squeeze operation. The attention mechanism I used was adapted from other projects, and I am not sure what the results would be after removing it.

I believe treating log sequences of length one as anomalies is an acceptable solution, since it is possible that those log sequences are actual anomalies or are irrelevant to the system's running status.
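If you want to follow that suggestion in code, one possible guard (illustrative only; `model.classify` is a placeholder, not an actual PLELog API) is to short-circuit length-1 sequences before they reach the attention layer:

```python
def predict_label(model, log_sequence):
    # Treat log sequences of length one as anomalies instead of letting them
    # trigger the shape error discussed in this thread.
    if len(log_sequence) <= 1:
        return "anomaly"
    return model.classify(log_sequence)  # placeholder for the real prediction call
```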

LeonYang95 avatar Aug 17 '23 07:08 LeonYang95