PLELog
Training process error
The training process encountered the following problems:
With bidirectional set to False, another problem arose.
I have removed the latter part of the multiplication shown in the image below. It can then be trained, but the result is wrong.
Below are the contents of my data file.
I can't solve the problem yet. I hope I can get some help to solve it. Thank you very much!
Can you check which one is 200d and which one is 100d in hiddens * sent_probs? This could help clarify this issue.
hiddens is 200d, sent_probs is 100d.
Sorry, I failed to reproduce this error. However, here's a tip that may help:
The shape of sent_probs should be batch_size * seq_len after the attention mechanism, and batch_size * seq_len * 1 after the view operation. Therefore, I am not sure which part goes wrong. As shown in your output, 100 could be the batch_size.
Here I attach a screenshot of my runtime outputs with shapes, I hope this can help you with debugging.
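For reference, here is a minimal sketch of the shapes described above (the values 100, 38 and 200 are placeholders, not the repository's configuration):

```python
import torch

# Illustrative values only; the real numbers come from the PLELog
# dataloader and model configuration.
batch_size, seq_len, hidden_dim = 100, 38, 200

hiddens = torch.randn(batch_size, seq_len, hidden_dim)  # RNN outputs
sent_probs = torch.rand(batch_size, seq_len)            # attention scores

# The view described above: (batch_size, seq_len) -> (batch_size, seq_len, 1)
sent_probs = sent_probs.view(batch_size, seq_len, 1)

print(hiddens.shape)     # torch.Size([100, 38, 200])
print(sent_probs.shape)  # torch.Size([100, 38, 1])
```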
What are the shapes after the view operation?
So it seems fine?
sent_probs can be regarded as the attention score of the hidden states for each log event in the log sequence. The multiplication between hidden_states and sent_probs is actually an averaged summation of hidden states so that it gives a final representation for each log sequence.
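As an illustration of that weighted-sum pooling (a sketch with assumed shapes, not the repository's exact code):

```python
import torch

# Shapes are assumptions for illustration, not taken from the repository.
batch_size, seq_len, hidden_dim = 100, 38, 200
hiddens = torch.randn(batch_size, seq_len, hidden_dim)
sent_probs = torch.rand(batch_size, seq_len, 1)  # one attention score per log event

# Broadcasting scales each 200-d hidden state by its scalar score;
# summing over the sequence dimension pools the whole sequence into one vector.
weighted = hiddens * sent_probs        # [100, 38, 200]
sequence_repr = weighted.sum(dim=1)    # [100, 200], one representation per sequence
print(sequence_repr.shape)
```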
Thank you very much for your help. I'll keep looking for a solution.
Hi @superzeroT, may I ask if the issue is still unresolved? And, as mentioned in your screenshot, "此处有相应的修改" ("corresponding modifications were made here"), what were those exactly?
Hi @LeonYang95, I haven't solved the problem yet. I tried to unify the dimensions, but it didn't work. Don't worry about the note I added. Since your code runs successfully, I guess it has something to do with the environment configuration, etc.
Hi @LeonYang95, can I see the shapes of your sent_probs and hiddens values?
Mine were:
hiddens: [100, 38, 200], sent_probs: [100, 38, 1]
Your first two shapes seem fine. But is the sequence length of the last two shapes only 1?
Hi guys, have you solved this problem? I have the same error when evaluating the testing set. I see that it throws the error at the last batch of the testing set.
At the last batch, or when the sequence length equals 1, sent_probs has the size batch_size * batch_size rather than batch_size * seq_len, so the resizing throws this error.
Has anyone solved this error, please?
Hi @LeonYang95, I have just found that there is an error in module/Attention.py, class LinearAttention, at https://github.com/LeonYang95/PLELog/blob/c8bb56b08fe6368f3b3c71ff88de8a87c48c7607/module/Attention.py#L275
combined_tensors.squeeze(1) will remove any input dimension that has size 1. So, when the input has sequence length 1, that dimension is removed. When I remove the squeeze call, the code works fine.
Do I misunderstand anything in this situation?
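Here is a standalone sketch of the squeeze(1) behaviour I mean (it only demonstrates the effect on a tensor shaped like the attention output, not the actual code in module/Attention.py):

```python
import torch

# Standalone demonstration of the squeeze(1) effect; this is not the code
# from module/Attention.py, just a tensor with the same kind of shape.
batch_size, hidden_dim = 4, 8

for seq_len in (10, 1):
    combined_tensors = torch.randn(batch_size, seq_len, hidden_dim)
    squeezed = combined_tensors.squeeze(1)
    # seq_len == 10: shape stays [4, 10, 8] because dim 1 is not of size 1.
    # seq_len == 1:  dim 1 is dropped, leaving [4, 8], which breaks any
    #                later view/resize that still expects three dimensions.
    print(seq_len, tuple(squeezed.shape))
```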
I run the code on a CUDA machine. Another question: what exactly are the advantages of the CPUEmbedding class when I run the code on CUDA?
Hi @dino-chiio,
Your comment about the shape issue is correct; the squeeze will produce this error when the sequence length is one. For other situations, the code works fine. Please consider this error an "anomaly" prediction.
The CPUEmbedding class keeps the embedding weights on the CPU instead of the GPU to lower GPU memory cost. While we were doing this research, our GPU resources were limited. If you have more advanced GPUs, you can try training the weights along with the other parameters; hopefully, you will get a considerable improvement.
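For illustration, a minimal sketch of the general idea behind such a CPU-resident embedding (this is not the repository's actual CPUEmbedding implementation; all names and details are assumptions):

```python
import torch
import torch.nn as nn

class CPUResidentEmbedding(nn.Module):
    """Illustration of the idea only, not the repository's CPUEmbedding:
    the large embedding table stays in CPU memory and only the vectors
    looked up for the current batch are moved to the GPU."""

    def __init__(self, num_embeddings, embedding_dim, device='cuda'):
        super().__init__()
        # Frozen weights kept on the CPU to save GPU memory.
        self.weight = torch.randn(num_embeddings, embedding_dim, requires_grad=False)
        self.device = device

    def forward(self, token_ids):
        vectors = self.weight[token_ids.cpu()]  # lookup happens on the CPU
        return vectors.to(self.device)          # only the batch moves to the GPU

# Usage sketch (assumes a CUDA device is available):
# emb = CPUResidentEmbedding(50000, 300)
# batch = emb(torch.randint(0, 50000, (100, 38)))  # -> [100, 38, 300] on the GPU
```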
Hi @LeonYang95,
You mentioned that this error is an anomaly prediction. Do you mean that any sequence whose length is one can be treated as an anomaly? If I do not use the squeeze function in this case and keep all sequences of length 1, is the workflow still normal?
I do not recommend removing the squeeze operation. The attention I used was adapted from other projects, and I am not sure about the results after removing it.
I believe regarding log sequences of length one as anomalies is an acceptable solution, since it is possible that those log sequences are actual anomalies or are irrelevant to the system's running status.
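For anyone hitting this later, a hypothetical pre-filtering step reflecting that workaround might look like the sketch below (the helper name and model interface are assumptions, not part of PLELog):

```python
# Hypothetical pre-filtering step along the lines of the workaround above:
# label length-1 log sequences as anomalies instead of sending them through
# the attention layer. Function name and data format are assumptions.
def predict_with_length_guard(model, sequences):
    predictions = []
    for seq in sequences:
        if len(seq) <= 1:
            predictions.append('Anomaly')        # treat too-short sequences as anomalous
        else:
            predictions.append(model.predict(seq))
    return predictions
```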