linstor-server

Error reports rotation question

Open kvaps opened this issue 3 years ago • 5 comments

Hi, I'm not sure whether this is expected behavior, so I'd rather report it.

Due to a bug in containerd, some of our nodes were flooded with unterminated processes:

fork failed: Resource temporarily unavailable

linstor-satellite also generated many similar error reports in its log directory:

# du -hs /var/log/linstor-satellite/
3.5G	/var/log/linstor-satellite/

Here is one of them:

ERROR REPORT 60926587-31C54-509538

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.12.2
Build ID:                           72244c7d40ba34808024a2c75da1d736dfd2e54e
Build time:                         2021-05-04T12:53:07+00:00
Error time:                         2021-07-03 06:20:52
Node:                               m5c6
Peer:                               10.36.128.186:57816

============================================================

Reported error:
===============

Category:                           Exception
Class name:                         SSLException
Class canonical name:               javax.net.ssl.SSLException
Generated at:                       Method 'createSSLException', Source file 'Alert.java', Line #133

Error message:                      closing inbound before receiving peer's close_notify

Error context:
    I/O exception while attempting to receive data from the peer

Call backtrace:

    Method                                   Native Class:Line number
    createSSLException                       N      sun.security.ssl.Alert:133
    createSSLException                       N      sun.security.ssl.Alert:117
    fatal                                    N      sun.security.ssl.TransportContext:336
    fatal                                    N      sun.security.ssl.TransportContext:292
    fatal                                    N      sun.security.ssl.TransportContext:283
    closeInbound                             N      sun.security.ssl.SSLEngineImpl:733
    doHandshake                              N      com.linbit.linstor.netcom.ssl.SslTcpConnectorHandshaker:118
    read                                     N      com.linbit.linstor.netcom.ssl.SslTcpConnectorPeer:162
    run                                      N      com.linbit.linstor.netcom.TcpConnectorService:543
    run                                      N      java.lang.Thread:829


END OF ERROR REPORT.

kvaps, Jul 19 '21 11:07

Ah, my bad: these reports are not related to today's incident; they are just old error reports. I need to implement some log rotation in the kube-linstor project.

kvaps, Jul 19 '21 13:07
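For the text error reports, the log rotation mentioned above could be as simple as deleting report files past a retention period. A minimal sketch in Python, assuming the satellite writes each report as an ErrorReport-<id>.log file directly into /var/log/linstor-satellite; the prune_error_reports helper and the 30-day MAX_AGE_DAYS value are hypothetical, not part of LINSTOR or kube-linstor.

```python
from __future__ import annotations

import time
from pathlib import Path

# Assumption: each text report is an ErrorReport-<id>.log file directly in the
# satellite's log directory (the default directory shown in this issue).
LOG_DIR = Path("/var/log/linstor-satellite")
MAX_AGE_DAYS = 30  # hypothetical retention period; pick your own


def prune_error_reports(log_dir: Path, max_age_days: int, now: float | None = None) -> list[Path]:
    """Delete ErrorReport-*.log files whose mtime is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    deleted: list[Path] = []
    for report in sorted(log_dir.glob("ErrorReport-*.log")):
        # Only regular files matching the report pattern are considered.
        if report.is_file() and report.stat().st_mtime < cutoff:
            report.unlink()
            deleted.append(report)
    return deleted
```

Calling prune_error_reports(LOG_DIR, MAX_AGE_DAYS) from a cron job (or a Kubernetes CronJob) would keep the directory bounded. It does not touch error-report.mv.db, which the reports are also stored in.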

I'm not sure, though: how can I rotate the error-report database on the satellites?

# du -hs /var/log/linstor-satellite/error-report.mv.db
1.4G	/var/log/linstor-satellite/error-report.mv.db

kvaps, Jul 19 '21 13:07

We currently still store the error reports both as text files and in a DB. If you also want to keep the DB as a backup, you can simply copy it into a compressed archive and then use "linstor error-reports delete" with its various parameters to get rid of old error-report entries.

rp-, Jul 20 '21 05:07
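The backup step rp- describes (copy the DB into a compressed archive before deleting old entries) could look like the Python sketch below. The archive_db helper and the /var/backups/linstor-satellite destination are hypothetical; ideally the copy is taken while the satellite is stopped, since H2 may be writing to the file.

```python
import gzip
import shutil
import time
from pathlib import Path

# Default satellite location of the error-report DB, as shown in this issue.
DB_FILE = Path("/var/log/linstor-satellite/error-report.mv.db")
ARCHIVE_DIR = Path("/var/backups/linstor-satellite")  # hypothetical destination


def archive_db(db_file: Path, archive_dir: Path) -> Path:
    """Copy db_file into a timestamped gzip archive and return the archive path."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = archive_dir / f"{db_file.name}.{stamp}.gz"
    # Stream the copy so a multi-GB database is never loaded into memory at once.
    with db_file.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

After archiving, the old entries would still be removed with "linstor error-reports delete"; "linstor error-reports delete --help" lists the parameters your client version supports.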

@rp- thank you for the information! Is there any way to list and purge error reports on the satellites, the same way as on the controller?

kvaps, Jul 20 '21 06:07

There is no tool for this yet, but it should be possible with the H2 database tools by simply executing SQL statements: https://www.h2database.com/html/download.html

rp-, Jul 21 '21 10:07
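A hedged sketch of rp-'s suggestion: H2 ships a command-line tool, org.h2.tools.Shell, that accepts -url, -user, -password and -sql, so a single SQL statement can be run against the satellite's database. The Python below only builds that command. The jar path and the empty credentials are hypothetical placeholders, and the table layout of LINSTOR's error-report DB is not documented in this thread, so the example starts with SHOW TABLES rather than guessing a DELETE. The satellite likely has to be stopped first, because embedded H2 locks the database file, and the jar probably needs to match the H2 version bundled with LINSTOR, since H2's file format changed between major versions.

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Placeholders, not values documented by LINSTOR:
H2_JAR = "h2.jar"  # hypothetical path to a jar from h2database.com
DB_USER = ""       # hypothetical credentials; check what your satellite uses
DB_PASSWORD = ""

DB_FILE = Path("/var/log/linstor-satellite/error-report.mv.db")


def h2_jdbc_url(db_file: Path) -> str:
    """H2 file URLs name the database without its .mv.db suffix."""
    name = db_file.name
    suffix = ".mv.db"
    if name.endswith(suffix):
        name = name[: -len(suffix)]
    return f"jdbc:h2:{db_file.parent / name}"


def h2_shell_command(db_file: Path, sql: str) -> list[str]:
    """Build an argv that runs `sql` once via H2's bundled org.h2.tools.Shell."""
    return [
        "java", "-cp", H2_JAR, "org.h2.tools.Shell",
        "-url", h2_jdbc_url(db_file),
        "-user", DB_USER,
        "-password", DB_PASSWORD,
        "-sql", sql,
    ]


if __name__ == "__main__":
    # Only prints the command; inspect the schema before writing any DELETE.
    print(shlex.join(h2_shell_command(DB_FILE, "SHOW TABLES")))
```

Once the real table and column names are known, the same helper could run a DELETE restricted to old entries, ideally against a backup copy first.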