ciso-assistant-community

Intermittent cases of getaddrinfo failing between frontend and backend

Open · LennertVA opened this issue 9 months ago · 9 comments

Describe the bug

While experimenting with the Community Edition using the provided Docker images, error 500 occurs very regularly: roughly one in three to four actions triggers one. According to the debug output, the cause is getaddrinfo intermittently failing. In every case, simply refreshing the interface once or twice makes the error go away.

To Reproduce

No specific steps are needed. It happens throughout the interface, for any action that calls the backend, in roughly 25% of cases.

Expected behavior

No error 500s.

Screenshots

Screenshots show little beyond "Error 500 - Internal Error", but whenever it happens, the container logs show this cause:

frontend    | TypeError: fetch failed
frontend    |     at node:internal/deps/undici/undici:12500:13
frontend    |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
frontend    |   [cause]: Error: getaddrinfo ENOTFOUND backend
frontend    |       at GetAddrInfoReqWrap.onlookupall [as oncomplete] (node:dns:118:26) {
frontend    |     errno: -3008,
frontend    |     code: 'ENOTFOUND',
frontend    |     syscall: 'getaddrinfo',
frontend    |     hostname: 'backend'
frontend    |   }
frontend    | }
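To check whether name resolution of `backend` fails on its own, independently of CISO Assistant, one option is to resolve the name repeatedly from inside the frontend container and tally the outcomes. The sketch below is a hypothetical diagnostic, not part of the project: `probeResolution` is an illustrative name, and it assumes Node.js is available in the frontend container (the log above suggests it is) and that the script is compiled with `tsc` or run with a TypeScript runner such as `tsx`. It calls `dns.promises.lookup` with `all: true`, which goes through getaddrinfo like the failing `onlookupall` call in the log.

```typescript
import { promises as dns } from "node:dns";

// Hypothetical diagnostic: resolve `host` `attempts` times and count the
// outcomes, keyed by "OK" or by the error code Node attaches (e.g. ENOTFOUND).
export async function probeResolution(
  host: string,
  attempts: number,
  // The resolver is injectable so the tally logic can be tested offline.
  lookup: (h: string) => Promise<unknown> = (h) => dns.lookup(h, { all: true }),
): Promise<Map<string, number>> {
  const tally = new Map<string, number>();
  for (let i = 0; i < attempts; i++) {
    let key = "OK";
    try {
      await lookup(host);
    } catch (err) {
      // Node DNS errors carry a string `code` such as ENOTFOUND or EAI_AGAIN.
      const code = (err as { code?: unknown } | null)?.code;
      key = typeof code === "string" ? code : "UNKNOWN";
    }
    tally.set(key, (tally.get(key) ?? 0) + 1);
  }
  return tally;
}
```

For example, `probeResolution("backend", 200).then((t) => console.log(Object.fromEntries(t)))` would print how many of 200 lookups succeeded. A non-zero `ENOTFOUND` count would point at the container network's DNS rather than at the application.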

Environment:

  • Server OS: official Docker container images imported into Podman on RHEL 8.9 x86_64
  • Client Browser: Firefox 125.0.2
  • CISO Assistant version: v1.3.5 build 5baf1fc1

Additional context

The server OS runs SELinux in full enforcing mode. Getting CISO Assistant to run took considerable relabeling of files and loading of custom policies, but now that it runs, SELinux appears not to be involved here (the audit log shows nothing being blocked). It may still be worth mentioning.

What is particularly odd is that it fails only sometimes, and when it does, a refresh usually fixes it. So this is not simply a case of something being broken or blocked, since it works most of the time. Is there a very short timeout configured somewhere for the call? The host server is under noticeable load.
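On the timeout question: `ENOTFOUND` is raised by name resolution itself, before any HTTP request is sent, so an HTTP request timeout would not normally produce it. Because a refresh clears the error, one possible stopgap is to retry a request when its failure cause is a DNS error. The sketch below is hypothetical and not CISO Assistant code: `withDnsRetry`, `isTransientDnsError`, the retry count, and the delays are illustrative assumptions. Treating `ENOTFOUND` as retryable matches this symptom but could also hide a genuinely wrong hostname.

```typescript
// Hypothetical helper: retry an async operation when it fails with a DNS
// resolution error that may be transient in this setup.
const TRANSIENT_DNS_CODES = new Set(["ENOTFOUND", "EAI_AGAIN"]);

export function isTransientDnsError(err: unknown): boolean {
  // undici's fetch reports "TypeError: fetch failed" with the resolver error as `cause`.
  const cause = (err as { cause?: { code?: unknown } } | null)?.cause;
  const code = cause?.code;
  return typeof code === "string" && TRANSIENT_DNS_CODES.has(code);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function withDnsRetry<T>(
  op: () => Promise<T>,
  retries = 3,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Rethrow non-DNS errors, or DNS errors once the retry budget is spent.
      if (attempt >= retries || !isTransientDnsError(err)) throw err;
      // Exponential backoff; with the defaults: 100, 200, 400 ms.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

A call site would wrap the request, e.g. `withDnsRetry(() => fetch(url))`. This only masks the symptom; if the probe above shows failures, the container network's DNS under load remains the thing to investigate.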

LennertVA, May 24 '24 13:05