semaphore icon indicating copy to clipboard operation
semaphore copied to clipboard

Mail alerts are sent only to admin

Open zapfdk opened this issue 3 years ago • 7 comments

Hi, I am running the newest version with the docker-compose installation. I am trying to get mail alerts to work, but they only get sent to the admin mail address.

It's working in my local installation, but not on the server. I can't find a configuration difference between the two. For both mail alerts activated in the config file and the project settings. Alerts are activated for several users.

Any idea?

zapfdk avatar Oct 06 '22 14:10 zapfdk

It's working in my local installation, but not on the server.

Local installation is docker too? Please check task output lines which looks like Sending email to ... (at the end of output)

fiftin avatar Oct 06 '22 14:10 fiftin

Yes, local installation was also done with docker. Both are on v2.8.68.

Task output only states the admin mail in the output on the server. So it doesn't even try to send the email, not a mailserver issue. On my local installation it correctly lists the mail addresses.

zapfdk avatar Oct 07 '22 07:10 zapfdk

What task prints at the end? You should see lines Sending email to <your email>.

fiftin avatar Oct 07 '22 08:10 fiftin

Yes I see those lines, but only for one mail on the server, but for all users on my local installation.

zapfdk avatar Oct 07 '22 08:10 zapfdk

Please check following checkboxes on Users page:

image

fiftin avatar Oct 07 '22 18:10 fiftin

Yes, those are checked, too.

zapfdk avatar Oct 10 '22 07:10 zapfdk

I have the same issue with the latest stable Semaphore version at the time of writing (image: semaphoreui/semaphore:v2.8.89).

Some info here (emails are redacted):

  • I have 3 users:
    • admin (created during the first run in docker) with an email [email protected] (this email is invalid/not working)
    • my user connected via LDAP (i.e. external with a valid/working email)
    • a temp non-external user with a valid/working email (created this one just to check if the issue depends on the external flag)
  • All 3 users have "Alert" checkbox ticked on the users interface.
  • All 3 users have "Admin" checkbox ticked on the users interface.
  • The project itself has "Allow alerts" checkbox ticked in settings.
  • All 3 users are admins of a team owning a project in which I run an intentionally erroneous playbook
  • When I run the playbook, I get two messages:

Now, I don't expect that the email for the admin user will pass through as its address is in fact invalid. What I expect is Semaphore to try to send emails to all three users.

My first assumption was that this for loop runs in a synchronous manner, so after it failed to send the email to the 1st (invalid) user, it panics, leaving everyone else in vain. The solution would probably be to run the for loop body in goroutine and (maybe) to log the result as error instead of panic.

However, to prove a concept I've disabled alerts to the admin user (the invalid one). If I was right Semaphore would try to send a message to the next user in chain, which is a valid email. Instead, when I do this, I see NO line "Sending email to ..." in the end of the playbook run. So, based on the sendMailAlert() code, it seems that either:

  • t.users only have one item there (but it's not clear how it's possible)
  • userObj.Alert is false (actually, it's not clear too)

So in the end I turned logging in PG on for all queries (ALTER DATABASE semaphore SET log_statement = 'all';) and reviewed what Semaphore actually does with PG during the execution:

Before an actual playbook run I see a call to a user table many times:

postgres_1   | 2023-03-22 11:49:31.423 UTC [33] LOG:  execute <unnamed>: select * from "user" where id=$1
postgres_1   | 2023-03-22 11:49:31.423 UTC [33] DETAIL:  parameters: $1 = '3'

note the $1 = 3 which is a proper ID of user I'm running a playbook from

After some preparations Semaphore runs the playbook, storing the ansible output into the task__output – it's the vast majority of the log and I just ignore it.

Finally, the playbook throws an error, and I believe here's our relevant part:

postgres_1   | 2023-03-22 11:50:45.497 UTC [33] DETAIL:  parameters: $1 = '47', $2 = 'Running playbook failed: exit status 2', $3 = '2023-03-22 11:50:45.4955'
postgres_1   | 2023-03-22 11:50:45.499 UTC [46] LOG:  execute <unnamed>: update task set status=$1, start=$2, "end"=$3 where id=$4
postgres_1   | 2023-03-22 11:50:45.499 UTC [46] DETAIL:  parameters: $1 = 'error', $2 = '2023-03-22 11:49:55.076747', $3 = NULL, $4 = '47'
postgres_1   | 2023-03-22 11:50:45.505 UTC [46] LOG:  execute <unnamed>: select * from "user" where id=$1
postgres_1   | 2023-03-22 11:50:45.505 UTC [46] DETAIL:  parameters: $1 = '1'
semaphore_1  | time="2023-03-22T11:50:45Z" level=info msg="Stopped running TaskRunner 47"
semaphore_1  | time="2023-03-22T11:50:45Z" level=info msg="Release resource locker with TaskRunner 47"

Note that the only query to user table is executed with $1 = '1', which is my first admin user with Alert = false. No attempts to query all users from the team or even the current user.

In the end this makes me think that I'm right here: "t.users only have a one item there". Filling the t.users slice is happening here and the real SQL query happens here, so I found a corresponding query in a log:

postgres_1   | 2023-03-22 11:49:45.608 UTC [33] LOG:  execute <unnamed>: SELECT u.*, pu.admin FROM project__user as pu LEFT JOIN "user" as u on pu.user_id=u.id WHERE pu.project_id=$1 ORDER BY u.name ASC
postgres_1   | 2023-03-22 11:49:45.608 UTC [33] DETAIL:  parameters: $1 = '1'

I've executed this query manually against the real DB and I've got three records as expected – so I guess after all I've done it's a stalemate. It looks like it should work as expected, but for some reason it doesn't. It's definetely needed to debug the code itself, not logs. I can't run the project in dev mode right now to actually debug it, but I will try to investigate it further later. Also I hope that I could make authors to have some insights on what's happening.


Additionally, Semaphore server logs are kinda vague, it seems that it panics during the attempt to log an error, see a chunk of a log below:

semaphore_1  | time="2023-03-22T10:37:14Z" level=error msg=EOF
semaphore_1  | time="2023-03-22T10:37:14Z" level=panic msg=EOF error="Can't send email!"
semaphore_1  | time="2023-03-22T10:37:14Z" level=info msg="Stopped running TaskRunner 44"
semaphore_1  | time="2023-03-22T10:37:14Z" level=info msg="Release resource locker with TaskRunner 44"
semaphore_1  | panic: (*logrus.Entry) 0xc000efe0f0
semaphore_1  |
semaphore_1  | goroutine 581 [running]:
semaphore_1  | github.com/Sirupsen/logrus.Entry.log({0xc0000742a0, 0xc001dde240, {0x0, 0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0}, 0x0, ...)
semaphore_1  | 	/go/src/github.com/ansible-semaphore/semaphore/vendor/github.com/Sirupsen/logrus/entry.go:128 +0x56c
semaphore_1  | github.com/Sirupsen/logrus.(*Entry).Panic(0xc000efe0a0, {0xc001061488, 0x1, 0x1})
semaphore_1  | 	/go/src/github.com/ansible-semaphore/semaphore/vendor/github.com/Sirupsen/logrus/entry.go:173 +0xfb
semaphore_1  | github.com/ansible-semaphore/semaphore/util.LogPanicWithFields({0x21b2bc0, 0xc000068120}, 0xb97ebf?)
semaphore_1  | 	/go/src/github.com/ansible-semaphore/semaphore/util/errorLogging.go:39 +0x8d
semaphore_1  | github.com/ansible-semaphore/semaphore/services/tasks.(*TaskRunner).panicOnError(0xc000094580?, {0x21b2bc0, 0xc000068120}, {0xba35a6, 0x11})
semaphore_1  | 	/go/src/github.com/ansible-semaphore/semaphore/services/tasks/logging.go:78 +0x14b
semaphore_1  | github.com/ansible-semaphore/semaphore/services/tasks.(*TaskRunner).sendMailAlert(0xc001fceb00)
semaphore_1  | 	/go/src/github.com/ansible-semaphore/semaphore/services/tasks/alert.go:77 +0x4c5
semaphore_1  | github.com/ansible-semaphore/semaphore/services/tasks.(*TaskRunner).setStatus(0xc001fceb00, {0xb97ebf?, 0x26?})
semaphore_1  | 	/go/src/github.com/ansible-semaphore/semaphore/services/tasks/runner.go:85 +0xed
semaphore_1  | github.com/ansible-semaphore/semaphore/services/tasks.(*TaskRunner).fail(...)
semaphore_1  | 	/go/src/github.com/ansible-semaphore/semaphore/services/tasks/runner.go:118
semaphore_1  | github.com/ansible-semaphore/semaphore/services/tasks.(*TaskRunner).run(0xc001fceb00)
semaphore_1  | 	/go/src/github.com/ansible-semaphore/semaphore/services/tasks/runner.go:299 +0x698
semaphore_1  | created by github.com/ansible-semaphore/semaphore/services/tasks.(*TaskPool).Run
semaphore_1  | 	/go/src/github.com/ansible-semaphore/semaphore/services/tasks/pool.go:158 +0x4f6

tommiv avatar Mar 22 '23 12:03 tommiv

After merging My PR https://github.com/ansible-semaphore/semaphore/pull/1788 have been merged which fixes this issue. the mail notification is working fine if the config.json includes all mentioned options and after enabling alerts for your user.

tboerger avatar Mar 04 '24 09:03 tboerger