
Labels passed as extra-vars cause ara errors and recording failures

Open zswanson opened this issue 4 years ago • 3 comments

What component is this about ?

API client

What is your ARA installation like ?

ara-api 1.3.2, installed from source on an Ubuntu AWS instance and configured for an external PostgreSQL backend. There is no reverse proxy; the API accepts traffic directly on port 8000. The ara client is 1.3.2, installed from PyPI on CentOS 7. The Ansible plugin portion is configured through environment variables, and the ara client is statically configured in ansible.cfg to use the http client type and the URL of my ara-api server.

What is happening ?

A playbook executed during a Packer AMI build on AWS was passing ara_playbook_labels as an extra-var through a build script. The playbook execution itself was unaffected, but ara reported errors on stderr during the run. The record in ara-api is also incomplete: it shows only the playbook name and start time, the API still considers the playbook in progress, and no tasks were recorded.

2020-02-29T21:05:40-05:00:     amazon-ebs: PLAY [all] *********************************************************************
2020-02-29T21:05:40-05:00: ==> amazon-ebs: Failed to patch on /api/v1/playbooks/4: {'labels': 'packer'}
2020-02-29T21:05:40-05:00: ==> amazon-ebs: [WARNING]: Failure using method (v2_playbook_on_play_start) in callback plugin
2020-02-29T21:05:40-05:00: ==> amazon-ebs: (<ansible.plugins.callback.ara_default.CallbackModule object at
2020-02-29T21:05:40-05:00: ==> amazon-ebs: 0x7f15ec208438>): 'id'
2020-02-29T21:05:40-05:00: ==> amazon-ebs: [WARNING]: Failure using method (v2_playbook_on_task_start) in callback plugin
2020-02-29T21:05:40-05:00:     amazon-ebs:
2020-02-29T21:05:40-05:00: ==> amazon-ebs: (<ansible.plugins.callback.ara_default.CallbackModule object at
2020-02-29T21:05:40-05:00: ==> amazon-ebs: 0x7f15ec208438>): 'NoneType' object is not subscriptable
2020-02-29T21:05:40-05:00:     amazon-ebs: TASK [Gathering Facts] *********************************************************
2020-02-29T21:05:40-05:00:     amazon-ebs: Sunday 01 March 2020  02:05:40 +0000 (0:00:00.440)       0:00:00.440 **********
2020-02-29T21:05:42-05:00: ==> amazon-ebs: [WARNING]: Failure using method (v2_runner_on_ok) in callback plugin
2020-02-29T21:05:42-05:00:     amazon-ebs: ok: [localhost]
2020-02-29T21:05:42-05:00: ==> amazon-ebs: (<ansible.plugins.callback.ara_default.CallbackModule object at
2020-02-29T21:05:42-05:00: ==> amazon-ebs: 0x7f15ec208438>): 'id'
2020-02-29T21:05:42-05:00:     amazon-ebs:
2020-02-29T21:05:42-05:00:     amazon-ebs: TASK [ntp : install ntp package] ***********************************************
2020-02-29T21:05:42-05:00:     amazon-ebs: Sunday 01 March 2020  02:05:42 +0000 (0:00:01.628)       0:00:02.068 **********
2020-02-29T21:05:42-05:00: ==> amazon-ebs: [WARNING]: Failure using method (v2_playbook_on_task_start) in callback plugin
2020-02-29T21:05:42-05:00: ==> amazon-ebs: (<ansible.plugins.callback.ara_default.CallbackModule object at
2020-02-29T21:05:42-05:00: ==> amazon-ebs: 0x7f15ec208438>): 'id'

Execution of the playbook:

ansible-playbook ${REMOTE_ANSIBLE_DIR}/test.yml \
    -c local \
    -e "ara_playbook_name=build-ara-test \
        ara_playbook_labels=packer" \
    -i localhost, \
    -v;

This occurred reliably across multiple instance builds, and changing various factors in the environment did not help. I eventually noticed that the "Failed to patch" error on the playbook endpoint came from a function that does some work with labels. After I removed the extra-var for the label, the playbook ran without errors and its records appeared in ara-api correctly.
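
For illustration: the logged payload `{'labels': 'packer'}` carries the label as a bare string. Assuming (this thread does not confirm it) that the API expects a list of label names, a normalization step on the client side could look like the sketch below. `normalize_labels` is a hypothetical helper, not a function from ara.

```python
def normalize_labels(value):
    """Coerce a labels value into a list of non-empty strings.

    Hypothetical helper: accepts None, a comma-separated string
    (as it would arrive from `-e ara_playbook_labels=a,b`), or an
    iterable of labels.
    """
    if value is None:
        return []
    if isinstance(value, str):
        # Split on commas and drop empty or whitespace-only entries.
        return [part.strip() for part in value.split(",") if part.strip()]
    return [str(item) for item in value]


payload = {"labels": normalize_labels("packer")}
```

With this in place, a single label passed on the command line would be sent as a one-element list rather than as a string.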

What should be happening ?

Tasks from the ara client should be reported to the ara-api and recorded in the db.

zswanson, Mar 01 '20 18:03

I can replicate this with 1.3.2 but not with master; I suspect it was fixed by https://github.com/ansible-community/ara/commit/7388229361022b2805fa0e9a636b827ed6583ca9

https://github.com/ansible-community/ara/blob/d9df520e02f84a6aeb95ec643c7009cbe48c4efa/ara/plugins/callback/ara_default.py#L338-L346 could do with being made more fault-tolerant; losing the playbook information when PATCH fails is not ideal.

flowerysong, Mar 02 '20 22:03

Hi @zswanson and thanks for taking the time to create an issue. It would be helpful to see if you can reproduce the issue with master, otherwise I will eventually get around to it.

@flowerysong I agree that the current implementation is overly optimistic and failure tolerance needs to be improved across the board.

Some errors are more "fatal" than others, but this has a lot to do with the synchronous nature of the callback and needing to tie data back to their parent. For example, to send a result successfully, we must first have:

  • created the task successfully (to have the task ID)
  • created the host successfully (to have the host ID)
  • created the play successfully (to have the play ID)
  • created the playbook successfully (to have the playbook ID)

If the task POST fails for some reason, the failure cascades: the results for that task cannot be saved either.
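
The dependency chain above can be sketched as a guard that refuses to build a result payload until every parent ID is known. This is a minimal illustration, not the callback's actual code; `build_result_payload` and the key names are assumptions.

```python
# Parents that must exist before a result can be tied back to them.
REQUIRED_PARENTS = ("playbook", "play", "task", "host")


def build_result_payload(ids, result):
    """Return (payload, missing).

    `ids` maps each parent name to its ID, or to None if creating that
    parent failed. When any parent is missing, payload is None and
    `missing` lists the parents that block the result.
    """
    missing = [name for name in REQUIRED_PARENTS if ids.get(name) is None]
    if missing:
        return None, missing
    payload = {name: ids[name] for name in REQUIRED_PARENTS}
    payload.update(result)
    return payload, []


# The task POST failed, so there is no task ID and the result cannot be sent.
payload, missing = build_result_payload(
    {"playbook": 4, "play": 1, "task": None, "host": 2},
    {"status": "ok"},
)
```

Checking for missing parents explicitly turns an opaque `'id'` or `'NoneType' object is not subscriptable` failure into a message that names the record that was never created.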

That said, I would not consider a failure to patch a playbook to add labels to be fatal. I mean, it would not be a good thing to have missing labels but it shouldn't cause everything else to fail.
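
One way to make the label PATCH non-fatal, sketched under assumptions: keep the playbook record returned by the initial POST and only replace it when the PATCH succeeds. `FakeClient` is a hypothetical stand-in for an HTTP client and `record_playbook` is an illustrative name; neither is taken from ara.

```python
import logging

log = logging.getLogger("ara_sketch")


class FakeClient:
    """Hypothetical in-memory client used only to exercise the sketch."""

    def __init__(self, fail_patch=False):
        self.fail_patch = fail_patch
        self.record = None

    def post(self, endpoint, **data):
        self.record = {"id": 4, **data}
        return dict(self.record)

    def patch(self, endpoint, **data):
        if self.fail_patch:
            raise RuntimeError(f"Failed to patch on {endpoint}: {data}")
        self.record.update(data)
        return dict(self.record)


def record_playbook(client, name, labels):
    """Create a playbook, then try to attach labels without risking the record."""
    playbook = client.post("/api/v1/playbooks", name=name)
    try:
        # Only overwrite the known-good record if the PATCH succeeds.
        playbook = client.patch(f"/api/v1/playbooks/{playbook['id']}", labels=labels)
    except Exception as exc:
        # Missing labels are undesirable but should not break later callbacks.
        log.warning("Could not set labels %r: %s", labels, exc)
    return playbook


kept = record_playbook(FakeClient(fail_patch=True), "build-ara-test", ["packer"])
```

Here `kept` still carries the playbook ID after the failed PATCH, so plays, tasks, and results could continue to reference it.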

dmsimard, Apr 13 '20 19:04

I haven't looked at this yet, but it didn't warrant blocking the release of 1.4; we can fix it in an upcoming dot release.

dmsimard, Apr 15 '20 20:04