[BUG] Issue with batch processing host list from the container

Open • sanjaychari opened this issue 1 year ago • 1 comment

Your System Details

  • Operating System: RHEL 7.8
  • Target System Type: Dell
  • Podman version 1.6.4

Describe the bug

This issue occurs intermittently on Dell hosts. When boot-order actions are run in a batch through the --host-list parameter, the action fails on some hosts.

[sanjay@perfc-360g8-04 jetpack]$ podman run -it -v /home/sanjay/jetpack/badfish:/dell --rm quay.io/quads/badfish --host-list /dell/dell-hosts -u quads -p <password> -i config/idrac_interfaces.yml --check-boot
[mgmt-e24-h25-740xd] - INFO     - Executing actions on host: mgmt-e24-h25-740xd.example.com
[mgmt-e24-h25-740xd] - WARNING  - Current boot order is set to: director.
[mgmt-e24-h25-740xd] - INFO     - ************************************************
[mgmt-e24-h27-740xd] - INFO     - Executing actions on host: mgmt-e24-h27-740xd.example.com
[mgmt-e24-h27-740xd] - WARNING  - Current boot order is set to: director.
[mgmt-e24-h27-740xd] - INFO     - ************************************************
[mgmt-e24-h29-740xd] - ERROR    - Failed to communicate with mgmt-e24-h29-740xd.example.com
[mgmt-e24-h29-740xd] - INFO     - ************************************************
[mgmt-e24-h31-740xd] - ERROR    - Failed to communicate with mgmt-e24-h31-740xd.example.com
[mgmt-e24-h31-740xd] - INFO     - ************************************************
[mgmt-e24-h33-740xd] - INFO     - Executing actions on host: mgmt-e24-h33-740xd.example.com
[mgmt-e24-h33-740xd] - WARNING  - Current boot order is set to: director.
[mgmt-e24-h33-740xd] - INFO     - ************************************************
[src.badfish.helpers.logger] - INFO     - RESULTS:
[src.badfish.helpers.logger] - INFO     - mgmt-e24-h25-740xd.alias.bos.scalelab.redhat.com: SUCCESSFUL
[src.badfish.helpers.logger] - INFO     - mgmt-e24-h27-740xd.alias.bos.scalelab.redhat.com: SUCCESSFUL
[src.badfish.helpers.logger] - INFO     - mgmt-e24-h29-740xd.alias.bos.scalelab.redhat.com: FAILED
[src.badfish.helpers.logger] - INFO     - mgmt-e24-h31-740xd.alias.bos.scalelab.redhat.com: FAILED
[src.badfish.helpers.logger] - INFO     - mgmt-e24-h33-740xd.alias.bos.scalelab.redhat.com: SUCCESSFUL

However, when the same action is run individually against the failed hosts through the badfish Python script, it succeeds.

(venv) (base) [schari@schari badfish]$ python3 src/badfish/badfish.py -H mgmt-e24-h31-740xd.example.com -u quads -p <password> -i config/idrac_interfaces.yml --check-boot
- WARNING  - Current boot order is set to: director.
(venv) (base) [schari@schari badfish]$ python3 src/badfish/badfish.py -H mgmt-e24-h31-740xd.example.com -u quads -p <password> -i config/idrac_interfaces.yml --check-boot
- WARNING  - Current boot order is set to: director.
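
To re-run only the hosts that failed in a batch, the FAILED entries can be pulled out of the RESULTS summary. This is a minimal sketch, assuming the summary lines keep the format shown in the batch log above. `failed_hosts` is a hypothetical helper for illustration and is not part of badfish.

```python
import re

# Matches summary lines such as
# "[src.badfish.helpers.logger] - INFO     - host.example.com: FAILED"
# (format assumed from the batch output in this report).
RESULT_LINE = re.compile(
    r"-\s+INFO\s+-\s+(?P<host>\S+):\s+(?P<status>SUCCESSFUL|FAILED)\s*$"
)


def failed_hosts(log_text):
    """Return the hosts marked FAILED in the RESULTS summary of a batch run."""
    failed = []
    for line in log_text.splitlines():
        match = RESULT_LINE.search(line)
        if match and match.group("status") == "FAILED":
            failed.append(match.group("host"))
    return failed
```

Each returned host can then be passed to a single-host run with `-H`, as in the commands above.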

After some time, batch processing through the container also succeeds on these hosts, but only long after the action has started succeeding on the individual hosts through the Python script.

Expected Behavior

Batch processing of hosts through the badfish container should return the same results, at the same time, as processing the same hosts individually through the badfish Python script.

sanjaychari • Oct 18 '22 05:10

I think the iDRAC becomes unresponsive while the containerized badfish is performing bulk actions, which is why the run reports "Failed to communicate with" for those hosts.

rajeshP524 • Oct 21 '22 06:10
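
If the cause really is an unresponsive iDRAC during bulk actions (a hypothesis, not confirmed in this issue), one common mitigation is to cap how many hosts are processed at once and to retry with backoff on connection errors. The sketch below is generic asyncio code and does not use badfish internals. `run_batch`, `run_with_retry`, and the `action` coroutine are hypothetical names, and it assumes a failed connection surfaces as `ConnectionError`.

```python
import asyncio


async def run_with_retry(host, action, retries=3, base_delay=1.0):
    """Run `action(host)`, retrying with exponential backoff on ConnectionError."""
    for attempt in range(retries):
        try:
            return await action(host)
        except ConnectionError:
            if attempt == retries - 1:
                raise  # out of attempts, let the caller record the failure
            await asyncio.sleep(base_delay * 2 ** attempt)


async def run_batch(hosts, action, max_concurrent=2, retries=3, base_delay=1.0):
    """Run `action` on every host, with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(host):
        async with semaphore:  # limits simultaneous requests
            try:
                await run_with_retry(host, action, retries, base_delay)
                return host, "SUCCESSFUL"
            except ConnectionError:
                return host, "FAILED"

    # Collect (host, status) pairs into a summary like the RESULTS block.
    return dict(await asyncio.gather(*(guarded(h) for h in hosts)))
```

Here `action` would be whatever coroutine performs the per-host check. With `max_concurrent=1` the batch degrades to the sequential behaviour that succeeded in the individual runs.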