daos icon indicating copy to clipboard operation
daos copied to clipboard

DAOS-17659 control: Enable setting property value with multiple strings

Open tanabarr opened this issue 7 months ago • 13 comments

Fix issue where a property value containing multiple comma-separated strings cannot be set e.g. self_heal:exclude,rebuild.

Features: control

Steps for the author:

  • [ ] Commit message follows the guidelines.
  • [ ] Appropriate Features or Test-tag pragmas were used.
  • [ ] Appropriate Functional Test Stages were run.
  • [ ] At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • [ ] Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • [ ] Gatekeeper requested (daos-gatekeeper added as a reviewer).

tanabarr avatar Jun 11 '25 12:06 tanabarr

Ticket title is 'Unable to set pool property self_heal to exclude,rebuild' Status is 'In Review' https://daosio.atlassian.net/browse/DAOS-17659

github-actions[bot] avatar Jun 11 '25 12:06 github-actions[bot]

Test stage Build RPM on EL 8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16503/1/execution/node/330/log

daosbuild3 avatar Jun 11 '25 12:06 daosbuild3

Test stage Build RPM on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16503/1/execution/node/334/log

daosbuild3 avatar Jun 11 '25 12:06 daosbuild3

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16503/1/execution/node/327/log

daosbuild3 avatar Jun 11 '25 13:06 daosbuild3

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16503/1/testReport/

daosbuild3 avatar Jun 11 '25 13:06 daosbuild3

Thanks a lot for the quick responses, everyone.

This patch doesn't feel right---hope I'm not misleading:

  • The , does not require escaping as far as the shell is concerned.
  • The shell removes the escaping before our code sees the arguments. (See my Python experiment below. I also tried C just in case Python is too smart, but I got the same result.)
  • We need to document (in dmg help or man page?) what syntaxes are supported for setting pool property self_heal.
  • The unit test doesn't reflect the real inputs from the shell.
% python -c 'import sys; print(sys.argv)' a b=c d=e,f g='h i' j='k,l' n=o\,p 
['-c', 'a', 'b=c', 'd=e,f', 'g=h i', 'j=k,l', 'n=o,p']

liw avatar Jun 12 '25 05:06 liw

Yeah, after applying this patch, I got the same errors when setting self_heal to exclude,rebuild:

bash-5.1$ ~/daos/install/bin/dmg -o ~/daos_control.yml -i pool set-prop p0 self_heal:exclude
pool set-prop succeeded
bash-5.1$ ~/daos/install/bin/dmg -o ~/daos_control.yml -i pool set-prop p0 self_heal:exclude,rebuild
ERROR: dmg: invalid property "rebuild" (must be key:val)
bash-5.1$ ~/daos/install/bin/dmg -o ~/daos_control.yml -i pool set-prop p0 self_heal:'exclude,rebuild'
ERROR: dmg: invalid property "rebuild" (must be key:val)
bash-5.1$ ~/daos/install/bin/dmg -o ~/daos_control.yml -i pool set-prop p0 self_heal:exclude\,rebuild
ERROR: dmg: invalid property "rebuild" (must be key:val)

This worked, however:

bash-5.1$ ~/daos/install/bin/dmg -o ~/daos_control.yml -i pool set-prop p0 self_heal:\'exclude,rebuild\'
pool set-prop succeeded

Sounds like the set-prop syntax design [<key:val[,key:val...]>] doesn't work well with the self_heal value syntax design flag0[,flag1...].

liw avatar Jun 12 '25 05:06 liw

Yeah, after applying this patch, I got the same errors when setting self_heal to exclude,rebuild:

bash-5.1$ ~/daos/install/bin/dmg -o ~/daos_control.yml -i pool set-prop p0 self_heal:exclude
pool set-prop succeeded
bash-5.1$ ~/daos/install/bin/dmg -o ~/daos_control.yml -i pool set-prop p0 self_heal:exclude,rebuild
ERROR: dmg: invalid property "rebuild" (must be key:val)
bash-5.1$ ~/daos/install/bin/dmg -o ~/daos_control.yml -i pool set-prop p0 self_heal:'exclude,rebuild'
ERROR: dmg: invalid property "rebuild" (must be key:val)
bash-5.1$ ~/daos/install/bin/dmg -o ~/daos_control.yml -i pool set-prop p0 self_heal:exclude\,rebuild
ERROR: dmg: invalid property "rebuild" (must be key:val)

This worked, however:

bash-5.1$ ~/daos/install/bin/dmg -o ~/daos_control.yml -i pool set-prop p0 self_heal:\'exclude,rebuild\'
pool set-prop succeeded

Sounds like the set-prop syntax design [<key:val[,key:val...]>] doesn't work well with the self_heal value syntax design flag0[,flag1...].

If escaping single or double quotes isn't acceptable should the parser check if the option following the comma matches compatible flag name? try match keyval else try match flag? what do you think?

tanabarr avatar Jun 12 '25 09:06 tanabarr

If escaping single or double quotes isn't acceptable should the parser check if the option following the comma matches compatible flag name? try match keyval else try match flag? what do you think?

Could we use some other set of characters to wrap the values (maybe [])? Otherwise, like @liw said, it feels like we made a design mistake somewhere along the line.

I'm not completely against users having to escape the quotes, but they may find it a bit annoying (and hard to figure out what they did wrong if they don't escape them).

kjacque avatar Jun 14 '25 00:06 kjacque

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16503/2/testReport/

daosbuild3 avatar Jun 14 '25 11:06 daosbuild3

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16503/2/testReport/

daosbuild3 avatar Jun 14 '25 12:06 daosbuild3

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16503/2/execution/node/1454/log

daosbuild3 avatar Jun 15 '25 00:06 daosbuild3

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16503/2/execution/node/1544/log

daosbuild3 avatar Jun 15 '25 01:06 daosbuild3

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16503/3/testReport/

daosbuild3 avatar Jul 03 '25 12:07 daosbuild3

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16503/3/execution/node/1479/log

daosbuild3 avatar Jul 03 '25 13:07 daosbuild3

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16503/4/testReport/

daosbuild3 avatar Jul 03 '25 16:07 daosbuild3

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16503/4/execution/node/1644/log

daosbuild3 avatar Jul 06 '25 23:07 daosbuild3

I think this is ready to land, it's passed all CI hardware test stages after multiple run from stage tries. NLT failures look unrelated and approvals have been done. Requesting gatekeeper.

tanabarr avatar Jul 14 '25 14:07 tanabarr

@daos-stack/daos-gatekeeper can this be landed given we've got all green on CI and NLT is unrelated? otherwise please let me know if I need to rerun. TIA

tanabarr avatar Jul 15 '25 16:07 tanabarr

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16503/9/testReport/

daosbuild3 avatar Jul 16 '25 14:07 daosbuild3

Test stage Build RPM on EL 8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16503/12/execution/node/326/log

daosbuild3 avatar Jul 18 '25 10:07 daosbuild3

Test stage Build RPM on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16503/12/execution/node/329/log

daosbuild3 avatar Jul 18 '25 11:07 daosbuild3

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16503/12/execution/node/342/log

daosbuild3 avatar Jul 18 '25 11:07 daosbuild3

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16503/14/testReport/

daosbuild3 avatar Jul 23 '25 09:07 daosbuild3

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16503/15/testReport/

daosbuild3 avatar Jul 23 '25 14:07 daosbuild3

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16503/16/testReport/

daosbuild3 avatar Jul 24 '25 13:07 daosbuild3

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16503/17/execution/node/1510/log

daosbuild3 avatar Jul 26 '25 02:07 daosbuild3

Test stage Functional Hardware Large MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-16503/17/testReport/

daosbuild3 avatar Jul 28 '25 10:07 daosbuild3

Looks like all stages actually completed. Failures are known issues:

  • EcIor/ior_smoke.py: https://daosio.atlassian.net/browse/DAOS-17842
  • DaosCoreTest/test_daos_rebuild_ec: https://daosio.atlassian.net/browse/DAOS-17091

kjacque avatar Jul 30 '25 20:07 kjacque

CI run 18 passed all, removing force landing label

tanabarr avatar Jul 31 '25 12:07 tanabarr