clush doesn't see groups.d definitions?

Open skwde opened this issue 1 year ago • 6 comments

I installed clush via conda in an environment. All config files are under <conda env location>/clush/etc/clustershell

In my groups.conf I have

[Main]
default: roles
confdir: /etc/clustershell/groups.conf.d $CFGDIR/groups.conf.d
autodir: /etc/clustershell/groups.d $CFGDIR/groups.d
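
For a non-system install such as a conda environment, it can help to check which directories the autodir line actually expands to. The following is a minimal sketch, not ClusterShell's own parser: it assumes that $CFGDIR stands for the directory that holds groups.conf (as the ClusterShell configuration docs describe), and both the helper name expand_dirs and the example path are hypothetical.

```python
import configparser
from string import Template

# The [Main] section quoted above, reproduced as a string for the sketch.
GROUPS_CONF = """\
[Main]
default: roles
confdir: /etc/clustershell/groups.conf.d $CFGDIR/groups.conf.d
autodir: /etc/clustershell/groups.d $CFGDIR/groups.d
"""


def expand_dirs(conf_text, key, cfgdir):
    """Return the space-separated directories of `key` with $CFGDIR replaced.

    Hypothetical helper for illustration; ClusterShell does its own parsing.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    raw = parser.get("Main", key)
    return [Template(entry).safe_substitute(CFGDIR=cfgdir) for entry in raw.split()]


# Hypothetical path standing in for <conda env location>/clush/etc/clustershell
cfgdir = "/opt/conda/envs/example/etc/clustershell"
for path in expand_dirs(GROUPS_CONF, "autodir", cfgdir):
    print(path)
```

If cluster.yaml sits in one of the printed directories, the file location is probably not the problem and the group definitions themselves are the next thing to inspect.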

Next I have groups.d/cluster.yaml where I have my node definitions. The syntax of the file is fine because running nodeset -LL shows all my definitions.

If I run

clush -a date

I expect to get the date from all machines.

Instead I get

Usage: clush [options] command

clush: error: No node to run on.

so what am I missing here?

skwde avatar Jan 30 '24 10:01 skwde

Thanks for reporting that issue.

Could you run clush with the -d and -v options to collect more debugging logs?

Also, nodeset -f -a should report the node list that clush uses when given -a. Is that node list the one you expect?

Aurélien

degremont avatar Jan 30 '24 10:01 degremont

Thanks for your immediate reply.

Here is the output:

$ clush -d -v -a date
DEBUG:root:clush: STARTING DEBUG
Adding nodes from option -a: 
Usage: clush [options] command

clush: error: No node to run on.

and

$ nodeset -f -a

Obviously that is not what I would expect :)

skwde avatar Jan 30 '24 11:01 skwde

nodeset not returning what you expect indicates that the node group definition is incorrect.

It would help if you shared the nodeset -LL output and the cluster.yaml content so we can debug. :)

What is your 'all' definition?

degremont avatar Jan 30 '24 11:01 degremont

Well, I have no explicit all definition.

To my understanding, all should be everything in roles, since I have default: roles in groups.conf.

Ok, here is a simplified example:

My cluster.yaml

roles:
    dev: '@dev:all'


dev:
    dev: 'dev01'

with

$ nodeset -LL
@dev 
@dev:dev dev01

skwde avatar Jan 30 '24 13:01 skwde

The special name for the "all" feature is not 'all', but '*'.

You should use:

dev: '@dev:*'

See https://clustershell.readthedocs.io/en/latest/config.html#groups-config

and https://clustershell.readthedocs.io/en/latest/tools/nodeset.html#node-groups

degremont avatar Jan 30 '24 13:01 degremont

I don't get your reference to dev: ***@***.***:*'

But it now works with the following. My cluster.yaml should actually look like

roles:
    dev: '@dev:*'


dev:
    dev: 'dev01'
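
Why '@dev:*' works while '@dev:all' yields nothing can be sketched with a toy model of group resolution. This is not ClusterShell's implementation. It assumes that '*' means "every node of the source" (an explicit all entry if the source defines one, otherwise the union of all its groups), that an unknown group name resolves to nothing, and that node names are plain strings without range folding. The names resolve and expand are hypothetical.

```python
# The two cluster.yaml variants from this thread, written as plain dicts.
BROKEN = {"roles": {"dev": "@dev:all"}, "dev": {"dev": "dev01"}}
FIXED = {"roles": {"dev": "@dev:*"}, "dev": {"dev": "dev01"}}


def resolve(sources, source, group, seen=frozenset()):
    """Return the set of node names for @source:group in this toy model."""
    if (source, group) in seen:  # guard against circular references
        return set()
    seen = seen | {(source, group)}
    groups = sources.get(source, {})
    if group == "*" and "all" not in groups:
        # No explicit 'all' entry: take the union of every group in the source.
        nodes = set()
        for name in groups:
            nodes |= resolve(sources, source, name, seen)
        return nodes
    pattern = groups.get("all" if group == "*" else group)
    if pattern is None:  # unknown group name, e.g. 'all' when none is defined
        return set()
    return expand(sources, source, pattern, seen)


def expand(sources, default_source, pattern, seen):
    """Expand a comma-separated pattern of node names and @source:group refs."""
    nodes = set()
    for item in pattern.split(","):
        item = item.strip()
        if item.startswith("@"):
            src, _, grp = item[1:].rpartition(":")
            nodes |= resolve(sources, src or default_source, grp, seen)
        elif item:
            nodes.add(item)
    return nodes


# "clush -a" with default: roles corresponds to resolving @roles:* here.
print(sorted(resolve(BROKEN, "roles", "*")))  # []
print(sorted(resolve(FIXED, "roles", "*")))   # ['dev01']
```

In the broken variant, @roles:dev points at a group literally named all in the dev source, which does not exist, so the node set is empty and clush reports "No node to run on".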

OK, the reference was (at least for me) a bit hard to find. https://clustershell.readthedocs.io/en/latest/tools/nodeset.html#working-with-range-sets mentions that all is an external call. One then has to look at https://clustershell.readthedocs.io/en/latest/config.html#group-source-upcalls, under External calls, to find the description of all.

I think it would be good to have a note about this directly under https://clustershell.readthedocs.io/en/latest/config.html#yaml-group-files.

skwde avatar Jan 30 '24 14:01 skwde