sd-scripts icon indicating copy to clipboard operation
sd-scripts copied to clipboard

wandb token exposed to wandb users

Open bghira opened this issue 1 year ago • 13 comments

hello, there exists a parameter to pass a wandb token into the training utility. unfortunately, this shows on the "Overview" tab of the training session, as a part of the complete commandline that is used to execute the trainer.

This exposes users to potentially nefarious activity, or at the very least, a sense of unease and the possibility of snooping/leaking internal data.

bghira avatar Feb 16 '24 16:02 bghira

@bmaltais and @kohya-ss should probably do something about this :D does this project not care about security?

bghira avatar Mar 31 '24 12:03 bghira

This is not good. I guess the Wanda token need to be exempted from making it in the metadata. This is something @kohya-ss need to filter in the training scripts. There is not much I can do about the metadata…

bmaltais avatar Mar 31 '24 12:03 bmaltais

In my understanding, the metadata in the training model doesn't contain wandb token, but it would appear on the GUI screen. I'm not sure because I don't personally use either wandb or GUI.

kohya-ss avatar Mar 31 '24 13:03 kohya-ss

need to recommend users to go with wandb login command instead of a cmdline argument. just --report_to=wandb,tensorboard would work.

bghira avatar Mar 31 '24 13:03 bghira

That makes sense. I will update the doc in next release.

kohya-ss avatar Apr 03 '24 12:04 kohya-ss

fwiw in Diffusers, there is simply a check whether wandb is enabled and the commandline args contain the Huggingface Hub token, because any secrets on cmdline will be exposed via wandb.

but in this case, its the wandb token itself on the cmdline. i am not sure that a documentation hint is enough to stop a user from doing this. the fact that it's on the cmdline at all means it will be exposed 100% of the time this is used.

bghira avatar Apr 03 '24 15:04 bghira

You are correct. We must filter sensitive fields from command line args.

In my understanding, the repo doesn't expose either wandb token or command line args currently (the latter is discussed in #1231). If you find any code to do it, please let me know.

kohya-ss avatar Apr 04 '24 10:04 kohya-ss

it's a part of the wandb module. it sends ALL args, and there is no way to filter them:

image

bghira avatar Apr 04 '24 12:04 bghira

Thank you for clarification. That's not good. I think it is undesirable to disclose not only wandb token but other information as well. Is there any way to control this?

kohya-ss avatar Apr 04 '24 13:04 kohya-ss

none that i have found. i resorted in simpletuner to using json config files to hide cmdline args

bghira avatar Apr 04 '24 13:04 bghira

Thank you. That's nice. sd-scripts has .toml for configs, so I will show WARNING if the sensitive training args given in the command line.

kohya-ss avatar Apr 04 '24 13:04 kohya-ss

I have opened a PR for this #1240, If you have time to review it, I would greatly appreciate it.

kohya-ss avatar Apr 04 '24 23:04 kohya-ss

Would be great bowever to send parameters to Wanda. At the moment in kohya_ss, we don't see any training parameters in Wandb, only the project name (tracker name), and the run name set from kohya gui. Is it a problem to send all parameters through config?

rafstahelin avatar Apr 07 '24 12:04 rafstahelin