OneTrainer icon indicating copy to clipboard operation
OneTrainer copied to clipboard

Input Validation, Tensorboard Tweaks + Drag-n-drop support

Open O-J1 opened this issue 5 months ago • 11 comments

High level summary of what this PR does:

  1. Drag-n-drop support for all path_entry widgets
  2. Drag-n-drop support for the concepts tab. Wont duplicate
  3. Merges file_entry and dir_entry into a single function
  4. Adds more extensive input validation, including to path_entry and especially model output destination
  5. Adds debouncing to all fields derived from entry including time_entry which will partially prevent the division by 0 backup bug, combined with prior fix this will make it impossible
  6. Fixes the tensorboard workspace spam when typing a new one and always on
  7. Streamlines Tensorboard management down to a single input and makes it work with cloud in Always On.

Initial checklist

  • [x] If Always-on is True, shouldnt be terminated in training nor after
  • [x] filter new tensorboard warnings that we have no control over and google has no current plans to solve: https://github.com/tensorflow/tensorboard/issues/7003
  • [x] Debounce the workspace dir field, so we dont get a new folder created for every character we type (#1028)
  • [x] dnd files into a file_entry results in file path
  • [x] dnd files or folders into dir_entry results in a directory path
  • [x] parent folders are validated to exist
  • [x] fix merge conflict with master
  • [x] Check if Always-On tensorboard doesnt break cloud training
  • [x] Check if Off and Train Only mode don't break cloud training nor start Tensorboard locally during training
  • [x] Cloud training doesnt cause invalid path warnings or such

Validation specific checklist

  • [x] User enters without extenstion, gets auto-added
  • [x] User sets a prefix, is used for autocomplete
  • [x] User can choose to use friendlywords instead of datetime
  • [x] Tensorboard will use model output name.
  • [x] Overwrite prevention works and is toggeable

Tensorboard has now changed from enable + toggle to just a dropdown

image

Drag and drop demo

https://github.com/user-attachments/assets/366fbb75-fd1e-49a1-95b7-e62ef655ab4a

O-J1 avatar Jul 04 '25 17:07 O-J1

From UX POV, AO should be default imho if we're moving it to dropdown (which I appreciate the reasoning for), but viewing it from a layman's POV, most users won't understand these options and AO provides the best user experience.

My worry about current state (train only being default) is that users trying to get tensorboard running once they've made a cup of tea and watched TV while the model trains will come back, not be able to access TB, and then end up disabling it via the dropdown.

Other than that, good changes @O-J1, thanks

BitcrushedHeart avatar Jul 04 '25 18:07 BitcrushedHeart

note that there is another tensorboard option on the cloud tab, "Tensorboard TCP tunnel". These changes might break that option, and/or that option might fit into any new dropdown with all the options

dxqb avatar Jul 11 '25 10:07 dxqb

always on pretty much breaks that option, but as long as it was just an option that you can choose it was probably ok. "always on" being the new default is a problem though

dxqb avatar Jul 11 '25 10:07 dxqb

pre-commit.ci autofix

O-J1 avatar Sep 02 '25 14:09 O-J1

Fixed merge conflicts. Please test and review @dxqb when you have time.

O-J1 avatar Oct 04 '25 07:10 O-J1

Hypopo tested with remote and stated it worked fully as he expected

O-J1 avatar Oct 18 '25 05:10 O-J1

Note on my latest test from yesterday. I noticed a UI Bug where I could not change the model output destination.

hyppyhyppo avatar Oct 20 '25 06:10 hyppyhyppo

Note on my latest test from yesterday. I noticed a UI Bug where I could not change the model output destination.

Being vaugue doesnt help, can you describe the exact steps, because I can do it fine

Requirements:

  • parent directory must exist (existing behaviour, we just now validate it)
  • If there is an extension provided, it must match whats set in the dropdown (new)
  • On windows must have no invalid characters (existing behaviour, we just now validate it)

If you stop typing for more than 1.5 seconds validation assumes you are finished typing

O-J1 avatar Oct 20 '25 06:10 O-J1

Tried again, that happend with slow typing (so the 1.5 s limit). Yesterday I just wanted to change the lora name and I couldn't but can't reproduce it today.

hyppyhyppo avatar Oct 20 '25 07:10 hyppyhyppo

I can consider adding something to increase the limit to 2 seconds but if you are going that long between keypresses, it points to one not having decided name yet. Unfortunately validation is not omniscient, if I made it only apply when you click off, or hit enter, then it won’t trigger for some people.

O-J1 avatar Oct 20 '25 07:10 O-J1

We ran a poll these were the final results, which notably do not include the 7 new users Im aware of that encountered the "wheres my safetensor file" issues.

image

Given these results and accounting for new users, the solution will likely be:

  • Clean up general model tab see #1022 so that we can place new toggles
  • Introduce a two new toggles "Show validation tooltips" and "Autocorrect File/Dir inputs"
  • Use logic where unless ending with a slash, counts as filename and can best match poll expectations

I will hopefully have an example commit ready soon TM

O-J1 avatar Oct 23 '25 05:10 O-J1

  • Do you want to split off the tensorboard parts into a separate PR? This seems easy to do: it's mostly separate files and some changes to UI code but not a lot. This could be a useful merge soon.

No we cannot, we discussed this already

O-J1 avatar Nov 20 '25 16:11 O-J1

starting the UI on current main branch takes about 10 seconds on my linux machine, until I can change tabs. With this PR, it takes about 25 seconds. This is significant by itself, but I remember on my old windows notebook it already took half a minute to be ready before - does it now take minutes?

With this PR prior to the now committed performance changes it took 10.3~ seconds on my windows machine. It now takes 8.5 seconds, so a 20% speedup. There might still be some more juice to squeze, unsure.

O-J1 avatar Nov 20 '25 18:11 O-J1