dataset-webui
Webui for editing/managing LoRA datasets
LoRA Dataset Webui
This project aims to help with the creation and management of LoRA training datasets. Scroll down to the bottom of the page for a feature overview.
This is still in beta. Please report any bugs you find; pull requests are welcome. Currently, everything is just cobbled together.
Roadmap:
- Refactor tag management
- Add single-image tag overrides [mostly done]
- Add superresolution/upscaling for small images
Known issues:
- No files/folders are ever deleted, leading to clutter/orphaned images
- Deleting an entire folder can break the step
- Saving the dataset while an image is open throws an error and leaves half of the files where they were.
Getting started
(optional) create a venv first:
python -m venv venv
venv\Scripts\activate
install the requirements:
pip install -r requirements.txt
start either by running start.bat or manually using:
python webserver.py
(see python webserver.py --help for launch arguments)
Access the webui on the following URL: http://127.0.0.1:8080/
If tagging/cropping/etc. is too slow, try running pip install onnxruntime-gpu, but keep in mind that this will use some of your VRAM.
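To check whether the GPU build is actually being picked up, you can ask ONNX Runtime which execution providers it sees. This is a minimal sketch and not part of the webui: `pick_providers` is a hypothetical helper, and it assumes an NVIDIA GPU (the `CUDAExecutionProvider`).

```python
# Hypothetical helper, not part of the webui.
PREFERRED = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available):
    """Return the preferred providers that are available, GPU first."""
    return [name for name in PREFERRED if name in available]


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed")
    else:
        # get_available_providers() lists what this onnxruntime build supports
        print(pick_providers(ort.get_available_providers()))
```

If only `CPUExecutionProvider` is printed, the GPU package is either not installed or cannot find a usable GPU.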
download-dependencies.py
Running this script is recommended to get all features of the webui.
Using start.bat already downloads all dependencies by default.
It gives you the option to download the following files:
- danbooru-tags.json and gelbooru-tags.json from GitHub Gist or catbox.moe. You also have the option to scrape the tags from the site directly.
- cropper.js and cropper.css from Cloudflare/cdnjs.
Updating
Clear your browser cache between updates; browsers tend to keep the old scripts/CSS loaded.
Folder structure
The folders created are meant to be used as follows:
- 0 - raw: raw images from the internet / screenshots
- 1 - cropped: cropped images (1:1 aspect ratio)
- 2 - sorted: images grouped by quality / topic / etc.
- 3 - tagged: .txt or .json files containing autotagger output
- 4 - fixed: pruned tags in .txt format
- 5 - out: scaled down images and pruned tags; point your training script here
- datasets: all your datasets are saved here
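If you ever need to recreate this layout by hand, a short script is enough. This is a minimal sketch, not the webui's own code: the `create_folders` helper and its `root` argument are hypothetical, and only the folder names are taken from the description above.

```python
import tempfile
from pathlib import Path

# Folder names as described above; the helper itself is hypothetical.
FOLDERS = [
    "0 - raw",
    "1 - cropped",
    "2 - sorted",
    "3 - tagged",
    "4 - fixed",
    "5 - out",
    "datasets",
]


def create_folders(root):
    """Create any missing workflow folders under root and return their paths."""
    created = []
    for name in FOLDERS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_folders(tmp):
            print(folder.name)
```

Calling `create_folders` on a directory that already contains some of the folders leaves their contents untouched.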
Features:
Some of these images/videos might be outdated. There are also in-UI tooltips. If something breaks, just open an issue here on GitHub.
Dataset manager
- Save / load datasets you're working on
- Avoid having to change the training folder: just point your training script at the 5 - out folder and load the right dataset
- Write notes for yourself

Cropping
- Crop images in your browser
- Edit already cropped images
- Duplicate image - crop two separate parts
- Quickly set the cropped area, copy it from the previous image
- Keyboard shortcuts
https://user-images.githubusercontent.com/125218114/228365872-ec57af74-5fb1-43e8-ab0c-ebb2feb3fd00.mp4
Auto cropping
- This is a good "baseline" to work from
- You can go back and quickly fix what it misses
- It can detect multiple subjects.
https://github.com/city96/dataset-webui/assets/125218114/a9e4eb9c-6fd3-406e-aee2-619a251f82f0
Sorting
- Add categories
- Quickly sort multiple images, captcha style
- Hit detection can be janky
https://user-images.githubusercontent.com/125218114/228366245-de9b590a-0489-422b-8689-e0a262e69561.mp4
Auto sorting
- Don't want to do it manually? Set the tags and sort automatically.
https://user-images.githubusercontent.com/125218114/228366461-3da9085a-6ec7-40f6-b746-773b168fe546.mp4
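The idea behind tag-based sorting can be sketched in a few lines. This is an illustration only and not the webui's actual logic: the `sort_by_tags` function, the rule format (a category name mapped to a set of required tags), and the `unsorted` fallback category are all assumptions.

```python
def sort_by_tags(image_tags, rules, fallback="unsorted"):
    """Assign each image to the first category whose tags it all contains.

    image_tags: dict mapping file name -> set of tags (hypothetical format)
    rules: dict mapping category name -> set of tags that must all be present
    """
    result = {}
    for name, tags in image_tags.items():
        category = fallback
        for candidate, required in rules.items():
            if required <= tags:  # every required tag is present
                category = candidate
                break
        result.setdefault(category, []).append(name)
    return result


images = {
    "a.png": {"1girl", "solo", "smile"},
    "b.png": {"scenery", "no_humans"},
    "c.png": {"2girls"},
}
rules = {"portraits": {"1girl", "solo"}, "backgrounds": {"scenery"}}
print(sort_by_tags(images, rules))
# {'portraits': ['a.png'], 'backgrounds': ['b.png'], 'unsorted': ['c.png']}
```

Rules are checked in order, so in this sketch the first matching category wins when an image fits more than one.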
Tagging
CPU-only autotagger
- A bit slow, but it does not use any VRAM, so it doesn't interfere with training.
- The output isn't real-time; I think it can do about 1 image/sec on my 11th-gen i5
https://user-images.githubusercontent.com/125218114/228366774-5609c1b5-28b6-4274-89b0-8296144b7f2a.mp4
Tag pruning
- Prune useless tags from the autotagger
- Normalize tags
- Quickly blacklist/whitelist tags
- Replace tags on all images
- Edit rules, test the effects
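As a rough picture of what pruning involves, the sketch below normalizes, blacklists, and replaces tags for a single image. It is not the webui's rule engine: `normalize` and `prune_tags` are hypothetical names, and the normalization shown (trim, lowercase, underscores to spaces) is an assumption about what normalized tags look like.

```python
def normalize(tag):
    """Assumed normalization: trim, lowercase, underscores become spaces."""
    return tag.strip().lower().replace("_", " ")


def prune_tags(tags, blacklist=(), replacements=None):
    """Return normalized tags without blacklisted entries or duplicates.

    The blacklist is checked before replacements are applied.
    Keys in replacements are expected in normalized form.
    """
    replacements = replacements or {}
    blocked = {normalize(tag) for tag in blacklist}
    seen = set()
    pruned = []
    for raw in tags:
        tag = normalize(raw)
        if not tag or tag in blocked:
            continue
        tag = replacements.get(tag, tag)
        if tag in seen:  # keep only the first occurrence
            continue
        seen.add(tag)
        pruned.append(tag)
    return pruned


raw = ["1girl", "Long_Hair", "simple_background", " long hair ", "grin"]
print(prune_tags(raw, blacklist=["simple_background"], replacements={"grin": "smile"}))
# ['1girl', 'long hair', 'smile']
```

Running the same rules over every tag file is what makes a blacklist or replacement apply to all images at once.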

Output
- Scale images to required training resolution
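The size calculation behind downscaling can be sketched as follows. This is an assumption-laden illustration rather than the webui's code: `scaled_size` is a hypothetical helper, and it assumes that the longer side is matched to the training resolution and that small images are never upscaled.

```python
def scaled_size(width, height, target):
    """Return (width, height) scaled so the longer side equals target.

    Images whose longer side is already at or below target are returned
    unchanged, since this sketch never upscales.
    """
    longest = max(width, height)
    if longest <= target:
        return width, height
    new_width = max(1, round(width * target / longest))
    new_height = max(1, round(height * target / longest))
    return new_width, new_height


print(scaled_size(2048, 2048, 512))  # (512, 512)
print(scaled_size(1920, 1080, 512))  # (512, 288)
print(scaled_size(400, 300, 512))    # (400, 300)
```

The resulting tuple could then be passed to an image library's resize call, for example Pillow's `Image.resize`.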
