stable-diffusion-webui
Webui's usability enhancement MVP1
Describe what this pull request is trying to achieve.
Improving usability by changing the front-end UI layout:
- For easier readability and filtering of specific prompts, put the text prompt and negative prompt in two columns.
- To make it clearer for users who do not know much about deep learning concepts, change the name of the “extra network” to “model and network library” and move it to the bottom of the top row.
- Rearrange other UI elements in the top row to make them easier for users to locate.
Additional notes and description of your changes
This PR mostly contains front-end changes. One change worth mentioning: the extra-network button was restyled and renamed.
Environment this was tested in
List the environment you have developed/tested this on. As per the contributing page, changes should be able to work on Windows out of the box.
- OS: Windows 10, Windows 11
- Browser: Chrome, Edge
- Graphics card: NVIDIA GTX 1080 8GB, NVIDIA RTX 3080 12GB
Screenshots or videos of your changes
Before
After
The code is not fully aligned with the design mockup because this is an MVP. If necessary, we will fine-tune the CSS styles.
We know this is a small PR and it doesn't look very fancy. Our priority is the user flow, not cosmetic changes for their own sake. This is a small first step toward what could become a full user-centered redesign, and we want to start with it.
UX research involved: https://github.com/one111eric/stable-diffusion-webui-miao/wiki/UX-Research
The next phase (wireframe):
Postscript
Hello AUTOMATIC1111,
Thank you for your amazing work in bringing Stable Diffusion to a web UI and making it accessible to end users outside the deep learning field. I really love the concept of your webui.
As a UX designer, I observed that the current design of the webui is feature-oriented and can be difficult for users to navigate. To address this issue and improve usability, I have begun working on enhancing the webui using a user-centered approach while keeping costs to a minimum.
All the research and design assets so far can be found on our wiki page: https://github.com/one111eric/stable-diffusion-webui-miao/wiki.
I collaborated with my friend Miao. We would appreciate your feedback on our work and whether you think it’s worth pursuing.
This PR is in MVP1.2. All materials and detailed explanations for this PR, including the research, can be found here: https://github.com/one111eric/stable-diffusion-webui-miao/wiki/Phase-1
I understand that you may prefer to receive themes rather than a pull request for your branch. However, in order to significantly improve usability, many aspects of the page need to be revised and restructured. This may affect the original structure and cannot be achieved through a theme alone. That’s why we’re reaching out to you for feedback on whether you would like us to proceed with this work.
Furthermore, my friend has limited time, so if you like my concept, I (we) can also work directly with you. In addition to UX design, I am also an all-around designer. I really enjoy working on the webui and am eager to contribute to this project. You can reach me at [email protected] or contact Miao at [email protected].
Thanks a lot for your time,
I think most classes of users, excluding the most casual ones, would be unhappy if the main prompt inputs were reduced in size. Since the max token size was increased, it's common for prompts to be multiple lines long. Apart from that, I'd say the change would make sense, except that the generate button is already plenty big in my experience, even on more cumbersome platforms such as those using VR controller input.
@zhouyi311
There's another PR targeting UX/UI design... https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/7519 Check it out.
Also, some of Auto's responses to the PR:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/7519#issuecomment-1435903148 https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/7519#issuecomment-1435921872
@papuSpartan Thank you for your response and feedback. I appreciate your perspective as a representative of hardcore users, and your insights have been very enlightening and helpful.
I would like to provide further context regarding the decision to split the prompt textbox into two columns.
- The textbox automatically expands when multiple lines of input are entered; its maximum capacity is about 3600. I have included a screenshot captured on a 2K monitor (with 160+ characters per line).
- Studies have shown that the optimal line length for readability is approximately 80 characters per line: https://baymard.com/blog/line-length-readability This has been an established guideline in graphic design and typography for many years. Even though we cannot control the exact line length of the prompt box (it depends on the browser width), after splitting it will still be longer than 80 characters per line in typical cases, so splitting brings lines closer to the optimal length without making them too short. As a result, readability improves significantly. Based on my research into the most common prompts on civitai.com, this size is more than sufficient. Furthermore, for SD 2.1 models, negative prompts are critical and users need to fine-tune them. The split boxes make both the positive and negative prompts much faster to scan. (That said, I honestly don't know whether hardcore users read and fine-tune their prompts or simply copy and paste them.)
- I conducted a simple user test and found that this approach performed better. However, I acknowledge that none of the users I tested were "hardcore" users.
- There are additional materials and preparations for this project on our internal wiki: initial research https://github.com/one111eric/stable-diffusion-webui-miao/wiki/UX-Research and Phase 1 assets https://github.com/one111eric/stable-diffusion-webui-miao/wiki/Phase-1
To be completely honest, the discussion always gets complicated when it comes down to the "target audience". Who is our primary target audience? Now that the web UI has gained popularity, whether we want to serve the "most casual users", who might make up 90% of current users and 99% of new users in the future, is an open question.
I did my research by watching most of the YouTube and Reddit tutorials about "how to use SD webui": I watched their user flows and how they use the webui, and addressed the tasks and difficulties that users mention most frequently. Sure, those tutorials are mostly aimed at casual users. But who was born a pro? Who wasn't a casual user at first?
I hate to say it, but there is survivorship bias on GitHub, which is full of hardcore users. I understand that there are tons of edge cases to consider for hardcore users in day-to-day heavy usage, but in UX design there is no single solution that is optimal for everyone. A solution that addresses casual users' problems may not be ideal for hardcore users (this is also why Reddit offers three views). However, for products that serve both, the balance usually tilts toward ordinary users rather than power users.
Therefore, we have to figure out who the primary user is and what we want to prioritize. It's a question of what exactly we want to build: an iMovie that anyone can enjoy for cutting videos at home, or an After Effects that is only for professional work. This is an honest, open question, and the answer will hugely affect the direction of any design work on the project.
This is also why I reached out to the author with such a small PR. I want his opinion on whether this should be the direction and whether he is happy to move toward it.
Thanks, @Nacurutu, that has been very helpful!
After reading the author's responses, I found that they match my thoughts. I don't want to make the UI look fancier without addressing real usability issues. Instead, I want to improve usability with minor visual adjustments while relying on Gradio's basic styles.
The author said, "a small PR first." I think this PR is the kind of start he suggested, and the goal is to get his initial feedback.
What I have done so far is mostly research.
With the added context that the input boxes can expand, I'm mostly on board. However, I still think the generate button would be too large (at least on desktop). Maybe it could use some sort of CSS expansion effect on hover to make it pop? (No clue.)
Regardless of what the change is, if it's to the front end, there will probably always be some group of users who are (at least temporarily) dissatisfied.
I think it's accepted by most in the SD community that this is the most popular webui. That, I think, has a lot to do with the fact that the sdwui project (generally) keeps up with the incredible pace of Stable Diffusion-related advancements. While there is still plenty of room for general improvements to the UI overall, I don't think the current UI should be heavily altered, because it will likely be changed again anyway due to upcoming work from the SD community and AI fields. So I do think that sdwui should be at least somewhat more tailored towards less casual users. The UI needs to provide the extensive tools that advanced users, and professionals too, need. I think this viral production by the Corridor Digital team, made with the current UI, is a testament to that. However, it might be possible to have our cake and eat it too if we did something similar to what motherboard manufacturers do with BIOS interfaces: they offer two parallel UIs, the main "advanced" UI (what we have currently) and the "EZ" UI.
credit: msi.com
Some random thoughts in random order:
- Automatic1111 seems like a pro-user tool; people who want something simpler use other UIs or paid online versions. I don't think casual users are looking for Stable Diffusion specifically; that's a technical distinction.
- People will complain no matter what you change or improve.
- Long prompts are unreadable no matter what you do. Code has solved this with syntax highlighting; maybe this is worth exploring.
  - autocompletion for embeddings and LoRA models
  - hovering over tokens shows the affected areas in the output image (the tech is out there in the form of extensions; not sure how fast it is, though)
  - proportionally put visual emphasis on weighted prompts
  - somehow encourage the use of (foo:1.5) instead of (((foo))) for readability
  - a place that explains how to use the various prompt syntax
- CFG Scale affects prompts, so I think it should be grouped with the prompts (?)
- Batch count and Batch size could be one slider plus a checkbox for whether to generate multiple images in parallel or one after another. (I think?)
- Maybe batch size belongs in the output area.
- Width and height could be moved to the output area, showing a preview.
- The collapsible cards that extensions can add to txt2img could have an enable button, or simply a way to add or remove them from the UI.
  - hires fix could act like a collapsible card that you enable or disable
  - same goes for tiling
  - restore faces too, but we could also add options to control its weight and select the model
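One of the suggestions above is to encourage (foo:1.5) over (((foo))) for readability. As a minimal sketch of how such a rewrite could work, assuming the webui's documented convention that each pair of parentheses multiplies attention by 1.1, the hypothetical helper `explicit_weights` below converts simple nested-parenthesis emphasis into the explicit weight form. It is not part of the webui, and it deliberately ignores square-bracket de-emphasis, escaped parentheses, and overlapping or nested groups; weights are rounded to two decimals.

```python
import re

# Assumption: each pair of parentheses multiplies attention by 1.1,
# so "(((foo)))" corresponds to a weight of 1.1 ** 3 = 1.331.
EMPHASIS_FACTOR = 1.1

# A run of "(", a term containing no parentheses or colon, then a run of ")".
# Excluding ":" means already-explicit terms like "(foo:1.5)" never match.
_NESTED = re.compile(r"(\(+)([^():]+?)(\)+)")


def explicit_weights(prompt: str, digits: int = 2) -> str:
    """Hypothetical helper: rewrite "(((foo)))" as "(foo:1.33)"."""

    def repl(m: re.Match) -> str:
        opens, term, closes = m.group(1), m.group(2), m.group(3)
        if len(opens) != len(closes):
            return m.group(0)  # unbalanced run: leave it untouched
        weight = round(EMPHASIS_FACTOR ** len(opens), digits)
        return f"({term.strip()}:{weight:g})"

    return _NESTED.sub(repl, prompt)


if __name__ == "__main__":
    print(explicit_weights("a (((foo))), ((cat)), (dog:1.5)"))
```

Running it prints `a (foo:1.33), (cat:1.21), (dog:1.5)`. A real implementation would more likely reuse the webui's own prompt parser than a regex, so that every syntax variant is handled consistently.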
Hi, @zhouyi311
As I mentioned before, there is already a PR targeting a UX/UI change with good results, and it is at an advanced stage. If that PR gets merged in the future, your work on this new PR will have been for nothing.
My recommendation: talk to the owner of that PR and see if you can work together.
It has a lot of new functions, improvements, and bug fixes that people were asking for, and the webUI is going to have just one UI, not a lot of options.
To save time and effort, again, my recommendation is to talk to him and work together.
@Nacurutu Thanks for your advice. I will try to contact him and talk. That said, I still think it's important to get in touch with the author, AUTO1111. UX isn't just about appearance and providing features; to be honest, there are many fundamental aspects that need to be figured out before starting to change the code. So what I have done is mainly research, and this PR is more of a quick start.
UX improvements are quite tricky; sometimes a small improvement involves a fundamental structural change in the code. That's why, in real-world work, a PM usually coordinates these things. I don't think that is possible for GitHub projects, so I think communicating with the author is important before making big moves. This is also why I made this small PR: it is a way to try to contact AUTO1111.
@CapsAdmin Regarding grouping CFG Scale with the prompts: this one is a bit tricky because the original author basically divided the main UI into two sections, a TopRow (prompts and a few other buttons) and a "main row" (extra networks, the bars, and the output). Moving things from one to the other would need more code changes than I'd like to make for now. But if the author thinks it's okay, I believe it's totally doable.
Visuals are certainly an important aspect of UX, but they are just one piece of the puzzle. Many other factors play a significant role in creating a positive UX. For example, usability refers to how easily users can accomplish their goals with the product, which can be achieved through intuitive interfaces, clear navigation, and so on. These factors work together to ensure that the product not only looks good but also meets the needs and expectations of its users.
It's also important to note that this is an MVP, meaning an early version designed to test and validate the solution with minimal code changes. As such, the focus at this stage is on addressing (some of) the core user problems rather than perfecting the visual design. While the MVP may not look as polished as a final version, it serves a crucial purpose: it lets us gather feedback and iterate on the design before investing more time and resources into development. Evaluating the MVP on its visual appeal misses the point of the UX improvement process and overlooks the insights that can be gained from testing and refining the product at this early stage. In addition, evaluating UX based on an image can be misleading, since it doesn't show how users actually interact with the product. That's why conducting usability tests is important.
While visual appeal can enhance the overall experience, it's important to prioritize these other aspects of UX to create a truly successful product.
Sent: Saturday, March 11, 2023 2:04 PM. Subject: Re: [AUTOMATIC1111/stable-diffusion-webui] Webui's usability enhancement MVP1 (PR #8458)
Looks way better in the Before picture. In general, this PR just doesn't make sense. The other UX PR is miles and miles better in every possible way. Sorry if I'm too direct.
I would say it is not an improvement but rather safari hunting, especially the side-by-side positive/negative prompt layout (which does not even use enough of the horizontal space), and ALSO leaving the not-so-useful style box so wide and empty. Have you ever thought about how they should look in real life, rather than just moving boxes around?