update ESRGAN architecture and model to support all ESRGAN models

Open victorca25 opened this issue 3 years ago • 9 comments

@AUTOMATIC1111 @d8ahazard This is the PR for https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/1805, which updates the ESRGAN architecture and model to support all ESRGAN models in the database (https://upscale.wiki/wiki/Model_Database), the models from the original repo, as well as BSRGAN and real-ESRGAN models.

Updating from the comment in the conversation:

  • The 1x models work, but currently the app skips running the models if scale = 1
  • There are models with scales larger than 4 (8x and there are even 16x not in the DB), but the app has a max of scale 4
  • The BSRGAN model and arch files are not needed and I'm removing them in the PR. Just need to load the models as regular ESRGAN models.
  • The real-ESRGAN models now work with the same architecture, but the application currently depends on the models that are downloaded automatically by the get_realesrgan_models() function: https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/77a719648db515f10136e8b8483d5b16bda2eaeb/modules/realesrgan_model.py#L82, so I'm not removing the real-ESRGAN bits, even though those models can now be loaded as regular ESRGAN models. One option would be a separate list of suggested models that is passed to the "model downloader" function (maybe one that does not depend on basicsr) to fetch them automatically. This list could include not only these real-ESRGAN models but also BSRGAN and any others, so that the entire real-ESRGAN file and import can be removed.
  • The parameter inference function (infer_params()) has the limitation that it depends on the models being translated beforehand to infer their parameters, which is why for two particular cases of real-ESRGAN (realesr-animevideov3.pth and RealESRGAN_x4plus_anime_6B.pth) there is one parameter for each that is set according to the filename. The others are automatic.
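To illustrate the last point, parameter inference can be sketched as reading layer counts and shapes from the checkpoint's key names. This is a rough sketch and not the PR's infer_params(): the function names, the `model.N...` key layout and the rule that every upsampling stage doubles the resolution are assumptions made for the example. It works on a plain dict of shapes instead of real tensors so it runs without any dependencies.

```python
def infer_params_sketch(shapes):
    """Guess architecture parameters from a {key: weight shape} mapping.

    Assumed (hypothetical) key layout after translation:
      model.0.weight              first conv, shape (nf, in_nc, k, k)
      model.1.sub.<i>.RDB...      residual blocks, one index per block
      model.<n>.weight, n > 1     tail convs: upsampling convs + 2 final convs
    """
    nf, in_nc = shapes["model.0.weight"][:2]
    block_ids = set()
    tail = {}      # top-level conv index -> weight shape
    plus = False   # ESRGAN-plus variants carry extra "conv1x1" layers
    for key, shape in shapes.items():
        parts = key.split(".")
        if "conv1x1" in key:
            plus = True
        if len(parts) > 5 and parts[:3] == ["model", "1", "sub"]:
            block_ids.add(int(parts[3]))
        elif len(parts) == 3 and parts[0] == "model" and parts[2] == "weight":
            index = int(parts[1])
            if index > 1:
                tail[index] = shape
    # Assumption: the tail is n_up upsampling convs followed by two plain convs.
    n_up = max(len(tail) - 2, 0)
    return {
        "in_nc": in_nc,
        "out_nc": tail[max(tail)][0],
        "nf": nf,
        "nb": len(block_ids),
        "plus": plus,
        "scale": 2 ** n_up,
    }


def fake_checkpoint(nb, n_up, nf=64, in_nc=3, out_nc=3):
    """Build a shape-only stand-in for a checkpoint in the assumed layout."""
    shapes = {"model.0.weight": (nf, in_nc, 3, 3), "model.0.bias": (nf,)}
    for i in range(nb):
        shapes[f"model.1.sub.{i}.RDB1.conv1.0.weight"] = (32, nf, 3, 3)
    shapes[f"model.1.sub.{nb}.weight"] = (nf, nf, 3, 3)  # conv after the blocks
    for stage in range(n_up):
        shapes[f"model.{3 + 3 * stage}.weight"] = (nf, nf, 3, 3)
    hr = 2 + 3 * n_up
    shapes[f"model.{hr}.weight"] = (nf, nf, 3, 3)
    shapes[f"model.{hr + 2}.weight"] = (out_nc, nf, 3, 3)
    return shapes


params = infer_params_sketch(fake_checkpoint(nb=6, n_up=2))
print(params["nb"], params["scale"])
```

A sketch like this only works once every checkpoint uses one key layout, which is why models in other layouts have to be translated first, and why a parameter that is not visible in the keys has to come from somewhere else, such as the filename.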

The first 2 points are not a major issue, but will need some thinking about how to go about it, also because the same slider is used for every upscaler option.
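One possible direction, sketched here only as an idea, is to decouple the slider value from the model's native scale: run the model at whatever scale it was trained for, then resize the result to the requested factor. The function name `plan_upscale` and the `max_passes` limit are hypothetical and do not describe what the app or this PR currently does.

```python
def plan_upscale(requested, model_scale, max_passes=3):
    """Return (passes, final_resize_factor) for a requested upscale factor.

    passes: how many times the model is applied in a row.
    final_resize_factor: plain resize applied afterwards to hit `requested`.
    """
    if model_scale <= 1:
        # A 1x model does not change the size: run it once (for its
        # restoration effect) and leave all the scaling to the final resize.
        return 1, float(requested)
    passes = 1
    achieved = model_scale
    while achieved < requested and passes < max_passes:
        achieved *= model_scale
        passes += 1
    return passes, requested / achieved


print(plan_upscale(4, 1))  # 1x model with the slider at 4
print(plan_upscale(4, 8))  # 8x model with the slider at 4
```

With this kind of planning a 1x model would no longer be skipped, and an 8x model could still be used with a slider capped at 4 by downscaling its output.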

I tested with a few models from the database, the real-ESRGAN models, BSRGAN and some others and they all worked fine, let me know if you find anything that doesn't work.

victorca25 avatar Oct 09 '22 11:10 victorca25

i renamed the file, please pull and merge so that the changed lines can be properly seen in github ui

AUTOMATIC1111 avatar Oct 09 '22 12:10 AUTOMATIC1111

Conflict resolved, the new code is based on the code in my repo: https://github.com/victorca25/iNNfer .

victorca25 avatar Oct 09 '22 12:10 victorca25

for the list of models, there is an option for Real-ESRGAN for which models to show in UI. You can probably do the same thing here.

AUTOMATIC1111 avatar Oct 09 '22 13:10 AUTOMATIC1111

It can be done, but honestly I don't know how to work with the UI code. As of right now, any of the above mentioned models can be dropped in the ESRGAN models directory and they will work, but I'd rather let the UI part be handled by someone that understands it better and then the unneeded real-ESRGAN files can be removed.

victorca25 avatar Oct 09 '22 13:10 victorca25

as there are a lot of changes I'd like to see a few messages from users who ran this and it worked well for them before merging.

AUTOMATIC1111 avatar Oct 09 '22 15:10 AUTOMATIC1111

@d8ahazard can you try out the code to validate it so it can be merged? For reference, most of the ESRGAN models in the database were trained with my code or a derivative of it, so it would be surprising if any of them don't work.

victorca25 avatar Oct 11 '22 07:10 victorca25

Yes, I'll give it a go today and validate. Also - pretty cool that you authored that paper. My hat goes off to you. I can program all right, but the actual model training stuff is still something I'm trying to wrap my brain around. ;)

d8ahazard avatar Oct 11 '22 14:10 d8ahazard

I didn't author the paper, but I did write a repo that made it easier for everyone to train the models. You could say it was user-focused :). Thanks!

victorca25 avatar Oct 11 '22 16:10 victorca25

Hello @d8ahazard ! Have you been able to do some tests? I've done more tests with different models (original ESRGAN, real-ESRGAN, BSRGAN and a few different ones from the DB at different scales) and they all worked fine, but I don't know if there are any corner cases remaining to be tested before merging. I also don't know who else would be available to test.

My code has previously been tested on Windows, Linux and Mac (Intel and ARM) by many users, both for training and inference of the models, so in theory nothing in it should introduce issues on different OSs or chip architectures.

I'd like to try working with the UI to see if I can centralize the model downloader to remove the redundant real-esrgan files after the PR is merged, for further simplification of the code base.
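A centralized downloader along those lines could look roughly like the sketch below. It uses only the standard library, so it does not depend on basicsr. Everything named here is hypothetical: the `SuggestedModel` class, the `SUGGESTED_MODELS` list, the `fetch_model` function and in particular the placeholder URL, which does not point to a real model.

```python
import os
import urllib.parse
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class SuggestedModel:
    name: str
    url: str
    scale: int


# Placeholder entry for illustration only; the URL is not a real download link.
SUGGESTED_MODELS = [
    SuggestedModel("example-4x", "https://example.invalid/example-4x.pth", 4),
]


def fetch_model(model, models_dir):
    """Return the local path of `model`, downloading it only if it is missing."""
    os.makedirs(models_dir, exist_ok=True)
    filename = os.path.basename(urllib.parse.urlparse(model.url).path)
    dest = os.path.join(models_dir, filename)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(model.url, dest)
    return dest
```

Because a file that is already in the models directory is returned as-is, manually dropped-in models and suggested models would share one code path, and the UI would only need the list to decide which names to show.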

victorca25 avatar Oct 19 '22 15:10 victorca25

@AUTOMATIC1111 since nobody else has provided feedback and it probably won't be tested until merged to master, below I'm adding screenshots of the tests with multiple different models with different scales (1x, 2x, 4x, 8x), architectures (ESRGAN, ESRGAN-plus, Real-ESRGAN, BSRGAN, SRVGG) and configurations (6 blocks, 22 blocks).

These are the models I have in the GUI: image

And these are the models in my directories: image image

I'm adding the model's name, the scale of the model and the architecture. I added a "2" at the end of the BSRGAN and RealESRGAN_x4plus.pth models, so they are not confused with the ones in the interface.

ESRGAN models from the model db:

  • sudo_RealESRGAN2x_3.332.758_G.pth: 2x, ESRGAN (6B) image
  • 4x-AnimeSharp-lite.pth: 4x, ESRGAN-lite image
  • 8x_glasshopper_ArzenalV1.1_175000.pth: 8x, ESRGAN image
  • realesrgan-x4minus.pth: 4x, ESRGAN-plus image
  • 1x_DitherDeleterV3-Smooth-[32]_115000_G.pth: 1x, ESRGAN image
  • 8x_HugePeeps_v1.pth: 8x, ESRGAN image
  • reboutcx.pth: 4x, ESRGAN image

BSRGAN: original model

  • BSRGAN2.pth: 4x, BSRGAN image

Real-ESRGAN

  • realesr-general-x4v3.pth: 4x, SRVGG image
  • RealESRGAN_x2plus.pth: 2x, real-ESRGAN image
  • RealESRGAN_x4plus_anime_6B.pth: 4x, real-ESRGAN (6B) image
  • RealESRGAN_x4plus2.pth: 4x, real-ESRGAN image

This is the image used for testing, in case tests have to be replicated: image

The results from real-ESRGAN and BSRGAN were compared and the generated images are identical to the ones generated by the existing code.
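For anyone replicating the comparison, one simple way to check that two outputs are identical is to compare their decoded pixel data rather than the files themselves, since two files can differ in metadata while holding the same pixels. The helper below is a minimal sketch with a hypothetical name; it assumes both images have already been decoded to raw 8-bit pixel buffers of the same mode and size (for example with an image library), and it is not the script that was used for the tests above.

```python
def max_abs_diff(a: bytes, b: bytes) -> int:
    """Largest per-byte difference between two raw 8-bit pixel buffers.

    A result of 0 means the decoded images are identical.
    """
    if len(a) != len(b):
        raise ValueError("buffers differ in size, so the images cannot be identical")
    return max((abs(x - y) for x, y in zip(a, b)), default=0)


old_output = bytes([0, 10, 255])   # stand-in for pixels from the existing code
new_output = bytes([0, 10, 255])   # stand-in for pixels from the new code
print(max_abs_diff(old_output, new_output) == 0)
```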

As mentioned in the previous comment, the code has been tested by many users of my repos on Windows, Linux and Mac (Intel and ARM).

Let me know if anything else is required so the code can be merged.

victorca25 avatar Oct 23 '22 10:10 victorca25