[Feature Request] ONNX support
With a compact structure (model size 256k~4M), it could be a runtime effect based on DirectML.
Am I being too greedy? 😂
Real-ESRGAN is too large; it's too difficult to run in real time on current computers.
The main bottleneck is memory size. A 2K game with 2x upscaling costs about 16 GB of memory. The speed could be real-time on a 3060 (512k model) only if memory were unlimited. 😂
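For a sense of scale, here is a rough back-of-envelope check of that figure (my own assumptions, not measured numbers: fp32 activations and 64-channel feature maps, at 2x the size of a 2560x1440 source):

```python
# Back-of-envelope memory estimate. Assumptions (mine, not the poster's):
# fp32 activations, 64-channel feature maps, 2x output of a 2560x1440 source.
out_w, out_h, channels, bytes_per_value = 5120, 2880, 64, 4
gib = out_w * out_h * channels * bytes_per_value / 2**30
print(f"{gib:.1f} GiB per intermediate feature map")  # ~3.5 GiB
# A handful of such buffers alive at once lands in the same ballpark as 16 GB.
```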
IMO that could work on an iGPU, though the 780M is still not good enough; maybe the Qualcomm X Elite, but that's another story...
Some models can indeed be inferenced in real time, such as mpv-upscale-2x_animejanai. I plan to add support for ONNX in the future, but there is still a lot of uncertainty.
The SuperUltraCompact model isn't much larger than the Anime4K UL model (around 2x, I guess), so it could feasibly be ported to HLSL.
While porting to HLSL does indeed offer higher efficiency, the cost is also substantial unless there's an automated approach. I'm inclined to adopt ONNX Runtime, enabling us to seamlessly integrate any ONNX model with ease.
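For illustration, here is a minimal sketch of what ONNX Runtime inference could look like, in Python rather than Magpie's C++; the file name model.onnx, the NCHW float32 layout, and the 2x factor are assumptions (uses the onnxruntime-directml package):

```python
import numpy as np
import onnxruntime as ort

# Create a session on the DirectML backend, falling back to CPU if needed.
session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name

# Stand-in for a captured 720p frame, normalized to [0, 1], NCHW layout.
frame = np.random.rand(1, 3, 720, 1280).astype(np.float32)

# For a 2x model the output shape would be (1, 3, 1440, 2560).
upscaled, = session.run(None, {input_name: frame})
print(upscaled.shape)
```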
I personally think this is a great idea, as animejanai does offer much better graphics sometimes. I would personally donate 20 USD if this happens. Magpie is getting better every day. Love this thing so much.
I ported Animejanai V3 SuperUltraCompact and 2x-DigitalFlim to Magpie effects, if anyone wants to try: https://gist.github.com/kato-megumi/d10c12463b97184c559734f2cba553be
Great job! It appears that Animejanai is well-suited for scenes from old anime, as it doesn't produce sharp lines like Anime4K does. However, a significant issue is that it sacrifices many details. While DigitalFlim is sharper than Animejanai, it also suffers from severe detail loss. In terms of performance, they are roughly 20-25 times slower than Lanczos.
Nothing happened after I put both files in the effects folder (I even rebooted the system).
As an experiment, I also put in fakehdr.hlsl, and it works...
I don't know if I made any mistakes (version 10.05).
You have to use a newer version. https://github.com/Blinue/Magpie/actions/runs/7911000525
Thank you for your great work and help! Anyway, I still don't know how to download the build from GitHub Actions, so let me keep that surprise until the next release. 😁
However, a significant issue is that it sacrifices many details.
I think that's a common problem with ESRGAN-family models, due both to the structure (even large models can't keep much detail) and to the training datasets (animations?).
Download from here: https://github.com/Blinue/Magpie/actions/runs/7911000525/artifacts/1246839355
Thank you. After signing in again I could download it. It's weird that that kind of Actions page requires signing in (otherwise it shows a 404), even though I was already signed in on the iOS client...
Can you port the SD model of animejanai, which is more aggressive in its detail reconstruction? A UC model for those of us with more computing power would also be great.
Can you port the SD model of animejanai
@spiwar Do you have a link for it? I didn't find it on their GitHub.
For detail restoration... 2x-Futsuu-Anime, but it's 4M... I think it's a job for a 4090.
animejanai.zip Here are animejanai's Compact and UltraCompact models for anyone with enough power. UltraCompact runs at about 3 fps for 720p on my machine. I haven't tested Compact yet.
Same issue, 3 fps trying to run UltraCompact, even though it's fine when I use it in mpv. Can you port the V3 sharp models? They are in the animejanai Discord beta releases.
Same issue, 3 fps trying to run UltraCompact, even though it's fine when I use it in mpv
Perhaps it's a limitation of Magpie/HLSL. I'm hopeful that integrating ONNX will enhance its performance. What GPU are you using?
Can you port the V3 sharp models?
Ok. https://gist.github.com/kato-megumi/d10c12463b97184c559734f2cba553be#file-animejanai_sharp_suc-hlsl
Can you port the SD model of animejanai
@spiwar Do you have a link for it? I didn't find it on their GitHub.
You can find it in the full 1.1 GB release, but I've included it here for convenience. 2x_AnimeJaNai_SD_V1beta34_Compact.zip
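As a side note, for anyone wanting to experiment with such checkpoints ahead of native support, here is a hedged sketch of converting a Compact .pth to ONNX with PyTorch and basicsr. The architecture parameters (num_feat=64, num_conv=16 for the Compact variant) are my assumption from common Compact configurations; verify them against the model card:

```python
import torch
from basicsr.archs.srvgg_arch import SRVGGNetCompact  # pip install basicsr

# Assumed Compact configuration: 64 features, 16 conv layers, 2x upscale.
model = SRVGGNetCompact(num_in_ch=3, num_out_ch=3, num_feat=64,
                        num_conv=16, upscale=2, act_type="prelu")
state = torch.load("2x_AnimeJaNai_SD_V1beta34_Compact.pth", map_location="cpu")
model.load_state_dict(state.get("params", state))  # basicsr often nests weights under "params"
model.eval()

# Export with dynamic height/width so any window size can be fed in.
dummy = torch.rand(1, 3, 540, 960)
torch.onnx.export(
    model, dummy, "2x_AnimeJaNai_SD_Compact.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {2: "h", 3: "w"}, "output": {2: "h", 3: "w"}},
)
```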
RTX 3080 Ti, upscaling from a 1080p source to 4K:
C model runs at seconds per frame
UC model runs at 2-3 fps
SUC model runs at ~40 fps
If we can optimize this to run at decent speeds it would be very nice; the UC and C models look quite natural, with no oversharpening.
There is very little room for performance optimization, because the bottleneck is floating-point operations.
@kato-megumi I found that 16-bit floating-point numbers (min16float) are more efficient, with about a 10% performance improvement on my machine. But this is still not enough to make UC usable. Further performance improvement can only be achieved by using platform-specific APIs, such as TensorRT.
Finding data to enhance the SUC model might be the better way forward... Compared with TensorRT, DirectML is a universal solution, IMO... (though obviously it can't benefit from NVIDIA hardware acceleration.) Or PyTorch compile with 8-bit?
@Blinue Sorry, can you elaborate? I thought using FORMAT R16G16B16A16_FLOAT already meant 16-bit floating-point numbers?
I thought using FORMAT R16G16B16A16_FLOAT already meant 16-bit floating-point numbers?
In HLSL, float is 32-bit. An R16G16B16A16_FLOAT texture stores half-precision floating-point data, but it is converted to float when sampled. You have to explicitly cast to min16float to perform half-precision operations. See https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/using-hlsl-minimum-precision
Starting with Windows 8, graphics drivers can implement minimum precision HLSL scalar data types by using any precision greater than or equal to their specified bit precision.
Compared with TensorRT, DirectML is a universal solution, IMO... (though obviously it can't benefit from NVIDIA hardware acceleration.)
One advantage of ONNX Runtime is that it supports multiple backends, including DML and TensorRT. TensorRT is generally the fastest backend, so it should be the preferred choice when available.
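A short sketch of that preference order using ONNX Runtime's Python API (the provider names are real; model.onnx is a placeholder). ORT tries providers in the order given and falls back to the next available one:

```python
import onnxruntime as ort

# Listed best-first; ONNX Runtime skips providers that are not available
# in the current build/driver and falls back to the next one.
providers = [
    "TensorrtExecutionProvider",  # NVIDIA-only, usually fastest
    "DmlExecutionProvider",       # DirectML: vendor-neutral on Windows
    "CPUExecutionProvider",       # always available
]
session = ort.InferenceSession("model.onnx", providers=providers)
print(session.get_providers())  # shows which providers were actually enabled
```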
Same issue, 3 fps trying to run UltraCompact, even though it's fine when I use it in mpv
Perhaps it's a limitation of Magpie/HLSL. I'm hopeful that integrating ONNX will enhance its performance. What GPU are you using?
Tested on a 3080 and a 4090; anything heavier than SUC is not usable for now with HLSL. We will definitely need ONNX support so we can run with TensorRT.
Maybe if ↓ gets implemented, UC will be usable? https://github.com/Blinue/Magpie/discussions/610
Maybe if ↓ gets implemented, UC will be usable? https://github.com/Blinue/Magpie/discussions/610
bloc97 makes a good point; I think #610 is hard to achieve. On one hand, especially for complex scaling algorithms like convolutional networks, it is difficult to determine what effect a pixel change has on the output. On the other hand, duplicate frame detection is already implemented, which can effectively reduce power consumption in many situations. Going further and updating only the changed areas is not very useful, because it is hard to do and only works in certain scenarios.
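For context, duplicate frame detection boils down to comparing each captured frame with the previous one and skipping the entire effect chain when nothing changed. A conceptual sketch, assuming access to raw frame bytes (Magpie's actual check runs on the GPU; this only illustrates the idea):

```python
import hashlib

_last_digest = None

def frame_changed(frame_bytes: bytes) -> bool:
    """Return True when this frame differs from the previous one."""
    global _last_digest
    digest = hashlib.blake2b(frame_bytes, digest_size=16).digest()
    changed = digest != _last_digest
    _last_digest = digest
    return changed

# In a render loop, the expensive upscale pass runs only on changed frames:
# if frame_changed(captured_frame): output = run_upscale(captured_frame)
```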