video2x
Add proper support for arm64
Is your feature request related to a problem? Please describe. video2x currently doesn't build on arm64. See: https://github.com/user-attachments/files/19526880/2025-03-30_video2x-6.4.0_build.log
Describe the solution you'd like. Considering that a lot of arm64-based devices don't have powerful GPUs (e.g. the Raspberry Pi with its VideoCore, Odroid boards, and most smartphones with Mali GPUs), at first glance it would appear unnecessary to support arm64, but some arm64 devices can use powerful GPUs. For instance, I own a SolidRun HoneyComb LX2, which has a 16-core CPU, 64 GB of RAM, and a 4 GB Radeon RX 550 plugged into its PCIe x8 slot. The Avantek Ampere eMAG and Altra workstations also have PCIe slots that could host any Radeon GPU. I could also name the Nvidia Jetson and Tegra-based devices, which have reasonably powerful built-in Nvidia GPUs, as well as ARM-based Apple devices. arm64 also supports SIMD through the NEON instruction set. So, if it isn't too complicated or demanding, could you port video2x 6 to support ARM64? If needed, I can also help with testing.
I think it's fair to at least build the CLI for arm64, provided cross-compilation isn't too complicated. I can try building an ARM package for you to try. What distro are you using?
It's a different story for Apple, since it doesn't have native Vulkan support, I know nothing about MoltenVK, and I don't have an Apple Silicon device to test with. It's tracked separately in #1189.
> I can try building an ARM package for you to try. What distro are you using?
I'm using Gentoo Linux, so the Portage package manager already builds things from source, and a source-based package for video2x already exists: https://github.com/Tatsh/tatsh-overlay/tree/master/media-video/video2x
Also see the build.log I posted in my first comment; it explains why the build failed.
> provided cross-compilation isn't too complicated
I'd rather it build natively on my arm64 workstation, just like its dependencies, which did build correctly (ncnn, ...).
> I think it's fair to at least build the CLI for arm64,
The Qt6 GUI shouldn't be hard to build either, and there's also a package for it in the same Gentoo overlay: https://github.com/Tatsh/tatsh-overlay/tree/master/media-video/video2x-qt6
Here's the issue I posted on the tatsh-overlay: https://github.com/Tatsh/tatsh-overlay/issues/450
And here's where the build failure I'm getting (the one in the build.log I posted) comes from: https://github.com/k4yt3x/video2x/blob/6bf0ee527d2c5b31e7e1f8dd1ca6383be1442f2a/src/encoder.cpp#L341
I'm not sure how to proceed with this yet. Multiversioning with AVX does help performance on x86 machines, so I don't really want to remove it. At the same time, it seems the attributes have to be ignored somehow to compile on ARM. I don't want to wrap each one of them in a preprocessor macro either.
For now, I'll patch the code to remove that line: [[gnu::target_clones("arch=x86-64-v4", "arch=x86-64-v3", "default")]]
Anyway, shouldn't the compiler generate AVX code by itself, without needing the extra "gnu::target_clones" attributes in the code, if the right -mcpu, -march, and -mtune flags are set?
As for avoiding macros, I don't know what to tell you.
> Anyway, shouldn't the compiler generate AVX code by itself, without needing the extra "gnu::target_clones" attributes in the code, if the right -mcpu, -march, and -mtune flags are set?
Yes. The issue with that is that if you run a binary compiled with -mavx512f on a machine without AVX-512, it'll crash with an "illegal instruction" error. Multiversioning works by compiling the same function for multiple microarchitectures and selecting the most suitable version at runtime. I.e., it'll use the AVX-512 version of the function if it's available; if not, the AVX2 version; if not, the baseline (SSE) version. This allows one binary to dynamically adapt to the platform it's running on and take full advantage of the available instruction sets.
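To illustrate the runtime selection described above, here is a minimal hand-written equivalent, assuming GCC or Clang on x86-64: the same loop is compiled twice (once with `__attribute__((target("avx2")))`), and `__builtin_cpu_supports` picks a variant the first time the function is called. This is a simplified sketch of the idea, not what `gnu::target_clones` literally generates (GCC emits an ifunc resolver rather than a function pointer); `sum`, `sum_default`, `sum_avx2` and `resolve_sum` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Baseline variant, compiled for the default target of the build.
static long sum_default(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
// Same source, but this copy may be auto-vectorized with AVX2 instructions.
__attribute__((target("avx2")))
static long sum_avx2(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}
#endif

using SumFn = long (*)(const int*, std::size_t);

// Chooses the best variant for the CPU the program is actually running on.
static SumFn resolve_sum() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_default;
}

long sum(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    // Resolved once, on the first call; later calls jump straight to the chosen variant.
    static const SumFn impl = resolve_sum();
    return impl(data, n);
}
```

Callers just use `sum(...)`. On a CPU without AVX2 the AVX2 copy is never executed, so the binary does not hit an "illegal instruction" crash.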
@k4yt3x I've just added aarch64 support for video2x in archlinuxcn, with gnu::target_clones lines temporarily removed: https://github.com/archlinuxcn/repo/blob/master/alarmcn/video2x/PKGBUILD#L35
Maybe you could build one shared library (.dll/.so) per x86-64 level and have the main executable choose the right one at runtime (or before launch)?
@k4yt3x Actually could you add a macro #ifdef check for x86 / x86_64 and only apply [[gnu::target_clones("arch=x86-64-v4", "arch=x86-64-v3", "default")]] if the condition is met?
This would solve the issue for all architectures.
For RISC-V, PPC64 and ARM64, I don't think there's a need for multiversioning.
Before anything: I found this project by sheer chance after reading up on waifu2x and Anime4K. I am not an expert in any of this, just really curious.
I've never worked with the gnu::target_clones attribute, but considering it is effectively a form of overloading, I think a more elegant solution might be:
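```cpp
// Pick per-architecture clone targets unless the builder overrides them
// with -DV2X_NO_SPECIFIC_TARGET -DV2X_MULTIVERSION='...'.
#ifndef V2X_NO_SPECIFIC_TARGET
# if defined(__x86_64__)
#  define V2X_MULTIVERSION [[gnu::target_clones("arch=x86-64-v4", "arch=x86-64-v3", "default")]]
# elif defined(__aarch64__)
   // Placeholder subarch list; not checked against what GCC accepts on aarch64.
#  define V2X_MULTIVERSION [[gnu::target_clones("arch=armv7", "arch=armv8", "arch=armv9", "default")]]
# else
#  error "Unsupported architecture (force with V2X_NO_SPECIFIC_TARGET)"
# endif
#endif
```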
This would:
- Set a few "sane defaults" (although I was too lazy to pull up a list of ARM subarchs...)
- Allow temporarily setting a whole different string (-DV2X_NO_SPECIFIC_TARGET -DV2X_MULTIVERSION='[[gnu::target_clones(...)]]')
- Unify this in one central place
There are only about four places where this is used anyway, so bundling it into a single macro should be fine.
That said, I did not check for architecture-dependent code in any of those functions, let alone asm blocks.
As I tried to say in my last post, there's no need for multiversioning on ARM64, since the CPU itself only supports a single instruction set, whether it's ARMv7, ARMv8, or ARMv9. It can't support several, unlike x86_64, where x86-64 v1 to v4 are the same architecture with more features added by each new revision.
To my understanding, ARMv8 is totally incompatible with ARMv9: some of their instructions might be the same, but many differ, especially the SIMD instructions, I think.
There are even slight differences between implementations of the same ARM version; for instance, the Cortex-A72 might differ from other ARM CPUs of the same era, like the Rockchip RK3566, and both differ from the generic "armv8-a" target.
Here are my Gentoo Portage common flags for the CPU of my SolidRun HoneyComb LX2 (which uses Cortex-A72 cores):
COMMON_FLAGS="-mcpu=cortex-a72+crc+crypto -mtune=cortex-a72 -O2 -pipe -ftree-vectorize -fomit-frame-pointer"
And the -mcpu=cortex-a72+crc+crypto -mtune=cortex-a72 -O2 flags are the only ones that should be used for that CPU.
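The per-core differences mentioned above can be inspected at runtime: on AArch64 Linux the kernel reports which optional SIMD extensions a CPU implements. A minimal sketch, assuming Linux with glibc, that reads those capability bits with `getauxval(AT_HWCAP)` and tests `HWCAP_ASIMD` (NEON) and `HWCAP_SVE`; `ArmSimdFeatures` and `detect_arm_simd` are hypothetical names, and on any other platform both flags simply stay false.

```cpp
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP; on glibc this also defines HWCAP_*
#endif

// Optional SIMD extensions reported by the kernel for the running CPU.
struct ArmSimdFeatures {
    bool neon = false;  // Advanced SIMD (NEON)
    bool sve = false;   // Scalable Vector Extension
};

ArmSimdFeatures detect_arm_simd() {
    ArmSimdFeatures f;
#if defined(__aarch64__) && defined(__linux__)
    const unsigned long hwcap = getauxval(AT_HWCAP);
#ifdef HWCAP_ASIMD
    f.neon = (hwcap & HWCAP_ASIMD) != 0;
#endif
#ifdef HWCAP_SVE
    f.sve = (hwcap & HWCAP_SVE) != 0;
#endif
    (void)hwcap;  // avoids an unused-variable warning if neither macro exists
#endif
    return f;
}
```

On a Cortex-A72, which implements NEON but not SVE, one would expect `neon == true` and `sve == false`; a newer core with SVE would report both.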
@IngwiePhoenix If we have to add something like this for a bunch of functions, it'll get pretty cluttered. I'm wondering whether we should just remove the multiversioning and instead build all the prebuilt release binaries for AVX2 (x86-64-v3).
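If the prebuilt binaries targeted x86-64-v3, a CPU without AVX2 would hit the "illegal instruction" crash described earlier. A minimal sketch of a friendlier startup check, assuming GCC or Clang: it approximates x86-64-v3 by testing only AVX2, BMI2 and FMA (the full level also requires AVX, BMI1, F16C, LZCNT, MOVBE and XSAVE). The function names are hypothetical, and the check would need to live in a translation unit built without -march=x86-64-v3, since otherwise the compiler may already use AVX2 instructions in that code.

```cpp
#include <cstdio>

// Approximates the x86-64-v3 baseline by checking AVX2, BMI2 and FMA only.
bool cpu_supports_release_baseline() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("bmi2") &&
           __builtin_cpu_supports("fma");
#else
    // Non-x86 builds are unaffected by an x86-64-v3 baseline.
    return true;
#endif
}

// Hypothetical early check: call it at the top of main() and exit with an
// error if it returns false, instead of crashing later with SIGILL.
bool check_cpu_or_report() {
    if (cpu_supports_release_baseline()) return true;
    std::fputs("error: this build requires an x86-64-v3 (AVX2) CPU\n", stderr);
    return false;
}
```

This keeps the single-baseline approach simple while still giving users on older CPUs a clear message rather than a crash.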