KFR Roadmap
KFR 7 is scheduled for release this fall. It will be one of our most significant updates, introducing new features, broader platform support, and substantial performance improvements.
Below is an overview of the planned features across upcoming releases:
New SIMD-optimized functions 🚧
Expanded set of low-level primitives for bit manipulation, 3D graphics, and advanced math operations, all optimized with SIMD instructions to maximize throughput on modern CPUs.
New audio I/O implementation ✅
Support for reading/writing WAV, AIFF, Apple CAF, FLAC, Raw, and reading MP3 formats. Includes highly optimized conversions to specific formats.
DFT refactoring
A complete rework of the Discrete Fourier Transform implementation for greater flexibility, modularity, and raw performance, enabling more efficient integration in signal processing workflows.
C++20 integration ✅
Adoption of modern C++20 features to simplify expression handling, function overloading, and template metaprogramming, allowing the codebase to focus on clarity, maintainability, and cutting-edge optimizations.
Elliptic filters ✅
Introduction of elliptic IIR filters, expanding the library’s DSP toolbox with more flexible filter design options, enabling sharper roll-off and higher selectivity in practical applications.
Zero-Phase IIR Filter ✅
Implementation of zero-phase filtering using forward-backward IIR filtering techniques, allowing users to apply IIR filters without phase distortion.
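As a rough illustration of the forward-backward idea (not KFR's implementation or API), the sketch below runs a simple one-pole low-pass over the signal, reverses it, runs the same filter again, and reverses the result back. The names `one_pole_lowpass` and `zero_phase_lowpass` are hypothetical, and the filter state is assumed to start at rest on both passes, so edge transients are not compensated.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One-pole low-pass: y[n] = (1 - a) * x[n] + a * y[n-1], with 0 <= a < 1.
// Hypothetical helper for illustration only; not part of the KFR API.
inline void one_pole_lowpass(std::vector<double>& x, double a)
{
    assert(a >= 0.0 && a < 1.0);
    double y = 0.0; // filter state, assumed to start at rest
    for (double& s : x)
    {
        y = (1.0 - a) * s + a * y;
        s = y; // filter in place
    }
}

// Forward-backward filtering: filter, reverse, filter again, reverse back.
// The phase shift of the second pass cancels that of the first, while the
// magnitude response is applied twice.
inline std::vector<double> zero_phase_lowpass(std::vector<double> x, double a)
{
    one_pole_lowpass(x, a);
    std::reverse(x.begin(), x.end());
    one_pole_lowpass(x, a);
    std::reverse(x.begin(), x.end());
    return x;
}
```

Feeding an impulse placed in the middle of a long buffer should give a response that is symmetric around the impulse position, which is the time-domain signature of zero phase.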
Fat binary support for macOS ✅
Support for universal binaries on macOS, enabling seamless execution on both Intel and Apple Silicon (M-series) architectures without the need for separate builds.
Better GCC and MSVC support 🚧
While Clang remains ahead in vector optimization, KFR will provide improved support for GCC and MSVC, helping these compilers generate more efficient SIMD instructions and narrowing the performance gap.
Half precision support on ARMv8.2
Native float16 support in vec<> types on ARMv8.2+ CPUs, allowing higher throughput and reduced memory bandwidth in machine learning and DSP workloads.
Better documentation 🚧
A redesigned documentation system with:
- Clearer examples and use cases
- API reference improvements
- Guides for performance tuning and integration
This will make onboarding smoother and expert usage more efficient.
New architecture: RISC-V ✅
Initial support for RISC-V vector extensions, enabling KFR to run efficiently on the rapidly growing RISC-V ecosystem and broadening its role in next-generation embedded and HPC systems.
Real-world examples
A collection of ready-to-use, production-grade examples demonstrating KFR in domains such as:
- Audio signal processing
- Spectral analysis
- Embedded DSP applications
- Machine learning preprocessing
These examples aim to reduce integration time and showcase practical best practices.
Regarding GCC support, in case it is useful: I came across a GCC issue asking them to support the Clang vector intrinsics. Here is what the maintainers had to say about it: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88602
KFR has supported vector intrinsics for both GCC and Clang since 9 September (commit: https://github.com/kfrlib/kfr/commit/aff2d560d5d246ce3d375cfc98ca1374a7329694).
But my benchmarks show worse performance with extended intrinsics enabled in GCC, so this is enabled only for Clang by default. KFR has two backends: the one previously used by Clang and the one used by GCC and MSVC. The former delegates vector shuffles to the compiler, so a good vector optimizer is required for this to work. The latter backend translates every vector shuffle pattern to a specific SSE* or AVX* instruction, performing all optimizations on the KFR side. This is a huge amount of work but lets GCC and MSVC achieve performance not much worse than Clang in many scenarios. Letting GCC perform these optimizations by calling __builtin_shufflevector, as KFR does for Clang, exposes the difference between the excellent Clang optimizer and the average one in GCC.
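To make the compiler-delegated approach concrete, here is a minimal sketch (not taken from KFR) of a lane reversal expressed with `__builtin_shufflevector`. The `f32x4` type and the `reverse4` function are hypothetical names, and the fallback branch is there only so the snippet still builds on compilers that lack the builtin.

```cpp
// GCC/Clang vector extension: four packed floats (16 bytes).
typedef float f32x4 __attribute__((vector_size(16)));

// Older compilers may not provide __has_builtin; treat every query as "no".
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Reverse the lanes of a 4-float vector (hypothetical helper).
inline f32x4 reverse4(f32x4 v)
{
#if __has_builtin(__builtin_shufflevector)
    // Generic path: state the permutation and let the compiler's vector
    // optimizer choose the actual instruction(s).
    return __builtin_shufflevector(v, v, 3, 2, 1, 0);
#else
    // Fallback: element-wise copy using vector subscripting.
    f32x4 r = { v[3], v[2], v[1], v[0] };
    return r;
#endif
}
```

In the other backend described above, a pattern like this would instead be mapped by the library to a specific SSE or AVX shuffle instruction, so the result does not depend on how well the compiler optimizes generic shuffles.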
I saw the issue you linked. The code Andrew Pinski provided didn’t work for me with recent GCC. I spent some time fighting the compiler, and the final definition now looks like this (KFR 7.0.0):
template <typename TT, size_t NN>
struct simd_
{ // using = didn't work, only typedef
typedef unwrap_bit<TT> __attribute__((vector_size(sizeof(TT) * next_poweroftwo(NN)))) type;
};
template <typename TT, size_t NN>
using simd = typename simd_<TT, NN>::type; // typedef here didn't work for some reason