
docs: explain to users how to do function multiversioning / CPU dispatching with SIMDe

Open: mr-c opened this issue 3 years ago • 3 comments

Including examples

See https://github.com/alexdobin/STAR/pull/1163#issuecomment-794130429

mr-c, Mar 09 '21 16:03

It would be nice, but doing this portably is a lot of work.

The major problem is that your build system would need to rebuild a file repeatedly using different flags, once for each variant that you'd like to offer. There is no way to just repeatedly #include a file and keep everything in C; build system integration is required. That means teaching X different build systems about the right flags for Y different compilers for Z different ISA extensions.

Next you would need some macros to tweak the symbol names for each extension (e.g., foo_sse, foo_avx, foo_avx512, foo_neonv7, foo_neonv8, foo_neonaarch64, etc.), or have a separate shared library for each variant which could then be loaded with dlopen (sort of like how the new hwcaps mechanism works).
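A minimal sketch of the symbol-renaming approach, assuming the build system compiles the same source file once per variant and passes both the ISA flags and a hypothetical `MYLIB_VARIANT` macro on the command line. The macro names, file names, and `sum_floats` are illustrative only; in a SIMDe-based project the body would use `simde_mm*` functions, which map to native instructions when the matching extension is enabled at compile time.

```c
/* Hypothetical build: compile this one file once per variant, e.g.
 *   cc -c -msse2 -DMYLIB_VARIANT=sse2 kernel.c -o kernel_sse2.o
 *   cc -c -mavx2 -DMYLIB_VARIANT=avx2 kernel.c -o kernel_avx2.o
 * Each object file then exports a differently named symbol. */
#include <stddef.h>

#ifndef MYLIB_VARIANT
#define MYLIB_VARIANT generic
#endif

/* Two-step expansion so MYLIB_VARIANT is replaced by its value
   (e.g. avx2) before the tokens are pasted together. */
#define MYLIB_PASTE_(name, variant) name##_##variant
#define MYLIB_PASTE(name, variant) MYLIB_PASTE_(name, variant)
#define MYLIB_FN(name) MYLIB_PASTE(name, MYLIB_VARIANT)

/* Expands to sum_floats_generic, sum_floats_sse2, sum_floats_avx2, ... */
float MYLIB_FN(sum_floats)(const float *values, size_t n) {
  float total = 0.0f;
  for (size_t i = 0; i < n; i++) {
    total += values[i];
  }
  return total;
}
```

Compiled without any -D flag, the function is named `sum_floats_generic`; the per-variant builds produce `sum_floats_sse2`, `sum_floats_avx2`, and so on, so all of them can be linked into one binary and chosen at run time.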

Then you need a way to choose the best version based on runtime CPU feature detection. cpu_features is a good choice, but last time I looked it required CMake. I also have some code in portable-snippets which could do the job (though it would require a good amount of work first).
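To illustrate the runtime-detection step, here is a sketch that ranks a few x86 extensions and returns the best one the running CPU reports. It uses the GCC/Clang `__builtin_cpu_supports` builtin rather than cpu_features or portable-snippets, and the `simd_level` enum is a hypothetical name; on other compilers or architectures it simply reports the baseline.

```c
#include <stdbool.h>

/* Hypothetical ranking of the x86 variants a project might ship. */
typedef enum {
  SIMD_LEVEL_BASELINE = 0, /* portable fallback build */
  SIMD_LEVEL_SSE2,
  SIMD_LEVEL_AVX,
  SIMD_LEVEL_AVX2,
  SIMD_LEVEL_AVX512F
} simd_level;

/* Return the highest level the running CPU supports.
   Assumes GCC or Clang targeting x86; elsewhere reports the baseline. */
static simd_level detect_simd_level(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx512f")) return SIMD_LEVEL_AVX512F;
  if (__builtin_cpu_supports("avx2")) return SIMD_LEVEL_AVX2;
  if (__builtin_cpu_supports("avx")) return SIMD_LEVEL_AVX;
  if (__builtin_cpu_supports("sse2")) return SIMD_LEVEL_SSE2;
#endif
  return SIMD_LEVEL_BASELINE;
}
```

A dispatcher would then pick the highest-ranked variant that was actually built, falling back to the portable build when nothing better is available.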

Basically, this is a pretty good sized project all on its own. While it would be applicable to SIMDe, it would by no means be unique to SIMDe. I'd be happy to participate, but not as a solo effort.

If this were a normal GSoC year I'd say a proof of concept (PoC) might be a good project (with lots of mentor involvement), but since projects are compressed this year it's likely too big a task for even a relatively skilled student.

nemequ, Mar 10 '21 20:03

Thanks @nemequ for your response. I think the task can be broken further down to manageable milestones that still provide value along the way.

> That means teaching X different build systems about the right flags for Y different compilers for Z different ISA extensions.

I don't think the instructions would differ for other ISA extensions. Showing one example for x86 with a single build system would already be a huge help.

Here's a possible breakdown of this project into phases.

  1. Find examples of how projects currently using SIMDe accomplish this, and put them in a table that records:

    1. Their build system
    2. Which approach they took for organizing the compilation units (dynamically loaded shared objects or renamed top-level functions)
    3. How they did CPU feature detection
    4. Which architectures (just x86?) they compile multiple versions for
  2. Replicate minimal examples from these that we can test ourselves.

  3. Write a guide for a single build system on how to upgrade an existing codebase using this technique. The guide should show the evolution of a small tool, and all stages should be under our CI. It doesn't have to target all compilers.

  4. Iteratively improve all of the above: write new guides for additional build systems, and add different compilers to existing examples/guides.

Phases 1 and 2 might fit in a GSoC project (though whether we suggest that for this year is another conversation). We can't do just phase 1 for GSoC because the project needs to involve writing code. Phases 1, 3, and 4 could be part of a Season of Docs project (probably best combined with other documentation needs).

mr-c, Mar 11 '21 07:03

As discussed in chat, I'm working on this, but here is a very simple version which might help illustrate the concept. The final version will be much harder to follow, so I just wanted to keep this around somewhere:

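```c
#include <stdio.h>

/* Each variant would normally live in its own source file, compiled with
   the matching flags (e.g. -msse2 or -mavx) so SIMDe can use native
   instructions; here they just print a message. */
void my_function_sse2(void) {
  printf("Hello from SSE2\n");
}

void my_function_avx(void) {
  printf("Hello from AVX\n");
}

/* One table per variant, with a member for every dispatched function. */
typedef struct {
  void (* my_function)(void);
} MyFunctionTable;

static const MyFunctionTable my_function_table_sse2 = {
  my_function_sse2
};

static const MyFunctionTable my_function_table_avx = {
  my_function_avx
};

/* Select the best table on first use and cache it.
   __builtin_cpu_init() and __builtin_cpu_supports() are GCC/Clang
   builtins for x86 targets. The cache is not synchronized, so concurrent
   first calls are a data race; real code should use atomics or call_once. */
static const MyFunctionTable*
my_function_table_resolve(void) {
  static const MyFunctionTable* table = NULL;

  if (table == NULL) {
    __builtin_cpu_init();

    if (__builtin_cpu_supports("avx")) {
      table = &my_function_table_avx;
    } else {
      table = &my_function_table_sse2;
    }
  }

  return table;
}

/* Public entry point. It returns void, so it must not "return" the call's
   result (a void expression); it just calls through the table. */
void my_function(void) {
  my_function_table_resolve()->my_function();
}

int main(void) {
  my_function();

  return 0;
}
```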

nemequ, Mar 22 '21 20:03
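The resolver in the example above caches its choice in a plain static pointer, which is fine for a single-threaded demo but is a data race if several threads make the first call at the same time. A minimal sketch of the same lazy dispatch using C11 `<stdatomic.h>`, assuming a C11 compiler; `my_add`, `my_add_generic`, and `my_add_avx2` are hypothetical placeholders, and both variants are plain C here rather than separately compiled objects.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature shared by every variant. */
typedef int (*my_add_fn)(int, int);

/* Placeholder variants; a real project would link these in from objects
   built with different ISA flags. */
static int my_add_generic(int a, int b) { return a + b; }
static int my_add_avx2(int a, int b) { return a + b; }

/* GCC/Clang on x86 only; everywhere else, assume no AVX2. */
static bool cpu_supports_avx2(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx2") != 0;
#else
  return false;
#endif
}

/* Cached implementation; NULL until the first call resolves it. */
static _Atomic(my_add_fn) my_add_impl = NULL;

static my_add_fn my_add_resolve(void) {
  my_add_fn fn = atomic_load_explicit(&my_add_impl, memory_order_acquire);
  if (fn == NULL) {
    fn = cpu_supports_avx2() ? my_add_avx2 : my_add_generic;
    /* Racing threads compute the same pointer, so overwriting is harmless. */
    atomic_store_explicit(&my_add_impl, fn, memory_order_release);
  }
  return fn;
}

/* Public entry point: forwards to the selected implementation. */
int my_add(int a, int b) {
  return my_add_resolve()(a, b);
}
```

Because every thread computes the same answer, losing the race only repeats the detection once; no lock is needed.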