
`stdarch-gen-wasm32`: Tool that creates spec sheet from wasm32's C and Rust source files.

Open madhav-madhusoodanan opened this issue 3 months ago • 2 comments

What does stdarch-gen-wasm32 do?

  1. First it collects the intrinsic definitions from the wasm_simd128.h file (for the definitions in C)
  2. Then it collects the intrinsic definitions from the Rust source files
  3. It extracts details (such as the intrinsic name, function arguments, and return type) from the C and Rust intrinsics by parsing each definition into its Abstract Syntax Tree (using the tree-sitter crate)
  4. It matches the C and Rust definitions and creates a spec sheet like the one below (for an example intrinsic):
/// u16x8_extract_lane
c-intrinsic-name = wasm_u16x8_extract_lane
c-arguments = __a, __i
c-arguments-data-types = v128_t, int
c-return-type = 
rust-intrinsic-name = u16x8_extract_lane
rust-arguments = a
rust-arguments-data-types = v128
rust-const-generic-arguments = N
rust-const-generic-arguments-data-types = usize
rust-return-type = u16
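
The entry above could be modelled roughly as follows. This is a minimal sketch, not the tool's actual code: `SpecEntry`, `render`, `rust_name_for`, and the other helper names are hypothetical, and pairing definitions by stripping a `wasm_` prefix from the C name is an assumption drawn only from the example entry.

```rust
/// One spec sheet entry. Field names are illustrative, not taken from the tool.
struct SpecEntry {
    c_name: String,
    c_args: Vec<(String, String)>, // (argument name, data type)
    c_return: String,
    rust_name: String,
    rust_args: Vec<(String, String)>,
    rust_const_generics: Vec<(String, String)>,
    rust_return: String,
}

/// Assumed matching rule: the C name is the Rust name with a `wasm_` prefix.
fn rust_name_for(c_name: &str) -> Option<&str> {
    c_name.strip_prefix("wasm_")
}

/// Comma-separated names from (name, type) pairs.
fn names(pairs: &[(String, String)]) -> String {
    pairs.iter().map(|(n, _)| n.as_str()).collect::<Vec<_>>().join(", ")
}

/// Comma-separated data types from (name, type) pairs.
fn types(pairs: &[(String, String)]) -> String {
    pairs.iter().map(|(_, t)| t.as_str()).collect::<Vec<_>>().join(", ")
}

/// Renders an entry in the `key = value` layout of the sample above.
fn render(e: &SpecEntry) -> String {
    [
        format!("/// {}", e.rust_name),
        format!("c-intrinsic-name = {}", e.c_name),
        format!("c-arguments = {}", names(&e.c_args)),
        format!("c-arguments-data-types = {}", types(&e.c_args)),
        format!("c-return-type = {}", e.c_return),
        format!("rust-intrinsic-name = {}", e.rust_name),
        format!("rust-arguments = {}", names(&e.rust_args)),
        format!("rust-arguments-data-types = {}", types(&e.rust_args)),
        format!("rust-const-generic-arguments = {}", names(&e.rust_const_generics)),
        format!("rust-const-generic-arguments-data-types = {}", types(&e.rust_const_generics)),
        format!("rust-return-type = {}", e.rust_return),
    ]
    .join("\n")
}

fn pair(name: &str, ty: &str) -> (String, String) {
    (name.to_string(), ty.to_string())
}

/// The u16x8_extract_lane entry, with values copied from the sample above.
fn example_entry() -> SpecEntry {
    let c_name = "wasm_u16x8_extract_lane";
    SpecEntry {
        c_name: c_name.to_string(),
        c_args: vec![pair("__a", "v128_t"), pair("__i", "int")],
        c_return: String::new(), // left empty, exactly as in the sample
        rust_name: rust_name_for(c_name).unwrap_or(c_name).to_string(),
        rust_args: vec![pair("a", "v128")],
        rust_const_generics: vec![pair("N", "usize")],
        rust_return: "u16".to_string(),
    }
}
```

Calling `render(&example_entry())` yields the eleven lines of the sample entry, starting with the `/// u16x8_extract_lane` header.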

How to run

cd crates/stdarch-gen-wasm
cargo run -- --c ../../intrinsics_data/wasm_simd128.h --rust ../core_arch/src/wasm32/simd128.rs --rust ../core_arch/src/wasm32/relaxed_simd.rs > wasm32.spec

Context

C Abstract Syntax Tree

Take an intrinsic definition for example:

static __inline__ v128_t __DEFAULT_FN_ATTRS wasm_u32x4_make(uint32_t __c0, uint32_t __c1, uint32_t __c2, uint32_t __c3) {...}

For this C intrinsic, the immediate children of the definition node have the following grammar names:

  • storage_class_specifier: which is static
  • storage_class_specifier: which is __inline__
  • identifier: which is v128_t. The parser doesn't recognize it as a type and treats it as an identifier instead.
  • ERROR: which points to the macro __DEFAULT_FN_ATTRS. The parser doesn't recognize it as a valid part of the tree and annotates it as ERROR.
  • function_declarator: points to wasm_u32x4_make(uint32_t __c0, uint32_t __c1, uint32_t __c2, uint32_t __c3)
  • compound_statement: the body of the function

The immediate children of the function_declarator node have the following grammar names:

  • identifier : which is the intrinsic name wasm_u32x4_make
  • parameter_list : which represents the arguments to the intrinsic (uint32_t __c0, uint32_t __c1, uint32_t __c2, uint32_t __c3)

The immediate children of a parameter_list node have the following grammar names:

  • ( : The opening bracket that denotes the start of the arguments definition.
  • parameter_declaration : The definition for the first argument uint32_t __c0
  • , : The comma that separates the first and the second arguments.
  • parameter_declaration : The definition for the second argument uint32_t __c1
  • , : The comma that separates the second and the third arguments.
  • parameter_declaration : The definition for the third argument uint32_t __c2
  • , : The comma that separates the third and the fourth arguments.
  • parameter_declaration : The definition for the fourth argument uint32_t __c3
  • ) : The closing bracket that denotes the end of the arguments definition.
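
The walk described above (definition, then function_declarator, then parameter_list) can be sketched without the parser. The `Node` type below is a hypothetical stand-in for a tree-sitter node, the tree is built by hand from the bullet lists, and the root name `function_definition` is an assumption; the real tool would read these nodes from the parsed header instead.

```rust
/// Minimal stand-in for a syntax tree node. This type is hypothetical;
/// the real tool walks nodes produced by the tree-sitter crate.
struct Node {
    grammar_name: &'static str,
    text: &'static str,
    children: Vec<Node>,
}

fn leaf(grammar_name: &'static str, text: &'static str) -> Node {
    Node { grammar_name, text, children: Vec::new() }
}

fn branch(grammar_name: &'static str, text: &'static str, children: Vec<Node>) -> Node {
    Node { grammar_name, text, children }
}

/// First immediate child with the given grammar name, if any.
fn child<'a>(node: &'a Node, grammar_name: &str) -> Option<&'a Node> {
    node.children.iter().find(|c| c.grammar_name == grammar_name)
}

/// Returns the intrinsic name and the source text of each parameter.
fn name_and_params(definition: &Node) -> Option<(&'static str, Vec<&'static str>)> {
    let declarator = child(definition, "function_declarator")?;
    let name = child(declarator, "identifier")?.text;
    let list = child(declarator, "parameter_list")?;
    // Skip "(", "," and ")" by keeping only the parameter declarations.
    let params = list
        .children
        .iter()
        .filter(|c| c.grammar_name == "parameter_declaration")
        .map(|c| c.text)
        .collect();
    Some((name, params))
}

/// Hand-built tree for the wasm_u32x4_make example.
fn sample_tree() -> Node {
    let list = branch(
        "parameter_list",
        "(uint32_t __c0, uint32_t __c1, uint32_t __c2, uint32_t __c3)",
        vec![
            leaf("(", "("),
            leaf("parameter_declaration", "uint32_t __c0"),
            leaf(",", ","),
            leaf("parameter_declaration", "uint32_t __c1"),
            leaf(",", ","),
            leaf("parameter_declaration", "uint32_t __c2"),
            leaf(",", ","),
            leaf("parameter_declaration", "uint32_t __c3"),
            leaf(")", ")"),
        ],
    );
    let declarator = branch(
        "function_declarator",
        "wasm_u32x4_make(...)",
        vec![leaf("identifier", "wasm_u32x4_make"), list],
    );
    // The root name is an assumption; only its children matter here.
    branch(
        "function_definition",
        "static __inline__ v128_t __DEFAULT_FN_ATTRS wasm_u32x4_make(...) {...}",
        vec![
            leaf("storage_class_specifier", "static"),
            leaf("storage_class_specifier", "__inline__"),
            leaf("identifier", "v128_t"),
            leaf("ERROR", "__DEFAULT_FN_ATTRS"),
            declarator,
            leaf("compound_statement", "{...}"),
        ],
    )
}
```

Looking up function_declarator first matters: searching the definition's own children for an identifier would find v128_t, the return type, rather than the intrinsic name.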

Each node with the grammar name parameter_declaration can have its children structured in a few ways:

  1. In the case of int x:
     • primitive_type : Points to int.
     • identifier : Points to x.
  2. In the case of v128_t x:
     • identifier : Points to v128_t, which is actually a type (the parser is unaware of this).
     • identifier : Points to x.
  3. In the case of const void *__mem:
     • type_qualifier : Points to const.
     • primitive_type : Points to void.
     • pointer_declarator : Breaks down into * and identifier (which is __mem).
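
One way to handle all three shapes with a single rule is to treat the last child as the declarator and everything before it as the data type. The sketch below assumes exactly that; `Child` and `split_param` are hypothetical names, and the real tool may resolve these cases differently.

```rust
/// Hypothetical stand-in for a child of a parameter_declaration node.
struct Child {
    grammar_name: &'static str,
    text: &'static str,
    children: Vec<Child>,
}

fn leaf(grammar_name: &'static str, text: &'static str) -> Child {
    Child { grammar_name, text, children: Vec::new() }
}

fn node(grammar_name: &'static str, text: &'static str, children: Vec<Child>) -> Child {
    Child { grammar_name, text, children }
}

/// Splits the children of a parameter_declaration into (data type, name).
fn split_param(children: &[Child]) -> Option<(String, String)> {
    // The last child carries the name; the ones before it form the type.
    let (declarator, type_parts) = children.split_last()?;
    if type_parts.is_empty() {
        return None;
    }
    let mut data_type = type_parts
        .iter()
        .map(|c| c.text)
        .collect::<Vec<_>>()
        .join(" ");
    let name = match declarator.grammar_name {
        // Covers `int x` and `v128_t x`.
        "identifier" => declarator.text,
        // Covers `const void *__mem`: the `*` belongs to the data type.
        "pointer_declarator" => {
            data_type.push_str(" *");
            declarator
                .children
                .iter()
                .find(|c| c.grammar_name == "identifier")?
                .text
        }
        _ => return None,
    };
    Some((data_type, name.to_string()))
}
```

Because the type part is taken as plain text, it does not matter that v128_t is reported as an identifier rather than as a type.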

Rust Abstract Syntax Tree

Take a Rust intrinsic definition for example:

pub unsafe fn v128_load64_splat(m: *const u64) -> v128 {
    u64x2_splat(ptr::read_unaligned(m))
}

For this Rust intrinsic, the immediate children would have their grammar names as:

  • visibility_modifier: For pub
  • function_modifiers : For unsafe. May not always be present
  • fn : The actual keyword fn
  • identifier : the name of the function v128_load64_splat
  • type_parameters : the const generic arguments. (This is not always present)
  • parameters : The arguments passed to the function (m: *const u64)
  • -> : The arrow used to specify the return type
  • identifier : The return type of the function v128
  • block: The body of the function
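
Since both the function name and the return type are reported as identifier here, they have to be told apart by position. The sketch below assumes the children arrive as (grammar name, source text) pairs and takes the child after fn as the name and the child after -> as the return type; `RustSignature` and `read_signature` are hypothetical names, not part of the tool.

```rust
/// Hypothetical summary of a Rust intrinsic's signature.
#[derive(Debug, PartialEq)]
struct RustSignature {
    name: String,
    is_unsafe: bool,
    return_type: Option<String>,
}

/// Reads a signature from (grammar name, source text) pairs, listed in
/// the order of the function's immediate children.
fn read_signature(children: &[(&str, &str)]) -> Option<RustSignature> {
    // The function name is the identifier right after the `fn` keyword.
    let fn_pos = children.iter().position(|(g, _)| *g == "fn")?;
    let (name_kind, name) = *children.get(fn_pos + 1)?;
    if name_kind != "identifier" {
        return None;
    }
    // function_modifiers is optional, so a missing node just means "safe".
    let is_unsafe = children[..fn_pos].iter().any(|(g, t)| {
        *g == "function_modifiers" && t.split_whitespace().any(|w| w == "unsafe")
    });
    // The return type is whatever child follows `->`, if there is one.
    let return_type = children
        .iter()
        .position(|(g, _)| *g == "->")
        .and_then(|i| children.get(i + 1))
        .map(|(_, t)| t.to_string());
    Some(RustSignature { name: name.to_string(), is_unsafe, return_type })
}
```

For the v128_load64_splat children listed above, this gives the name v128_load64_splat, is_unsafe set to true, and a return type of v128. Taking whatever follows -> avoids depending on the grammar name of the return type.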

The children of the type_parameters node (which holds the const generic arguments) have the following grammar names (assuming 2 generic arguments):

  • <: The opening angle bracket that starts the generic arguments definition
  • const_parameter: The first const generic argument
  • ,: The comma that separates the generic arguments
  • const_parameter: The second const generic argument
  • >: The closing angle bracket that concludes the generic arguments definition

The children of the parameters node have their grammar_names as the following (assuming 2 arguments):

  • (: The opening parenthesis that starts the arguments definition
  • parameter : The first argument
  • ,: The comma that separates the arguments
  • parameter : The second argument
  • ) : The closing parenthesis that concludes the arguments definition
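
Both lists share one shape: delimiters interleaved with the items of interest. A minimal sketch of pulling out the items and splitting each into a name and a data type follows. It splits on the source text rather than descending into the child nodes, which is a simplification and not necessarily how the tool does it; `items` and `split_binding` are hypothetical helpers.

```rust
/// Keeps only the children with the wanted grammar name, dropping the
/// brackets and commas around them.
fn items<'a>(children: &[(&'a str, &'a str)], wanted: &str) -> Vec<&'a str> {
    children
        .iter()
        .filter(|(g, _)| *g == wanted)
        .map(|(_, t)| *t)
        .collect()
}

/// Splits `const N: usize` or `a: v128` into (name, data type).
/// Text-based and deliberately simple: patterns such as `mut a: v128`
/// are not handled.
fn split_binding(text: &str) -> Option<(&str, &str)> {
    let text = text.trim();
    let text = text.strip_prefix("const ").unwrap_or(text);
    // Split at the first colon only, so paths like `a::b` in the type survive.
    let (name, ty) = text.split_once(':')?;
    Some((name.trim(), ty.trim()))
}
```

Applied to u16x8_extract_lane, whose spec entry lists the const generic N of type usize and the argument a of type v128, `split_binding` would return ("N", "usize") for `const N: usize` and ("a", "v128") for `a: v128`.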

cc: @Amanieu @folkertdev

madhav-madhusoodanan · Aug 31 '25 18:08