`stdarch-gen-wasm32`: A tool that creates a spec sheet from wasm32's C and Rust source files.
What does stdarch-gen-wasm32 do?
- First, it collects the intrinsic definitions from the `wasm_simd128.h` file (for the definitions in C).
- Then it collects the intrinsic definitions from the Rust source files.
- It extracts details (such as intrinsic name, function arguments, return types, etc.) of the C and the Rust intrinsics by decomposing their definitions into their Abstract Syntax Trees (using the `tree-sitter` crate).
- It matches the C and the Rust definitions and creates a spec sheet like the one below (for an example intrinsic):
/// u16x8_extract_lane
c-intrinsic-name = wasm_u16x8_extract_lane
c-arguments = __a, __i
c-arguments-data-types = v128_t, int
c-return-type =
rust-intrinsic-name = u16x8_extract_lane
rust-arguments = a
rust-arguments-data-types = v128
rust-const-generic-arguments = N
rust-const-generic-arguments-data-types = usize
rust-return-type = u16
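The entry above can be modelled as a small data structure. The following is a minimal sketch, not the tool's actual code: `SpecEntry`, its field names, `rust_name_for`, and `joined` are hypothetical, and the matching rule (the C name is the Rust name with a `wasm_` prefix) and the choice of the Rust name for the `///` header are assumptions inferred from the example entry.

```rust
use std::fmt;

/// Hypothetical in-memory form of one spec sheet entry; the field names are
/// illustrative and not taken from the tool's source.
#[derive(Debug, Default)]
pub struct SpecEntry {
    pub c_name: String,
    /// (argument name, data type) pairs, in declaration order.
    pub c_args: Vec<(String, String)>,
    pub c_return_type: String,
    pub rust_name: String,
    pub rust_args: Vec<(String, String)>,
    pub rust_const_generics: Vec<(String, String)>,
    pub rust_return_type: String,
}

/// Assumed matching rule: the C name is the Rust name prefixed with `wasm_`.
pub fn rust_name_for(c_name: &str) -> Option<&str> {
    c_name.strip_prefix("wasm_")
}

/// Joins either the names or the data types of a list of pairs with ", ".
fn joined(pairs: &[(String, String)], pick_type: bool) -> String {
    pairs
        .iter()
        .map(|(name, ty)| if pick_type { ty.as_str() } else { name.as_str() })
        .collect::<Vec<_>>()
        .join(", ")
}

impl fmt::Display for SpecEntry {
    /// Writes the entry as `key = value` lines in the order shown above.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "/// {}", self.rust_name)?;
        writeln!(f, "c-intrinsic-name = {}", self.c_name)?;
        writeln!(f, "c-arguments = {}", joined(&self.c_args, false))?;
        writeln!(f, "c-arguments-data-types = {}", joined(&self.c_args, true))?;
        writeln!(f, "c-return-type = {}", self.c_return_type)?;
        writeln!(f, "rust-intrinsic-name = {}", self.rust_name)?;
        writeln!(f, "rust-arguments = {}", joined(&self.rust_args, false))?;
        writeln!(f, "rust-arguments-data-types = {}", joined(&self.rust_args, true))?;
        writeln!(
            f,
            "rust-const-generic-arguments = {}",
            joined(&self.rust_const_generics, false)
        )?;
        writeln!(
            f,
            "rust-const-generic-arguments-data-types = {}",
            joined(&self.rust_const_generics, true)
        )?;
        writeln!(f, "rust-return-type = {}", self.rust_return_type)
    }
}
```

Filling a `SpecEntry` with the values from the example entry and calling `to_string()` on it yields lines in the same `key = value` layout.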
How to run
cd crates/stdarch-gen-wasm
cargo run -- --c ../../intrinsics_data/wasm_simd128.h --rust ../core_arch/src/wasm32/simd128.rs --rust ../core_arch/src/wasm32/relaxed_simd.rs > wasm32.spec
Context
C Abstract Syntax Tree
Take the following C intrinsic definition as an example:
static __inline__ v128_t __DEFAULT_FN_ATTRS wasm_u32x4_make(uint32_t __c0, uint32_t __c1, uint32_t __c2, uint32_t __c3) {...}
For this C intrinsic, the immediate children have the following grammar names:
- `storage_class_specifier`: which is `static`.
- `storage_class_specifier`: which is `__inline__`.
- `identifier`: which is `v128_t`. The parser doesn't recognize it as a type and instead treats it as an identifier.
- `ERROR`: which points to the macro `__DEFAULT_FN_ATTRS`. The parser doesn't recognize it as a valid part of the tree and annotates it as `ERROR`.
- `function_declarator`: points to `wasm_u32x4_make(uint32_t __c0, uint32_t __c1, uint32_t __c2, uint32_t __c3)`.
- `compound_statement`: the body of the function.
The immediate children of the `function_declarator` node have the following grammar names:
- `identifier`: which is the intrinsic name, `wasm_u32x4_make`.
- `parameter_list`: which represents the arguments to the intrinsic, `(uint32_t __c0, uint32_t __c1, uint32_t __c2, uint32_t __c3)`.
The immediate children of a `parameter_list` node have the following grammar names:
- `(`: The opening bracket that denotes the start of the arguments definition.
- `parameter_declaration`: The definition for the first argument, `uint32_t __c0`.
- `,`: The comma that separates the first and the second arguments.
- `parameter_declaration`: The definition for the second argument, `uint32_t __c1`.
- `,`: The comma that separates the second and the third arguments.
- `parameter_declaration`: The definition for the third argument, `uint32_t __c2`.
- `,`: The comma that separates the third and the fourth arguments.
- `parameter_declaration`: The definition for the fourth argument, `uint32_t __c3`.
- `)`: The closing bracket that denotes the end of the arguments definition.
Each node with the grammar name `parameter_declaration` can have its children structured in a few ways:
- In the case of `int x`:
  - `primitive_type`: Points to `int`.
  - `identifier`: Points to `x`.
- In the case of `v128_t x`:
  - `identifier`: Points to `v128_t`, which is actually a type (but the parser is unaware of this).
  - `identifier`: Points to `x`.
- In the case of `const void *__mem`:
  - `type_qualifier`: Points to `const`.
  - `primitive_type`: Points to `void`.
  - `pointer_declarator`: Breaks down into `*` and an `identifier` (which is `__mem`).
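The three `parameter_declaration` shapes above can all be handled by one rule: the last child carries the argument name and everything before it is the data type. Here is a minimal sketch of that rule on a hand-built stand-in tree. `Node` and `split_parameter` are hypothetical names, and `Node` is deliberately not the `tree-sitter` API, so the example runs without any dependencies.

```rust
/// Minimal stand-in for a syntax-tree node: a grammar name, the source text
/// it covers, and its children. This is NOT the tree-sitter `Node` type.
#[derive(Debug, Clone)]
pub struct Node {
    pub kind: &'static str,
    pub text: &'static str,
    pub children: Vec<Node>,
}

impl Node {
    /// A node without children, such as an identifier or a keyword.
    pub fn leaf(kind: &'static str, text: &'static str) -> Node {
        Node { kind, text, children: Vec::new() }
    }

    /// A node with children, such as a `parameter_declaration`.
    pub fn branch(kind: &'static str, text: &'static str, children: Vec<Node>) -> Node {
        Node { kind, text, children }
    }
}

/// Splits one `parameter_declaration` node into (data type, argument name).
/// Returns `None` for shapes this sketch does not know about.
pub fn split_parameter(decl: &Node) -> Option<(String, String)> {
    // The last child carries the name; all children before it form the type.
    let (last, type_nodes) = decl.children.split_last()?;
    let mut ty: Vec<&str> = type_nodes.iter().map(|n| n.text).collect();
    let name = match last.kind {
        // `int x` and `v128_t x`: the name is a plain identifier.
        "identifier" => last.text,
        // `const void *__mem`: the `*` belongs to the type, and the
        // identifier nested in the pointer_declarator is the name.
        "pointer_declarator" => {
            ty.push("*");
            last.children.iter().find(|c| c.kind == "identifier")?.text
        }
        _ => return None,
    };
    if ty.is_empty() {
        return None;
    }
    Some((ty.join(" "), name.to_string()))
}
```

Because the type is taken positionally, the sketch does not care that the parser labels `v128_t` as an `identifier` rather than a type.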
Rust Abstract Syntax Tree
Take a Rust intrinsic definition for example:
pub unsafe fn v128_load64_splat(m: *const u64) -> v128 {
    u64x2_splat(ptr::read_unaligned(m))
}
For this Rust intrinsic, the immediate children have the following grammar names:
- `visibility_modifier`: For `pub`.
- `function_modifiers`: For `unsafe`. May not always be present.
- `fn`: The actual keyword `fn`.
- `identifier`: The name of the function, `v128_load64_splat`.
- `type_parameters`: The `const` generic arguments. (Not always present; this example has none.)
- `parameters`: The arguments passed to the function, `(m: *const u64)`.
- `->`: The arrow used to specify the return type.
- `identifier`: The return type of the function, `v128`.
- `block`: The body of the function.
The children of the `type_parameters` node have the following grammar names (assuming 2 generic arguments):
- `<`: The opening angle bracket that starts the generic arguments definition.
- `const_parameter`: The first `const` generic argument.
- `,`: The comma that separates the generic arguments.
- `const_parameter`: The second `const` generic argument.
- `>`: The closing angle bracket that concludes the generic arguments definition.
The children of the `parameters` node have the following grammar names (assuming 2 arguments):
- `(`: The opening parenthesis that starts the arguments definition.
- `parameter`: The first argument.
- `,`: The comma that separates the arguments.
- `parameter`: The second argument.
- `)`: The closing parenthesis that concludes the arguments definition.
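Since both child lists above interleave the nodes of interest with punctuation, one way to read them is to filter by grammar name and then split each `name: type` text. The sketch below assumes the children are already available as (grammar name, source text) pairs. `Child`, `texts_of_kind`, and `split_name_and_type` are hypothetical helpers, and splitting on the first `:` is a string-level shortcut standing in for a walk over each node's own children.

```rust
/// (grammar name, source text) for each immediate child of a node.
/// A stand-in for real syntax-tree nodes, not the tree-sitter API.
pub type Child<'a> = (&'a str, &'a str);

/// Returns the source text of every child with the given grammar name,
/// which drops punctuation such as `(`, `,`, `)`, `<` and `>`.
pub fn texts_of_kind<'a>(children: &[Child<'a>], kind: &str) -> Vec<&'a str> {
    children
        .iter()
        .filter(|c| c.0 == kind)
        .map(|c| c.1)
        .collect()
}

/// Splits `name: type` into its two halves, ignoring a leading `const `
/// so that the same helper works for `const_parameter` text.
pub fn split_name_and_type(text: &str) -> Option<(&str, &str)> {
    let text = text.strip_prefix("const ").unwrap_or(text);
    let (name, ty) = text.split_once(':')?;
    Some((name.trim(), ty.trim()))
}
```

For example, filtering the children of `(a: v128, b: v128)` by `parameter` leaves `a: v128` and `b: v128`, and splitting `const N: usize` gives the name `N` and the data type `usize`.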
cc: @Amanieu @folkertdev