IL2C
Support closed generic types.
Idea
- Value Witness Table (came from swift)
- Aggregate implementation for generic parameters are objref type.
- How to analyze and fix implementation for partial objref arguments at generic parameters?
Found the first problem in the symbol mangling system. The current mangling rules break easily when members are appended or changed.
We need stable mangled symbols, so I'll rewrite the type and method symbol name mangling algorithm.
For example: string string.Format(string format, object arg0)
--> System_String* System_String_Format__System_String_System_Object(System_String* format, System_Object* arg0)
For example (planning closed generic type, not tested): int List<string>.Add(string value)
--> System_Int32 System_Collections_Generic_List__System_String_Add__System_String(System_Collections_Generic_List__System_String* this__, System_String* value)
It's very redundant, but stable and safe. As a final step, I'm planning to add readable, useful alias names.
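As a rough illustration of the rule shown in the two examples above, here is a minimal sketch in C. The helper names (`sketch_append_type`, `sketch_mangle_method`) are hypothetical and not part of IL2C. It only covers the pattern visible in the examples: `.` becomes `_`, and `__` separates the method name from the argument types. The zero-argument form is not specified in this thread, so the sketch does not try to define it.

```c
#include <stddef.h>
#include <string.h>

/* Append `name` to `out`, replacing '.' with '_' (hypothetical helper). */
static void sketch_append_type(char *out, size_t cap, const char *name)
{
    size_t len = strlen(out);
    while (*name != '\0' && len + 1 < cap) {
        out[len++] = (*name == '.') ? '_' : *name;
        name++;
    }
    out[len] = '\0';
}

/* Build "<Type>_<Method>__<Arg0>_<Arg1>..." into `out` (cap must be > 0). */
static void sketch_mangle_method(char *out, size_t cap,
                                 const char *type, const char *method,
                                 const char *const *args, size_t argc)
{
    size_t i;
    out[0] = '\0';
    sketch_append_type(out, cap, type);
    sketch_append_type(out, cap, "_");
    sketch_append_type(out, cap, method);
    sketch_append_type(out, cap, "_");      /* first half of the "__" separator */
    for (i = 0; i < argc; i++) {
        sketch_append_type(out, cap, "_");  /* second half, then argument separators */
        sketch_append_type(out, cap, args[i]);
    }
}
```

Calling it with `"System.String"`, `"Format"` and the argument types `"System.String"`, `"System.Object"` reproduces the symbol from the first example.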
NOTE: How to fit array types into the generic symbol system is an interesting question. I feel array types are easier to understand when treated like this:
int[]
--> System.Array<System.Int32>
--> System_Array__System_Int32
Of course, System.Array<T> isn't a real type definition. We can handle it as a pseudo type inside the IL2C internal metadata system.
MEMOIZED: Higher Kinded Polymorphism / Generics on Generics https://github.com/dotnet/csharplang/issues/339
Today, I redesigned the method overriding calculation at the Center CLR Try development meetup #8 (in Japanese).
I'll update CalculateVirtualMethods() and remove the code related to the overload index.
Hello.
I'd like to propose an idea for the generic implementation. If you like, please see the following repository: https://github.com/Sinsjr2/CGenericImpleSample
Implementation idea
- [x] static generic function
- [x] generic class/struct instance method
- [ ] generic class virtual method (Partially completed)
- [ ] generic struct virtual method (Partially completed)
- [ ] OpCodes.Constrained
- [ ] generic class/struct static variable
Thanks for the sample code. Very interesting implementation.
I see that you use runtime type information to achieve this. We can already get the size (il2c_sizeof()). As for copying the value, there are many cases to consider, such as an objref or a valuetype that contains an objref, but I think it is possible to go this way in the case of a static method.
My idea at the moment (not clearly formed) is to use C macros for the expansion. The disadvantage of this method is that it could generate a large amount of the same code. When I thought of this method, I was thinking of relying on VC optimizations (which can remove identical code at the binary level when linking. See /OPT:ICF). Now that I am planning to pull VC out of priority support, I am wondering if we can do the same thing with gcc or clang instead.
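To make the macro-expansion idea concrete, here is a minimal sketch, assuming one function is stamped out per closed type argument. The macro name `SKETCH_DEFINE_PASS_THROUGH` and the generated function names are hypothetical. It is not what IL2C emits today.

```c
#include <stdint.h>

/* Hypothetical macro: stamps out one concrete function per type argument. */
#define SKETCH_DEFINE_PASS_THROUGH(T)            \
    static T Sketch_GenericPassThrough_##T(T x)  \
    {                                            \
        T local_0 = x;   /* stloc.0 */           \
        return local_0;  /* ldloc.0; ret */      \
    }

typedef int32_t System_Int32;
typedef double System_Double;

/* Each expansion produces a separate copy of the same body. */
SKETCH_DEFINE_PASS_THROUGH(System_Int32)
SKETCH_DEFINE_PASS_THROUGH(System_Double)
```

Every instantiation duplicates the function body, which is exactly the code-size concern described above. Whether a given linker folds identical copies is outside what this sketch shows.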
The other problem is that readability will be poor, and I'm not sure I can go any further without resorting to a C++ template. (I have no plans to go literal "IL2C++", the C++ compiler is too slow :)
If you think you can fill in the rest of your idea, you could try applying it directly to IL2C.Core. I am currently planning to work on Release 0.5, and the rest of the work will mainly be to improve the build environment and fix the documentation. Therefore, I do not plan to do much work on IL2C.Core for a while. (The Core unit test code will be significantly modified in relation to #100.)
(This does not mean that I want to include your code in Release 0.5. I'm not in a hurry, so take it easy on me ;)
Thank you for your reply.
I see that you use runtime type information to achieve this. I can already get the size (il2c_sizeof()). As for copying the value, I think it is possible to go for a static method, given the details, like in the case of an objref or when the valuetype contains an objref. I think I can go for the static method.
This means changing from "TypeInfo" to "IL2C_RUNTIME_TYPE"; the "IL2C_RUNTIME_TYPE" can be obtained from "il2c_get_header__".
/* System_Object* */void* obj;
IL2C_RUNTIME_TYPE generic_T = il2c_get_header__(obj)->type;
void Extensions_GenericPassThrough_T(IL2C_RUNTIME_TYPE generic_T, void *result, void *x) {
// .locals init (
// [0] !!T
// )
void *local_0;
void *stack_0;
void *stack_1;
uint32_t runtimeSize_T;
runtimeSize_T = il2c_sizeof__(generic_T);
local_0 = alloca(runtimeSize_T);
stack_0 = alloca(runtimeSize_T);
stack_1 = alloca(runtimeSize_T);
// IL_0000: nop
// IL_0001: ldarg.0
memcpy(stack_0, x, runtimeSize_T);
// IL_0002: stloc.0
memcpy(local_0, stack_0, runtimeSize_T);
// IL_0003: br.s IL_0005
// IL_0005: ldloc.0
memcpy(stack_1, local_0, runtimeSize_T);
// IL_0006: ret
memcpy(result, stack_1, runtimeSize_T);
}
void Extensions_GenericPassThroughTest() {
// .locals init (
// [0] int32 a
// )
System_Int32 a_System_Int32;
System_Int32 stack_0_0;
// IL_0000: nop
// IL_0001: ldc.i4.s 10
stack_0_0 = 10;
// IL_0003: call !!0 Extensions::GenericPassThrough<int32>(!!0)
Extensions_GenericPassThrough_T(il2c_typeof(System_Int32), &stack_0_0, &stack_0_0);
// IL_0008: stloc.0
a_System_Int32 = stack_0_0;
// IL_0009: ret
}
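For readers who want to try the idea without the IL2C runtime, here is a self-contained sketch of the same pass-through. Everything named `Sketch...` or `sketch_...` is a stand-in invented for this example. In particular, `SketchRuntimeType` models only the body size, and fixed-size buffers replace `alloca()` under an assumed upper bound.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for IL2C_RUNTIME_TYPE: only the body size is modeled here. */
typedef struct SketchRuntimeType {
    const char *name;
    size_t bodySize;
} SketchRuntimeType;

static const SketchRuntimeType sketch_type_int32 = { "System.Int32", sizeof(int32_t) };

#define SKETCH_MAX_VALUE_SIZE 64  /* assumed upper bound, replaces alloca() */

/* T GenericPassThrough<T>(T x): every slot of type T is a raw byte buffer. */
static int sketch_generic_pass_through(const SketchRuntimeType *generic_T,
                                       void *result, const void *x)
{
    unsigned char local_0[SKETCH_MAX_VALUE_SIZE];
    unsigned char stack_0[SKETCH_MAX_VALUE_SIZE];
    size_t size = generic_T->bodySize;

    if (size > SKETCH_MAX_VALUE_SIZE) return -1;
    memcpy(stack_0, x, size);        /* ldarg.0 */
    memcpy(local_0, stack_0, size);  /* stloc.0 */
    memcpy(stack_0, local_0, size);  /* ldloc.0 */
    memcpy(result, stack_0, size);   /* ret */
    return 0;
}
```

Passing `&sketch_type_int32` together with the addresses of two `int32_t` variables copies the value through the generic body without the callee knowing the concrete type.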
My plan is as follows.
- Translate the above "Implementation idea" manually, because that is easier than writing the translation code in IL2C.
- Check the manually translated code to catch obvious mistakes in the implementation policy.
- Write the code in IL2C.Core and check it with unit test code.
(Sorry it's a bit long. Since you seemed to be Japanese, I'll put the manuscript I wrote in Japanese on gist. You can reply there, but it would be helpful if you could also add the English translation here so that others can refer to it. deepl is also fine :)
That code is fine for how to get runtime type information. (Perhaps you should define a macro in il2c.h.)
I still don't understand all the sample code you wrote, but to get the member offsets of the structure, you can do the following.
- When you run the unit tests, the partially translated test code will be output in the test-artifacts directory for your reference.
- You can refer to the MultipleInsideValueType type in the GarbageCollection of IL2C.Tests.RuntimeSystems as an example structure. This type is defined in C# as follows:
public struct MultipleInsideValueTypeType
{
public string Value1;
public ObjRefInsideValueTypeType Value2;
public ObjRefInsideObjRefType Value3;
public MultipleInsideValueTypeType(string value1, string value2, string value3)
{
this.Value1 = value1;
this.Value2 = new ObjRefInsideValueTypeType(value2);
this.Value3 = new ObjRefInsideObjRefType(value3);
}
}
- Output goes under test-artifacts/Debug/net48/RuntimeSystems/GarbageCollection/MultipleInsideValueType_0/.
- If you look at the end of MultipleInsideValueTypeType.c, you will find the following code:
//////////////////////
// [7] Runtime helpers:
// [7-10-1] VTable (Not defined, same as System.ValueType)
// [7-8] Runtime type information
IL2C_RUNTIME_TYPE_BEGIN(IL2C_RuntimeSystems_MultipleInsideValueTypeType, "IL2C.RuntimeSystems.MultipleInsideValueTypeType", IL2C_TYPE_VALUE, sizeof(IL2C_RuntimeSystems_MultipleInsideValueTypeType), System_ValueType, 3, 0)
IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_REFERENCE(IL2C_RuntimeSystems_MultipleInsideValueTypeType, Value1)
IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_VALUE(IL2C_RuntimeSystems_MultipleInsideValueTypeType, IL2C_RuntimeSystems_ObjRefInsideValueTypeType, Value2)
IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_REFERENCE(IL2C_RuntimeSystems_MultipleInsideValueTypeType, Value3)
IL2C_RUNTIME_TYPE_END();
This is a macro that defines runtime type information. The three lines using IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_REFERENCE() and IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_VALUE() define the objref and valuetype fields, respectively.
For example, look at the definition of IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_VALUE() (il2c.h):
#define IL2C_RUNTIME_TYPE_MARK_TARGET_FOR_VALUE(typeName, fieldTypeName, fieldName) \
(uintptr_t)il2c_typeof(fieldTypeName), \
offsetof(typeName, fieldName),
which corresponds to markTargets[] in IL2C_RUNTIME_TYPE_DECL (il2c_private.h):
typedef const struct IL2C_MARK_TARGET_DECL
{
const IL2C_RUNTIME_TYPE valueType;
const uintptr_t offset;
} IL2C_MARK_TARGET;
struct IL2C_RUNTIME_TYPE_DECL
{
const char* pTypeName;
const uintptr_t flags;
const uintptr_t bodySize; // uint32_t
const IL2C_RUNTIME_TYPE baseType;
const void* vptr0;
const uintptr_t markTarget; // mark target count / custom mark handler (only variable type)
const uintptr_t interfaceCount;
//IL2C_MARK_TARGET markTargets[markTarget];
//IL2C_IMPLEMENTED_INTERFACE interfaces[interfaceCount];
};
In other words, code like IL2C_RUNTIME_TYPE->markTargets[index].offset will give you the offset of the structure member.
For now, IL2C uses this information only for garbage collector tracking, but I have a feeling it could be used for this method as well.
- You can refer to the il2c_mark_handler_recursive__() area for the specific formula.
Since this calculation is very internal to IL2C, I did not define a macro for it, but you may define one if necessary.
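The offset lookup described above can be sketched in isolation as follows. The types here (`SketchOuter`, `SketchMarkTarget`) are simplified stand-ins, not the real IL2C declarations. The real `IL2C_MARK_TARGET` stores a runtime type pointer where this sketch stores a plain flag, and the real table sits directly after the type descriptor rather than in a separate array.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct SketchInner { void *Value; } SketchInner;

typedef struct SketchOuter {
    void *Value1;        /* objref field */
    SketchInner Value2;  /* value type field containing an objref */
    void *Value3;        /* objref field */
} SketchOuter;

/* Stand-in for IL2C_MARK_TARGET: a flag instead of a type pointer, plus the offset. */
typedef struct SketchMarkTarget {
    int isValueType;
    uintptr_t offset;
} SketchMarkTarget;

static const SketchMarkTarget sketch_outer_targets[] = {
    { 0, offsetof(SketchOuter, Value1) },
    { 1, offsetof(SketchOuter, Value2) },
    { 0, offsetof(SketchOuter, Value3) },
};

/* Resolve the address of the index-th tracked field inside an instance body. */
static void *sketch_field_address(void *pBody, const SketchMarkTarget *targets, size_t index)
{
    return (unsigned char *)pBody + targets[index].offset;
}
```

With a table like this, generic code that only holds a body pointer and a type descriptor can still reach individual fields.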
Now, besides the performance issues with memcpy(), you need to be careful about whether a pure copy is valid.
- If it is valuetype, no problem. (If you include an objref, you need to be able to track it, so you need to insert start and end codes that bind the EXECUTION_FRAME. This can be considered later.)
- If IL2C_RUNTIME_TYPE points to an objref, copying it from pReference does not mean you copied it correctly. This is because IL2C_REF_HEADER is placed before pReference (at a negative offset):
// +----------------------+ <-- pHeader
// | IL2C_REF_HEADER |
// +----------------------+ <-- pReference -------
// | : | ^
// | (Instance body) | | bodySize
// | : | v
// +----------------------+ -------
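The layout in the diagram can be reproduced with a small sketch. `SketchRefHeader` is a hypothetical stand-in, since the real `IL2C_REF_HEADER` has different fields, and the helper names are made up for this example. The sketch also assumes the header size keeps the body suitably aligned.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for IL2C_REF_HEADER; the real header has different fields. */
typedef struct SketchRefHeader {
    const char *typeName;
    size_t bodySize;
} SketchRefHeader;

/* Allocate header + body in one block and return a pointer to the body. */
static void *sketch_new_object(const char *typeName, size_t bodySize)
{
    SketchRefHeader *pHeader = malloc(sizeof(SketchRefHeader) + bodySize);
    if (pHeader == NULL) return NULL;
    pHeader->typeName = typeName;
    pHeader->bodySize = bodySize;
    memset(pHeader + 1, 0, bodySize);
    return pHeader + 1;  /* pReference: the header sits just before it */
}

/* Step back from the reference pointer to the header placed before it. */
static SketchRefHeader *sketch_get_header(void *pReference)
{
    return (SketchRefHeader *)pReference - 1;
}

static void sketch_free_object(void *pReference)
{
    free(sketch_get_header(pReference));
}
```

A `memcpy()` of `bodySize` bytes starting at `pReference` copies only the instance body, so the copy has no header in front of it. This is why a raw copy of an objref instance is not a valid object.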
I still don't understand the need for the copy, but I have a feeling that the way to handle this depends on why the copy is needed.
As you guessed, I'm Japanese. I will also write in English so that many people can read this.
Generic T GC implementation
Thank you for your description of GC mark method.
I will briefly describe the GC implementation for generic T. The problem is that T can be either an objref or a value type, and which one it is only becomes known dynamically.
So, as shown below, I assign generic_T to x_type__. When the GC runs, it determines whether x_value_ptr__ is an objref or a value type and switches the function to call.
https://github.com/kekyo/IL2C/blob/4c3b4097de29f119a01e9b4499d319eca773003e/IL2C.Runtime/src/Core/il2c_gc.c#L256-L270
objref: il2c_mark_handler_for_objref__(*(System_Object**)x_value_ptr__)
value type (existing implementation): il2c_mark_handler_recursive__(pAdjustedReference, pHeader->type, offset);
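The dispatch described above might look roughly like this. It is only a sketch with stand-in types: `SketchType` carries a single flag, and the two mark routines just count calls instead of traversing anything. The names `x_type__` and `x_value_ptr__` from the thread map to the `type` and `value_ptr` members.

```c
#include <stddef.h>

/* Stand-in for IL2C_RUNTIME_TYPE: only "is this a value type?" is modeled. */
typedef struct SketchType { int isValueType; } SketchType;

typedef struct SketchGenericSlot {
    const SketchType *type;  /* corresponds to x_type__ */
    void *value_ptr;         /* corresponds to x_value_ptr__ */
} SketchGenericSlot;

static int sketch_objref_marks;
static int sketch_value_marks;

/* Placeholder mark routines: they only count how often they are called. */
static void sketch_mark_objref(void *pReference)
{
    if (pReference != NULL) sketch_objref_marks++;
}

static void sketch_mark_value(void *pBody, const SketchType *type)
{
    (void)pBody;
    (void)type;
    sketch_value_marks++;
}

/* Choose the mark routine from the runtime type stored next to the slot. */
static void sketch_mark_generic_slot(const SketchGenericSlot *slot)
{
    if (slot->type->isValueType) {
        sketch_mark_value(slot->value_ptr, slot->type);
    } else {
        /* The slot points at a reference variable, so dereference once. */
        sketch_mark_objref(*(void **)slot->value_ptr);
    }
}
```

The cost of this approach is one extra type pointer per generic slot in the execution frame, plus a branch per slot during marking.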
Case of Generic T is Objref
If T is an objref, the class fields are not copied.
Only the pointer itself is copied with memcpy.
- If it is valuetype, no problem. (If you include an objref, you need to be able to track it, so you need to insert start and end codes that bind the EXECUTION_FRAME. This can be considered later.)
- If IL2C_RUNTIME_TYPE points to an objref, copying it from pReference does not mean you copied it correctly. This is because IL2C_REF_HEADER is placed before pReference (at a negative offset):
Case of assign
System_Object* x;
System_Object* local_0;
local_0 = x;
Case of memcpy
System_Object* x;
System_Object** arg_x;
System_Object** local_0;
arg_x = &x;
local_0 = alloca(sizeof(System_Object*));
memcpy(local_0, arg_x, sizeof(System_Object*));
OutputCode
typedef struct Extensions_GenericPassThroughTest_EXECUTION_FRAME_DECL
{
const IL2C_EXECUTION_FRAME* pNext__;
const uint16_t objRefCount__;
const uint16_t valueCount__;
//-------------------- objref
//-------------------- value type
const IL2C_RUNTIME_TYPE x_type__; // generic type
const void* x_value_ptr__;
const IL2C_RUNTIME_TYPE local_0_type__; // generic type
const void* local_0_value_ptr__;
const IL2C_RUNTIME_TYPE local_1_type__; // generic type
const void* local_1_value_ptr__;
const IL2C_RUNTIME_TYPE local_2_type__; // generic type
const void* local_2_value_ptr__;
} Extensions_GenericPassThroughTest_T_EXECUTION_FRAME__;
void Extensions_GenericPassThrough_T(IL2C_RUNTIME_TYPE generic_T, void *result, void *x) {
// .locals init (
// [0] !!T
// )
uint32_t runtimeSize_T;
runtimeSize_T = il2c_sizeof__(generic_T);
Extensions_GenericPassThroughTest_T_EXECUTION_FRAME__ frame = {
...
generic_T, // x
alloca(runtimeSize_T),
generic_T, // local_0
alloca(runtimeSize_T),
generic_T, // local_1
alloca(runtimeSize_T),
generic_T,// local_2
alloca(runtimeSize_T)
};
// IL_0001: ldarg.0
// T is value type: copy member fields to new instance
// T is object reference type: copy pointer to new local variable with memcpy.
// so, this is not Object.MemberwiseClone https://docs.microsoft.com/ja-jp/dotnet/api/system.object.memberwiseclone?view=net-6.0
memcpy(frame.stack_0, x, runtimeSize_T);
...
}
void Extensions_GenericPassThroughTestObj() {
// .locals init (
// [0] object a
// )
// IL_0000: nop
System_Object* a_System_Object;
System_Object* stack_0_0;
// IL_0001: newobj instance void [System.Runtime]System.Object::.ctor()
stack_0_0 = il2c_get_uninitialized_object(System_Object);
System_Object__ctor(stack_0_0);
// IL_0006: call !!0 C::GenericPassThrough<object>(!!0)
// pass pointer of pointer
Extensions_GenericPassThrough_T(il2c_typeof(System_Object), &stack_0_0, &stack_0_0);
// IL_000b: stloc.0
a_System_Object = stack_0_0;
// IL_000c: ret
}
Hold a field in the execution frame with a raw pointer to the instance x_value_ptr__ (which may point to a pointer to an objref, or to the body of a valuetype) and the runtime type information x_type__:
- If T is an objref, then:
  - Treat x_value_ptr__ as if it were a System_Object* (reinterpret_cast).
  - Let GC traverse the reference tracking as it is (implement it in il2c_gc.c as a handler for the third variable element, or use [pReference in the execution frame](https://github.com/kekyo/IL2C/blob/4c3b4097de29f119a01e9b4499d319eca773003e/IL2C.Runtime/src/il2c_private.h#L77) to handle it well... There seems to be a trade-off between footprint and readability.)
- If T is a valuetype, then:
  - Treat x_value_ptr__ like a pointer to the target value type body.
  - When accessing runtime type information with the box opcode etc., refer to x_type__.
  - If GC reference tracing is required (IsRequiredTraverse), put it in valueDescriptors__ of the execution frame, or not if you don't need it...? (Might be better to create some helper function and have it do it in there?)
Maybe your initial concern about using memcpy can be offset by optimizations in the C compiler. At least when I verified it with optimization enabled in VC++ before, it generated exactly the same code for memcpy and for assignment expressions in C. Of course, I suppose it depends on the conditions...
Generic type argument constraints
We haven't examined instance member access yet, but consider accesses like System.Object.ToString() for T:
public static string foo<T>(T value) =>
value.ToString();
Or access with a constraint on T:
public static void bar<T>(T value)
where T : IDisposable =>
value.Dispose();
Assuming a managed compiler like C# has (correctly) computed the constraints, IL2C might be able to access it by simply casting the pointer (a reinterpret_cast to the VTABLE layout type of System.Object.ToString's VTABLE or IDisposable's VTABLE).
In the case of interfaces, we need to calculate the adjustor offset, but if we can determine from T, via Cecil, which interface the specified member (Dispose()) belongs to, I think it would be possible to generate code that calculates the adjustor offset statically.
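The pointer cast for a constrained call could be sketched like this. All names are hypothetical, the vtable has a single `Dispose` slot, and the adjustor offset is assumed to be zero, so the interface adjustment mentioned above is deliberately left out.

```c
#include <stddef.h>

/* Stand-in vtable layout for an interface with a single Dispose() slot. */
typedef struct SketchDisposableVTable {
    void (*Dispose)(void *this__);
} SketchDisposableVTable;

/* In this sketch every object starts with a pointer to its vtable. */
typedef struct SketchResource {
    const SketchDisposableVTable *vptr0__;
    int disposed;
} SketchResource;

static void SketchResource_Dispose(void *this__)
{
    ((SketchResource *)this__)->disposed = 1;
}

static const SketchDisposableVTable sketch_resource_vtable = { SketchResource_Dispose };

/* bar<T>(T value) where T : IDisposable; T is assumed to be an objref here. */
static void sketch_bar(void *value)
{
    /* Reinterpret the start of the object as "pointer to vtable". */
    const SketchDisposableVTable *vtable =
        *(const SketchDisposableVTable *const *)value;
    vtable->Dispose(value);
}
```

The generic body never names `SketchResource`. It relies only on the constraint, which the managed compiler has already checked.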
I have come up with a conversion process for the following items and will report it below.
- generic static variable
- generic virtual function
- generic class/struct fields
Now that I have a rough idea of how to implement generics using memcpy in handwritten C code, I would like to think about the details while actually implementing it in IL2C (output in C89).
First, I will try to implement it for value types that do not require gc in static methods.
I don't fully understand how processing is split between objref and value when tracking with gc, so I will think about it later.
As for memcpy optimization, I'm not that worried about it in major compilers (gcc, clang), including msvc.
I am a little worried about how far the compiler for microcontrollers (cc-rx, cc-rl, rx gcc) will optimize it.
However, it is no use thinking about it before implementation, so I will think about it after implementation is done.
https://github.com/Sinsjr2/CGenericImpleSample/blob/29a64b9642910ed5cf67d0cc4f332d92f3b02af5/README.md
In the generic implementation, I need to add result and generic_T to the function arguments, and with the current mangling process there is a possibility of name conflicts.
class C
{
static void F<T>(int result, int generic_T) {}
}
Current conversion process
void C_F_T(void* result, IL2C_RUNTIME_TYPE generic, int result, int generic_T) {}
So, I will try to escape strings used for type and variable names with the following rules. The following process is reversible, so unescape is possible and names will not conflict.
Escaping rules
. => __ // Currently converted to _, but that does not treat _ as an escape character. The existing IL2C.Runtime needs to be modified.
_ => _i_ // _ is often used as a separator and as a discard. "i" looks like a vertical bar, so it is easy to see that it is a separator.
[a-zA-Z0-9] => no conversion
Names that collide with identifiers reserved by IL2C's conversion (ex: this, frame) => add _sr_ as suffix (ex: this_sr_, frame_sr_, generic_T_sr_, result_sr_)
_sr_ stands for System Reserved
Other characters => convert to _ux○○○○○○○○ (8-digit hexadecimal UTF-32)
ux stands for unicode hex
IL can use Unicode characters such as kanji as identifiers, so conversion is necessary (most systems accept only ASCII characters and _ in C identifiers; some accept universal character names).
< => _d_ // For generics; this notation is easy to write by hand in C.
> => _b_ // For generics; this notation is easy to write by hand in C.
Local variables used in methods (@if, malloc, @void, NULL; C# can use reserved words as identifiers by prefixing @) => add _l_ as suffix (ex: if_l_, malloc_l_, void_l_, NULL_l_)
If nothing is added, the name will be expanded as a macro or conflict with a C reserved word, resulting in a compile error.
I plan to make the above fixes separately from the generic implementation, but may I implement them?
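The character-level part of the rules above can be sketched as a small C function. The name `sketch_escape_symbol` is hypothetical. The sketch assumes plain ASCII input and treats each byte as one code point, so real UTF-32 handling is not covered, and the `_sr_` and `_l_` suffix rules for reserved names are left out.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Escape an ASCII identifier; returns 0 on success, -1 if `out` is too small. */
static int sketch_escape_symbol(char *out, size_t cap, const char *name)
{
    size_t len = 0;
    if (cap == 0) return -1;
    for (; *name != '\0'; name++) {
        unsigned char c = (unsigned char)*name;
        char piece[16];
        size_t n;
        if (isalnum(c))     snprintf(piece, sizeof piece, "%c", c);
        else if (c == '.')  strcpy(piece, "__");
        else if (c == '_')  strcpy(piece, "_i_");
        else if (c == '<')  strcpy(piece, "_d_");
        else if (c == '>')  strcpy(piece, "_b_");
        else                snprintf(piece, sizeof piece, "_ux%08x", (unsigned)c);
        n = strlen(piece);
        if (len + n + 1 > cap) return -1;  /* keep room for the terminator */
        memcpy(out + len, piece, n);
        len += n;
    }
    out[len] = '\0';
    return 0;
}
```

For example, `System.Collections.Generic.List<System.Int32>` becomes `System__Collections__Generic__List_d_System__Int32_b_`, and `my_var` becomes `my_i_var`.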
Sorry for the late reply.
- How do you give the value of alignment for il2c_adjustAlignment?
  - I'm thinking the safe thing to do would be to #define it in a platform specific header file, but
  - It would be better if we could have the compiler calculate it (I haven't come up with the specifics, but maybe have it use offsetof()...)
  - size_t is quite hard to use on some platforms, so uint8_t might be better, assuming that the alignment stays below 256 (even in IL2C, there are some places where we have compromised and stopped using size_t).
- I think we need some kind of generic dictionary function...
  - It's hard to add it to the runtime library because it increases the footprint, but I guess it can't be helped.
  - I wonder if it would be exempt if we didn't use generic types, if they weren't linked...
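On the alignment question, one way to let the compiler compute the value with offsetof() is the classic probe-struct trick. This is only a sketch: the macro and function names are hypothetical, `sketch_adjust_alignment` is a guess at what il2c_adjustAlignment is meant to do (round a size up to a multiple of the alignment), and it assumes the compiler inserts no more padding than the alignment requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Declare a probe struct whose second member reveals the alignment of T. */
#define SKETCH_DECLARE_ALIGN_PROBE(T) struct sketch_align_probe_##T { char c; T v; }

/* Offset of `v` in the probe struct, narrowed to uint8_t as suggested above. */
#define SKETCH_ALIGNOF(T) ((uint8_t)offsetof(struct sketch_align_probe_##T, v))

SKETCH_DECLARE_ALIGN_PROBE(char);
SKETCH_DECLARE_ALIGN_PROBE(uint32_t);
SKETCH_DECLARE_ALIGN_PROBE(double);

/* Round `size` up to the next multiple of `alignment` (must be non-zero). */
static size_t sketch_adjust_alignment(size_t size, uint8_t alignment)
{
    return (size + alignment - 1u) / alignment * alignment;
}
```

Because the probe needs a struct declaration per type, the translator would have to emit one `SKETCH_DECLARE_ALIGN_PROBE` line for each value type it instantiates. The type name must also be a single identifier for the token pasting to work.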
However, I can't get IL2C_RUNTIME_TYPE_DECL from this when it comes to regular function calls. Therefore, we separate the function for virtual calls and the function for normal calls.
Yes indeed, in the case of value type, I have no problem without vptr (pReference) (this is in consideration of allowing direct pointer references on the native side during interop)
In the generic implementation, I need to add result and generic_T to the function arguments, and with the current mangling process, there is a possibility of name conflicts.
Noted :)
In particular, you are absolutely right about the conversion of . into _, which is problematic even when generic type arguments are not involved.
- When I looked into it before, something like escaping Unicode code points (\x1234) is not available in preprocessor macros.
- There is no stable special character other than _ that can be used in a preprocessor macro symbol (although there may be one on some processors).
So, I was holding off.
With the method you suggest, I would have to modify the translator as well as the existing runtime implementation. Until we make this modification, we should try to increase the runtime implementation as little as possible.
After reserved (ex: this, frame) converted by IL2C => suffix with _sr_ (ex: this_sr_, frame_sr_, generic_T_sr_, result_sr_)
I think it's good (I was thinking it might be better to make it dirtier for the current method).
Other characters => convert to _ux○○○○○○○○ (8-digit hexadecimal UTF-32)
Do you want to use UTF-8? There are readability issues, but realistically I don't think CJK or the like is used much; it would only become relevant when umlauts or similar characters are used. Although not about symbols, you might want to keep in mind bug #124 that we recently picked up, if you haven't seen it yet. I mistakenly used wchar_t thinking it was 16-bit.
- The translator side cannot (maybe) use string literals. I believe it has to output the raw values as uint16_t str[] = { ... }; or use char str[] = "..."; to put in UTF-8.
- If you decide to use UTF-8 in the above, you will need to modify the runtime implementation, especially around System_String.
< => _d_ // For using generic, this notation can easily be written in c if written by hand.
> => _b_ // For using generic, this notation can easily be written in c if written by hand.
Local variables used in methods (@if, malloc, @void, NULL) c# can describe reserved words by adding @) => add _l_ as suffix (ex: if_l_, malloc_l_, void_l_, NULL_l_)
I think it is good.
Or maybe it would be better to have it macro-expanded (though I'm not quite sure it would contribute to readability). For example:
// System.Collections.Generic.List<System.Int32>
// this won't work.
#define GENERIC_ARG(args) _d_##args##_b_
System__Collections__Generic GENERIC_ARG(System__Int32)
// not so good...
#define GENERIC_TYPE(type, args) type##_d_##args##_b_
GENERIC_TYPE(System__Collections__Generic, System__Int32)
like ?