haxe
haxe copied to clipboard
Array<T> variance on static targets
To avoid fragmentation I'm consolidating multiple issues here that likely have the same cause and solution: #2103 #2579 #3429 #4187 #4340 #4424 #4783
Edit NC, added #5799
I have studied quite in detail the problem of variance with Arrays, and the particular case of Array<Dynamic>, as part of my work on HL.
I came up with the following design that I would like to be generalized on all "static" targets that have type-specific arrays (usually for basic types such as Array<Float>), so that would concern C++,C#,Java, and HL.
I define several Array classes implementations:
- ArrayAccess (not the haxe one) is the base class for all arrays. It only has two methods : getDyn() and setDyn() operating on dynamics. Not even length (we will come to that afterwards). It used when doing array accesses on a Dynamic value.
- ArrayBase is the base class for all arrays implementations EXCEPT ArrayDyn. Actually ArrayDyn will just provide a wrapper for an actual ArrayBase implementation. ArrayBase provides more operations such as pushDyn(), slice, etc. that can be called by ArrayDyn
- Each specific Array then implements ArrayBase. In HL I have
ArrayBasic<Float>,ArrayBasic<Int>which uses@:generic, then ArrayObj which contains any kind of "pointer" object. They implement both the "classic" Haxe Array API and some extra*Dyn()functions that comes from ArrayBase - Finally, ArrayDyn is the haxe
Array<Dynamic>implementation, which operates on an underlying ArrayBase.
The trick is the following:
When creating a new ArrayDyn, we have an extra flag in it which is named allowReinterpret, which means that when we cast such ArrayDyn to an Array<Int> for instance, we will first reinterpret the underlying ArrayObj to an ArrayBasic<Int> before returning it, then set allowReinterpret to false, which means that after the first cast to a known type, the Array type becomes fixed, which prevent copying the whole array on each of such casts.
I think this cover all the problematic cases (JSON, Serialization, etc.)
This requires two kind of runtime support/emulation:
- casting between Array types needs to be implemented/detected. I'm doing that in a dynamic fashion in HL by having an extra
__castmethod - accessing ArrayDyn.length needs to return the length of the underlying Array, even when typed as Dynamic. I'm implementing it using
__get_fieldin HL. That's the reason length is not in ArrayAccess.
The full source code for HL solution is available here: https://github.com/HaxeFoundation/haxe/tree/hl/std/hl/types (ArrayBase, ArrayObj, ArrayDyn)
I would like to hear about @hughsando and @waneck comment on this, then if it's fine with both of you, I can write unit tests for it so we make sure that this behavior is correctly standardized on all static platforms.
PS: one last thing I haven't yet worked on is referential equality : we also want our ArrayDyn == its underlying ArrayBase.
I have implemented a similar thing with "VirtualArray" The array starts off unknown, and then as you add stuff to it, the underlying representation changes until you ultimately cast it, and then it becomes fixed. So if you add a bunch of ints, it is an int array. If you then add a float, it becomes a float array. If you add a null, it becomes a Dynamic array. This should help by avoiding boxing in a lot of cases. You can try this in hxcpp target by swapping the commenting of these likes: "cpp::ArrayBase" ^ suffix (* "cpp::VirtualArray" ^ suffix *) (side note - can have a global variable to switch this in ocaml, or do I need to pass a context everywhere?)
It seems to work ok, but I have not got it working in cppia.
On Tue, Feb 23, 2016 at 7:59 PM, Nicolas Cannasse [email protected] wrote:
PS: one last thing I haven't yet worked on is referential equality : we also want our ArrayDyn == its underlying Array.
— Reply to this email directly or view it on GitHub https://github.com/HaxeFoundation/haxe/issues/4872#issuecomment-187671417 .
@hughsando the problem with this approach is that your array keep changing its underlying type, for instance if you add two floats, then cast to Array<Int>, then add floats again, what is the behavior ? My proposal is that the array is either in an "unkown state" (dynamic) or in a known state, and once it is it cannot change anymore. And since the first cast to an not-Array-Dynamic will make it a known state, it guarantee that you never happen to have several copies of the underlying array which does not preserve side effects.
The casting only ever happens in one direction: empty -> Int -> Float -> Dynamic (maybe skip Float or Int), or empty -> String -> Dynamic or empty -> Bool -> Dynamic. or just empty -> Dynamic. As soon as you cast, it locks it in forever (like your solution). The good thing about this is that the final cast (eg, to Array<Int>) is usually a no-op if you have only added ints - ie, zero changes. If you add a whole lot of ints, a single float and then an object, you could get 2 transitions, where perhaps none was required. But it is more likely to be a homogeneous array, so I would think that this rarer than the "not need to do anything" case. Adding two floats and then casting to Ints shows a breakdown of the compiler logic so throwing an error might be appropriate, but currently it will perform Float->Int and lock it in.
On Wed, Feb 24, 2016 at 4:01 AM, Nicolas Cannasse [email protected] wrote:
@hughsando https://github.com/hughsando the problem with this approach is that your array keep changing its underlying type, for instance if you add two floats, then cast to Array<Int>, then add floats again, what is the behavior ? My proposal is that the array is either in an "unkown state" (dynamic) or in a known state, and once it is it cannot change anymore. And since the first cast to an not-Array-Dynamic will make it a known state, it guarantee that you never happen to have several copies of the underlying array which does not preserve side effects.
— Reply to this email directly or view it on GitHub https://github.com/HaxeFoundation/haxe/issues/4872#issuecomment-187878494 .
Ok, that seems good to me, I could indeed optimize HL this way as well, it's not something really detectable by the enduser anyway.
Any reason it has not been actived by default in hxcpp yet ?
Also, still waiting to hear from @waneck :)
Testing and cppia mainly - I wrote it, but did not have time to put out fires if it went wrong. Now I've moved over to "haxe mode", I will try to git it in in the next week or two.
Hugh
On Wed, Feb 24, 2016 at 8:49 PM, Nicolas Cannasse [email protected] wrote:
Ok, that seems good to me, I could indeed optimize HL this way as well, it's not something really detectable by the enduser anyway.
Any reason it has not been actived by default in hxcpp yet ?
Also, still waiting to hear from @waneck https://github.com/waneck :)
— Reply to this email directly or view it on GitHub https://github.com/HaxeFoundation/haxe/issues/4872#issuecomment-188240459 .
I just wanted to start adding tests for this but didn't get very far:
class Main {
static function main() {
var a:Array<Dynamic> = [1, 2, 3];
trace("I'm just here to prevent the analyzer fusion");
var b:Array<Int> = cast a;
trace("Yeah me too");
trace(a == b); // false
}
}
@hughsando: Wasn't this supposed to work now?
Looks like it is an issue with "op==". If you do: b.push(4); trace(a); you get 1,2,3,4
I added the operators to hxcpp.
That helped with the direct ==, but it still fails if we hide the arrays behind type parameters:
class Main {
static function main() {
var a:Array<Dynamic> = [1, 2, 3];
trace("I'm just here to prevent the analyzer fusion");
var b:Array<Int> = cast a;
trace("Yeah me too");
eq(a, b);
}
static function eq<T>(e1:T, e2:T) trace(e1 == e2); // false
}
P.S @ncannasse: That fails on HL too, even in the interpreted version.
There's not much of a point in me adding a test that fails on multiple targets. I'll assign this issue to the 3.3.0 final milestone so we can see what we want to do here after the RC.
@Simn which targets apart hxcpp are concerned ? Java and C# I presume ?
@hughsando could you look into it for RC? I think this kind of non-equality could break people code when trying RC.
I think hxcpp is fine now actually, so it's probably Java, C# and HL.
Please submit the unit tests with #if !(hl || java || cs) for the parts that fail for now, and open a separate issue for these.
I don't like these #if tests, it's too easy to forget about them. I've added it without the conditional compilation to a separate branch so we can forget about that instead.
As expected the test fails on C# and HL: https://travis-ci.org/Simn/haxe/builds/120320083
I'm assigning to @waneck because HL is not ready for 3.3 anyway
I won't be able to get this done for 3.3. Pushing to 3.4
Moved from https://github.com/HaxeFoundation/haxe/issues/5467 .
It looks like C# implementation copying all fields of original Array. (Using reflection for some reason). Doesn't that cause more issues? That's slow, and newly created array no longer refers to the same array. I don't want Haxe to copy Array in that case.
Yes, this is #2103.
Is reflection always needed to achieve this? I don't know where reflection calls came from. Are they automatically generated by Haxe?
I think hxcpp is fine now actually, so it's probably Java, C# and HL.
@Simn Both of the test cases you posted here still trace false for me on hxcpp (Haxe 3.4.0 and hxcpp 3.4.49). Or was it fixed at some point and then broken again?
I have checked hxcpp again and both tests work there now. I also made sure that the analyzer doesn't inadvertently fix it.
I didn't test HL because all I ever get out of that these days is "Unsupported bytecode version". It's definitely still broken on Java and C# though.
My CI says this fails on C# and PHP of all things.
I confirmed HL to still be failing here as well.
I have fixed HL support for this specific test in be1a102a910ec8446dce8c325dc23aeb9a768033
This might break some runtime wrt Array<Dynamic> handlings, we need proper unit testing for Haxe 4.0, we will discuss if we can make java/c# compliant in time but we need to pass the tests on other platforms.
@RealyUniqueName Could you look at the PHP failure?
class Main {
@:analyzer(ignore)
static public function main() {
var a:Array<Dynamic> = [1, 2, 3];
var b:Array<Int> = cast a;
b.push(4);
trace(a[3]); // null
}
}
Note that the @:analyzer(ignore) is required to reproduce this.
Java shouldn't have any issue like that.
As for C#, my real issue with implementing this is how should we deal with user-created type parameters? The possible solutions I can think are:
- We don't generate type parameters for hxGen C# types (e.g. make
erase_genericsthe default) - We keep our old behavior, so that Array is a special case
- We try to generalize the way we handle Array so it works for any generic class
I can't think of a way to generalize the Array handling, but I'd be interested to hear ideas. As for between 1 and 2, I'm also not sure what's best. Does anyone have strong opinions regarding them?
What problems does erase_generics cause?
I think it's mostly about performance and generated code compatibility with native C# code
Arrays are special - in particular Array<Dynamic> is the one thing the the whole haxe type system that is inconsistent. I think if the above was "var a:Array = [1,2,3]" it would make everything consistent and logical for developers and backend writers Or maybe "IArray" interface. The solution hxcpp (and hl?) uses is to use an extra level of indirection for Array<Dynamic> - this punishes Arrays-of-actual-Dynamic because there is not way to tell it apart from the "magical array that can be an array-of-int" shown here.
On Wed, Jun 6, 2018 at 1:36 AM, Cauê Waneck [email protected] wrote:
Java shouldn't have any issue like that.
As for C#, my real issue with implementing this is how should we deal with user-created type parameters? The possible solutions I can think are:
- We don't generate type parameters for hxGen C# types (e.g. make erase_generics the default)
- We keep our old behavior, so that Array is a special case
- We try to generalize the way we handle Array so it works for any generic class
I can't think of a way to generalize the Array handling, but I'd be interested to hear ideas. As for between 1 and 2, I'm also not sure what's best. Does anyone have strong opinions regarding them?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/HaxeFoundation/haxe/issues/4872#issuecomment-394796613, or mute the thread https://github.com/notifications/unsubscribe-auth/ABlp1lM3ircLKQDruzIvat-8RN-xTQjzks5t5sG0gaJpZM4Hgg8I .
@RealyUniqueName Could you look at the PHP failure?
That's because Array.push() is inlined in std of php backend.
That push is generated as:
$b = $a;
$this1 = $b->arr;
$idx = $b->length;
$this1[$idx] = 4;
++$b->length;
And $b->arr is a native PHP array. And native arrays are passed by copy, not by reference. I think I can fix this in std without removing inline.
But this is not ok to automatically introduce temp vars for accessing object fields. I guess it also affects native structs on C# target.
I've updated the branch again:
- PHP:
src/unit/TestArray.hx:12: 4 should be null - HL:
src/unit/TestArray.hx:9: characters 5-11 : Don't know how to compare #hl.types.ArrayDyn and #hl.types.ArrayBytes_Int - C#:
src/unit/TestArray.hx:9: false should be true src/unit/TestArray.hx:10: [1,2,3] should be [1,2,3,4] src/unit/TestArray.hx:11: 4 should be 3 src/unit/TestArray.hx:12: 4 should be 0
Can you please point me to the branch? Can't find it )
Sure, I keep these branches on my own GitHub because that tends to be better for CI: https://github.com/Simn/haxe/tree/array_variance_tests
PHP fails because of the same problem described in my previous comment: https://github.com/HaxeFoundation/haxe/issues/4872#issuecomment-394954238
Yes but is that for me or for you to fix? If it's a problem with temp vars we have to check if it can be reproduced manually (without inline).
I think it's for you to fix. Because those temp vars will case the same problem for any field of non-scalar value-type.
Then we have to mark the affected types with some metadata. Though I don't really like having to support copy-semantics just for this.
Note that this example also doesn't behave like the syntax would suggest:
class Main {
@:analyzer(ignore)
static public function main() {
var a = [1, 2, 3];
var b = @:privateAccess a.arr;
b[3] = 4;
trace(a[3]); // null
}
}
Note that this example also doesn't behave like the syntax would suggest
This is a defined behavior for php.NativeArray so it's up to the user to decide what to do with it.
On the other hand inst.inlinedMethod() temp vars generation is out of the user's control, so it should behave the same way with and without inlining.
This would mean that for certain types we never create a temp var. This is quite problematic for various cases I can think of.
IMO this has to be approached differently, e.g. by not using a field access expression on the native array type.
Moved to 4.1 because it's a rather involved project.
Any motion on this for the exciting 4.1 release? I'm hoping https://github.com/HaxeFoundation/haxe/issues/8314 gets cleared.
This won't be resolved for 4.1. Which target are you concerned about?
C# We started using haxe 4 in preview 3. We liked the syntax so we started incorporating it as we weren't going to release for a year. We figured nothing major would change. Now our Server side (C#) is compiling out of rc2 and our client side (JS) is compiling out of 4.0.5
@kLabz What do you think about this? Any workaround you can recommend for #8314? I really don't want people to be stuck on RC versions.
I don't think there's much that can be done except fix this issue.
The same kind of functionalities can be achieved, using c# utils (so not in a cross-target way), and without Array<T> (so it requires some rewrite):
import cs.system.collections.generic.List_1;
import newtonsoft.json.JsonConvert;
@:publicFields
class DataItem {
var data:String;
var strArray:List_1<String>;
var intArray:List_1<Int>;
var dynArray:List_1<Dynamic>;
var anyArray:List_1<Any>;
}
class Main {
public static function main():Void {
var t:DataItem = JsonConvert.DeserializeObject(
haxe.Json.stringify({data:"d", strArray:[], intArray: [], dynArray: [], anyArray: []})
);
var strArray = t.strArray;
strArray.Add("via reference");
t.strArray.Add("via getField");
trace(strArray.Count); // 2
trace(t.strArray.Count); // 2
var strArray = t.strArray;
strArray.Add("via reference");
t.strArray.Add("via getField");
trace(strArray.Count); // 4
trace(t.strArray.Count); // 4
var intArray = t.intArray;
intArray.Add(0);
t.intArray.Add(1);
trace(intArray.Count); // 2
trace(t.intArray.Count); // 2
var dynArray = t.dynArray;
dynArray.Add(0);
dynArray.Add("test");
trace(dynArray.Count); // 2
trace(t.dynArray.Count); // 2
var anyArray = t.anyArray;
anyArray.Add(0);
anyArray.Add("test");
trace(anyArray.Count); // 2
trace(t.anyArray.Count); // 2
}
}
@Simn Thanks. We are stable-ish. I assume it is a mighty pain in the arse issue since it has been lingering/hidden for 4 years.
@Simn Guessing 4.2 is out?
No.
Any magic moving on this pipe of love?
Quarterly ping! Let me know how I can help if possible
AFAIK nothing new here
Just checking in on this
Keeping the dream alive
At this point I can say that unfortunately this won't be addressed for the C# target.
Thank you @simn I appreciate the message. It seems like a big hole but apparently very difficult.