pauleonix
I can, but the only type I have 4 nodes of is KNL, so the single cores are very slow (the network is slow, too). When I tried it...
As you can see in the new issue, I found out what stopped the non-hwloc run from working (I didn't think it could be that slow/that much to test)...
Thanks. That will do.
It seems like `thrust::uninitialized_copy` isn't capable of inter-system copies at all. While `thrust::copy` does have this capability, it isn't documented. Most examples use the vector constructors/assignment operators for these inter-system copies...
Yeah, I didn't think of these fancy references, probably because accessing them from the host is rarely the right thing to do. But as the `std::allocator` doesn't give you a...
Also, regarding fancy references: they never seem to be used inside algorithms (by design?). I.e.
```c++
dst[0] = 42.;
dst[1] = 42.;
dst[2] = 42.;
```
works, but
```c++
thrust::fill(thrust::host,...
```
No, it won't. Exactly because fancy references aren't part of the equation here.
The fact that the `thrust::fill` above compiles but then segfaults is especially interesting, as a user might expect this to be a compile-time error due to the `device_ptr` wrapper...
Yeah, you are right again... :smile: When forcing an "unoptimized" version, it doesn't segfault:
```c++
thrust::copy_n(thrust::host, thrust::counting_iterator(0.), 3, dst);
```
So then fancy references are actually used inside algorithms (which makes a lot of sense), but they only work for the device-to-host direction. And your argument then is that semantically, when...