Umar Arshad

Results: 125 comments by Umar Arshad

Yes, this is related to efficiently streaming data from a source on the host to the device. This is different from CUDA streams, although they will be used here.

We have implemented events which should improve support for streams. The main problem is the memory manager and its current implementation and how the streams are associated with each device....

You can try setting this variable to OFF and see how far you get: https://github.com/arrayfire/arrayfire/blob/d18deaf01042afc3d3222b7786c30ebd7cc8b9a3/CMakeModules/InternalUtils.cmake#L67 I haven't tested static builds, so I am not sure what kind of result you...

You can include the ArrayFire .so files/DLLs with your application. Your users do not need to install the library separately.

As @FloopCZ mentioned, the DLL model works on Windows just fine. With that said, I can understand the appeal for static libraries. Currently, there are some practical limitations that prevent...

We have made a little progress on this front with #2785. I will attempt this again in the next minor release.

This would be faster on afcuda. I haven't looked into afcpu's implementation. This will require you to modify the CUDA kernel implementation.

I don't think there will be a big improvement on the CPU backend.

The join function seems to be calling the join kernel for each buffer that needs to be joined. It would be better if we performed this operation in one kernel...
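To illustrate the difference between launching one kernel per buffer and joining everything in one pass, here is a minimal CPU-only sketch in plain Python. It is not ArrayFire's implementation: the function names `join_per_buffer` and `join_single_pass` are hypothetical, the buffers are 1-D lists, and the "launch" counter only stands in for a kernel launch. The single-pass version assumes each output index can find its source buffer from a table of start offsets, roughly as one GPU thread per output element might.

```python
from bisect import bisect_right
from itertools import accumulate


def join_per_buffer(buffers):
    """Copy each input buffer in its own pass (one simulated launch per buffer)."""
    out = [None] * sum(len(b) for b in buffers)
    offset = 0
    launches = 0
    for buf in buffers:
        out[offset:offset + len(buf)] = buf  # one copy pass for this buffer
        offset += len(buf)
        launches += 1
    return out, launches


def join_single_pass(buffers):
    """Fill the output in one pass; each index locates its own source buffer."""
    # starts[i] is the output offset where buffer i begins (exclusive prefix sum);
    # the final entry is the total output length.
    starts = [0] + list(accumulate(len(b) for b in buffers))
    total = starts[-1]
    out = [None] * total
    for idx in range(total):  # on a GPU, each idx would map to one thread
        src = bisect_right(starts, idx) - 1  # last buffer starting at or before idx
        out[idx] = buffers[src][idx - starts[src]]
    return out, 1  # a single simulated launch


if __name__ == "__main__":
    data = [[1, 2], [], [3]]
    print(join_per_buffer(data))
    print(join_single_pass(data))
```

Both functions produce the same joined output. The sketch only shows that the number of passes drops from one per buffer to one overall; any real speedup would depend on the backend and the kernel.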