Trevor L. McDonell
Uh, that is very strange. I never use `ghc-options` in the stack.yaml files, so I'm not sure if those are doing something unexpected, though I doubt it. That looks like the...
[xkcd 1319](https://xkcd.com/1319/)
Moving to GitHub Actions, so that we can have a self-hosted runner to test and benchmark GPU code as well.
Asynchronous execution entails using _non-default stream(s)_ and _event waiting_ for dependencies. With support for streams and events, we should also (correctly) support asynchronous memory transfer, which additionally requires:

- The...
See also:

- https://developer.nvidia.com/content/how-optimize-data-transfers-cuda-cc
- https://developer.nvidia.com/content/how-overlap-data-transfers-cuda-cc
This is all possible now, just not exposed very nicely yet. See this profiler output, where compute and data transfer overlap nicely with full-speed DMA to pinned memory: Also note...
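For a feel of what those pieces look like from Haskell, here is a rough sketch against the low-level `cuda` bindings: pinned host memory, an asynchronous copy in a non-default stream, and an event that orders dependent work in another stream. The module layout and exact function names (`mallocHostArray`, `pokeArrayAsync`, `Event.wait`, and so on) are written from memory and should be checked against the package documentation.

```haskell
import qualified Foreign.CUDA.Driver         as CUDA
import qualified Foreign.CUDA.Driver.Context as Context
import qualified Foreign.CUDA.Driver.Marshal as CUDA
import qualified Foreign.CUDA.Driver.Stream  as Stream
import qualified Foreign.CUDA.Driver.Event   as Event
import qualified Foreign.CUDA.Ptr            as CUDA

main :: IO ()
main = do
  CUDA.initialise []
  dev <- CUDA.device 0
  _   <- Context.create dev []              -- the new context becomes current

  let n = 1024 * 1024
  hst <- CUDA.mallocHostArray [] n          -- pinned (page-locked) host buffer
  dst <- CUDA.mallocArray n :: IO (CUDA.DevicePtr Float)

  str <- Stream.create []                   -- a non-default stream
  evt <- Event.create []

  -- enqueue the host->device copy asynchronously: the call returns
  -- immediately, and the DMA overlaps with kernels in other streams
  CUDA.pokeArrayAsync n hst dst (Just str)
  Event.record evt (Just str)

  -- kernels launched into str2 start only after the transfer completes,
  -- without ever blocking the host thread
  str2 <- Stream.create []
  Event.wait evt (Just str2) []
  -- ... launch kernels into str2 here ...

  done <- Event.create []
  Event.record done (Just str2)
  Event.block done                          -- host synchronises only at the end
```

The important points are exactly the ones in the linked articles: the host buffer must be pinned for the copy to be truly asynchronous, and dependencies between streams are expressed with events rather than host-side synchronisation.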
Yep, #52 has some notes. Random number generation in `Acc` land would be good though, especially for the GPU. The `mwc-random-accelerate` package, which does the generation on the host using...
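For reference, host-side generation with `mwc-random-accelerate` looks roughly like the following sketch; the exact exports (`randomArray`, `uniformR`) are from memory and worth checking against the package docs:

```haskell
import Data.Array.Accelerate                    ( Z(..), (:.)(..), Vector )
import Data.Array.Accelerate.System.Random.MWC  ( randomArray, uniformR )

main :: IO ()
main = do
  -- 1000 uniformly distributed Floats, generated on the host by mwc-random;
  -- on a GPU backend the resulting array must then be embedded with `use`,
  -- which implies a host-to-device transfer
  vec <- randomArray (uniformR (0,1)) (Z :. 1000) :: IO (Vector Float)
  print vec
```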
Good discussion on generating random numbers at #412
I have created the [sfc-random-accelerate](https://github.com/tmcdonell/sfc-random-accelerate) package. I have done no analysis to check how good this is when evaluated in parallel, but reading the description from [where I stole the...
There is now the `(++)` operation that appends arrays along the innermost dimension, intersecting the extents of the remaining dimensions.
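A small example of those semantics (shapes picked arbitrarily), evaluated with `run` from the interpreter backend: the innermost extents are added together, and the remaining extents are intersected.

```haskell
import Data.Array.Accelerate              as A
import Data.Array.Accelerate.Interpreter  ( run )

-- a 3x2 and a 2x4 matrix
xs, ys :: Acc (Array DIM2 Int)
xs = use (fromList (Z :. 3 :. 2) [0..])
ys = use (fromList (Z :. 2 :. 4) [10..])

main :: IO ()
main = print (run (xs A.++ ys))
  -- innermost (column) extents append:  2 + 4 = 6
  -- outer (row) extents intersect:      min 3 2 = 2
  -- so the result has shape Z :. 2 :. 6
```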