Trevor L. McDonell
Uh, that is very strange. I never use `ghc-options` in the stack.yaml files, so I'm not sure if those are doing something unexpected, though I doubt it. That looks like the...
[xkcd 1319](https://xkcd.com/1319/)
Moving to GitHub Actions, so that we can have a self-hosted runner to test and benchmark GPU code as well.
Asynchronous execution entails using _non-default stream(s)_ and _event waiting_ for dependencies. With support for streams and events, we should also (correctly) support asynchronous memory transfer, which additionally requires:

- The...
See also:

- https://developer.nvidia.com/content/how-optimize-data-transfers-cuda-cc
- https://developer.nvidia.com/content/how-overlap-data-transfers-cuda-cc
This is all possible now, just not exposed very nicely yet. See this profiler output, where compute and data transfer overlap nicely with full-speed DMA to pinned memory: Also note...
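For a feel of what those pieces look like from Haskell, here is a rough sketch against the low-level `cuda` bindings: pinned host memory, an asynchronous copy in a non-default stream, and an event that orders dependent work in another stream. The module layout and exact function names (`mallocHostArray`, `pokeArrayAsync`, `Event.wait`, and so on) are written from memory and should be checked against the package documentation.

```haskell
import qualified Foreign.CUDA.Driver         as CUDA
import qualified Foreign.CUDA.Driver.Context as Context
import qualified Foreign.CUDA.Driver.Marshal as CUDA
import qualified Foreign.CUDA.Driver.Stream  as Stream
import qualified Foreign.CUDA.Driver.Event   as Event
import qualified Foreign.CUDA.Ptr            as CUDA

main :: IO ()
main = do
  CUDA.initialise []
  dev <- CUDA.device 0
  _   <- Context.create dev []              -- the new context becomes current

  let n = 1024 * 1024
  hst <- CUDA.mallocHostArray [] n          -- pinned (page-locked) host buffer
  dst <- CUDA.mallocArray n :: IO (CUDA.DevicePtr Float)

  str <- Stream.create []                   -- a non-default stream
  evt <- Event.create []

  -- enqueue the host->device copy asynchronously: the call returns
  -- immediately, and the DMA overlaps with kernels in other streams
  CUDA.pokeArrayAsync n hst dst (Just str)
  Event.record evt (Just str)

  -- kernels launched into str2 start only after the transfer completes,
  -- without ever blocking the host thread
  str2 <- Stream.create []
  Event.wait evt (Just str2) []
  -- ... launch kernels into str2 here ...

  done <- Event.create []
  Event.record done (Just str2)
  Event.block done                          -- host synchronises only at the end
```

The important points are exactly the ones in the linked articles: the host buffer must be pinned for the copy to be truly asynchronous, and dependencies between streams are expressed with events rather than host-side synchronisation.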
Yep, #52 has some notes. Random number generation in `Acc` land would be good though, especially for the GPU. The `mwc-random-accelerate` package, which does the generation on the host using...
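For reference, host-side generation with `mwc-random-accelerate` looks roughly like the following sketch; the exact exports (`randomArray`, `uniformR`) are from memory and worth checking against the package docs:

```haskell
import Data.Array.Accelerate                    ( Z(..), (:.)(..), Vector )
import Data.Array.Accelerate.System.Random.MWC  ( randomArray, uniformR )

main :: IO ()
main = do
  -- 1000 uniformly distributed Floats, generated on the host by mwc-random;
  -- on a GPU backend the resulting array must then be embedded with `use`,
  -- which implies a host-to-device transfer
  vec <- randomArray (uniformR (0,1)) (Z :. 1000) :: IO (Vector Float)
  print vec
```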
Good discussion on generating random numbers at #412
I have created the [sfc-random-accelerate](https://github.com/tmcdonell/sfc-random-accelerate) package. I have done no analysis to check how good this is when evaluated in parallel, but reading the description from [where I stole the...
There is now the `(++)` operation that appends arrays along the innermost dimension, intersecting the extents of the remaining dimensions.
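A small example of those semantics (shapes picked arbitrarily), evaluated with `run` from the interpreter backend: the innermost extents are added together, and the remaining extents are intersected.

```haskell
import Data.Array.Accelerate              as A
import Data.Array.Accelerate.Interpreter  ( run )

-- a 3x2 and a 2x4 matrix
xs, ys :: Acc (Array DIM2 Int)
xs = use (fromList (Z :. 3 :. 2) [0..])
ys = use (fromList (Z :. 2 :. 4) [10..])

main :: IO ()
main = print (run (xs A.++ ys))
  -- innermost (column) extents append:  2 + 4 = 6
  -- outer (row) extents intersect:      min 3 2 = 2
  -- so the result has shape Z :. 2 :. 6
```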