cutde

Python CPU and GPU accelerated TDEs, over 100 million TDEs per second!

Results 10 cutde issues

Hi again Ben, it seems that when doing source inversion you find the instabilities in the code, as the sampler finds the singularities, haha. Comparing to MATLAB, I pinned down the issue...

This is rather a question than an issue. I have multiple GPUs at hand and I would want to run multiple processes each on a different GPU. For now always...
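One common way to pin each process to its own GPU, assuming the CUDA backend is in use, is to set the `CUDA_VISIBLE_DEVICES` environment variable before the worker process starts. The sketch below is only an illustration of that general approach, not something the issue confirms as the recommended cutde workflow; `gpu_worker_env`, `launch_workers`, and the worker script path are hypothetical names.

```python
import os
import subprocess
import sys


def gpu_worker_env(gpu_index, base_env=None):
    """Return a copy of the environment that exposes a single GPU.

    CUDA_VISIBLE_DEVICES is read by the CUDA driver, so the worker
    process will see the chosen device as device 0.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_workers(script_path, n_gpus):
    """Start one worker process per GPU (script_path is hypothetical)."""
    return [
        subprocess.Popen([sys.executable, script_path], env=gpu_worker_env(i))
        for i in range(n_gpus)
    ]
```

The variable has to be set before the GPU library is imported in the worker, which is why it is passed through the child's environment rather than set later inside the worker.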

The `geometry.strain_to_stress` conversion does not work correctly if the all-pairs matrix, which has more than 2 dimensions, is used as input! Transposing the matrix to have the strain components as...
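To illustrate the kind of axis handling involved, here is a minimal NumPy sketch of isotropic Hooke's law that takes the position of the strain-component axis as a parameter. It is not cutde's implementation; `hooke_stress` is a hypothetical helper, and the component ordering (xx, yy, zz, xy, xz, yz) is an assumption.

```python
import numpy as np


def hooke_stress(strain, mu, nu, axis=-1):
    """Isotropic Hooke's law for strain arrays of any dimensionality.

    `axis` is the axis of length 6 holding the strain components,
    assumed to be ordered (xx, yy, zz, xy, xz, yz).
    """
    strain = np.asarray(strain, dtype=float)
    e = np.moveaxis(strain, axis, -1)  # put the components last
    if e.shape[-1] != 6:
        raise ValueError("the component axis must have length 6")
    lam = 2.0 * mu * nu / (1.0 - 2.0 * nu)  # first Lame parameter
    trace = e[..., :3].sum(axis=-1, keepdims=True)
    stress = 2.0 * mu * e  # new array: 2 * mu * strain for every component
    stress[..., :3] += lam * trace  # add lam * trace to the normal components
    return np.moveaxis(stress, -1, axis)  # restore the original layout
```

With an input of shape `(n_obs, 6, n_src, 3)`, the call would be `hooke_stress(strain, mu, nu, axis=1)`, and the output keeps the same shape.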

In the sunny light of the future, it's no longer the right choice to write raw CUDA for a library like this. This whole system could be replaced with 1/10th...

It's not a perfect fit because the generated readme needs to be in the git repo but I don't want to be committing from the CI server. One solution is...

Currently, the screen will freeze if a user runs a large cutde calculation on the GPU that also drives their monitors. This is avoided by chunking the calculation in `disp_blocks`...
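The general chunking idea can be sketched as follows: split the observation points into fixed-size pieces so that each GPU launch stays short. This is a generic illustration, not the code in `disp_blocks`; `evaluate_in_chunks`, `kernel`, and `chunk_size` are hypothetical names.

```python
import numpy as np


def evaluate_in_chunks(kernel, obs_pts, chunk_size=4096):
    """Apply `kernel` to `obs_pts` in pieces of at most `chunk_size` rows.

    `kernel` stands in for any function mapping an (n, 3) array of
    points to an array whose first axis has length n.
    """
    obs_pts = np.asarray(obs_pts)
    pieces = [
        kernel(obs_pts[start:start + chunk_size])
        for start in range(0, obs_pts.shape[0], chunk_size)
    ]
    if not pieces:  # empty input: let the kernel decide the output shape
        return kernel(obs_pts)
    return np.concatenate(pieces, axis=0)
```

Smaller chunks give the display driver more opportunities to run between launches, at the cost of more launch overhead.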

In the near-field, an analytic solution is necessary. But for far-field interactions, I suspect a simple Gaussian quadrature of the underlying point Green's function would be much much faster without...
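As a rough sketch of what such a far-field approximation could look like, the snippet below integrates an arbitrary function over a triangle with a symmetric 3-point rule, which is exact for polynomials up to degree 2. The point Green's function itself is not implemented here: `f` is a hypothetical placeholder for it, and `integrate_over_triangle` is an illustrative name, not part of cutde.

```python
import numpy as np

# Symmetric 3-point rule in barycentric coordinates, equal weights.
_BARY = np.array([
    [2 / 3, 1 / 6, 1 / 6],
    [1 / 6, 2 / 3, 1 / 6],
    [1 / 6, 1 / 6, 2 / 3],
])
_WEIGHTS = np.full(3, 1 / 3)


def integrate_over_triangle(f, tri):
    """Approximate the integral of f over a triangle in 3D.

    `tri` is a (3, 3) array of vertex coordinates and `f` maps a point
    of shape (3,) to a scalar or a fixed-shape array.
    """
    tri = np.asarray(tri, dtype=float)
    area = 0.5 * np.linalg.norm(np.cross(tri[1] - tri[0], tri[2] - tri[0]))
    pts = _BARY @ tri  # quadrature points, shape (3, 3)
    vals = np.array([f(p) for p in pts])
    return area * np.tensordot(_WEIGHTS, vals, axes=1)
```

A higher-order rule would be needed as the observation point approaches the triangle, which is where the analytic solution remains necessary.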

See here: https://tbenthompson.com/book/tdes/hmatrix.html#faster-approximate-blocks-on-gpus-with-cutde

```
approx_max_rank = np.max([U.shape[1] for U, _ in approx_blocks_gpu])
UVflattened = [np.concatenate((V.flatten(), U.flatten())) for U, V in approx_blocks_gpu]
approx_block_starts = np.empty(len(UVflattened) + 1, dtype=np.int64)
approx_block_starts[0] = 0
...
```

Iterating over the blocks from Python is quite inefficient. See here: https://tbenthompson.com/book/tdes/hmatrix.html#a-matrix-vector-product

Most of the OpenCL kernels seem to take about 1-3 seconds to compile with `pocl`. But `aca.cu` takes almost three minutes. This is true on both my machine and the...