nimlgen

57 comments by nimlgen

Which GPU do you have? Can you rebase onto master and retry?

8 minutes is also pretty slow; gpuocelot runs the same tests in ~3 minutes. I will profile remu, but @Qazalin, do you know what might be slow in remu?

Hmm, do you think multithreading really matters when running CI? CI already runs a worker on each thread, so I think we should just try to optimize the single-threaded case.

I have seen `index` be slow, I think because of the hashmap. Can we just switch this to a fixed-size array of registers? ``` pub struct VGPR { values:...
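To make the fixed-size-array suggestion concrete, here is a minimal Python sketch contrasting a dict-keyed register file with a flat, preallocated one. It only illustrates the layout change (lookups become index arithmetic, with no hashing and no allocation on first write); it is not remu's code, which is Rust, and the class names plus `NUM_VGPRS` and `WAVE_SIZE` are hypothetical values chosen for illustration. In Rust the analogous structure would be a fixed-size array such as `[u32; N]` in place of a `HashMap`.

```python
NUM_VGPRS = 256  # hypothetical: registers per lane
WAVE_SIZE = 32   # hypothetical: lanes per wave (wave32)
MASK32 = 0xFFFFFFFF


class DictVGPR:
    """Register file keyed by (lane, reg): every access hashes a tuple key."""

    def __init__(self):
        self.values = {}

    def read(self, lane, reg):
        return self.values.get((lane, reg), 0)

    def write(self, lane, reg, value):
        self.values[(lane, reg)] = value & MASK32


class ArrayVGPR:
    """Fixed-size register file: one flat list, allocated once up front."""

    def __init__(self):
        self.values = [0] * (WAVE_SIZE * NUM_VGPRS)

    def _index(self, lane, reg):
        # Plain arithmetic instead of hashing; bounds are checked explicitly.
        if not (0 <= lane < WAVE_SIZE and 0 <= reg < NUM_VGPRS):
            raise IndexError(f"lane {lane}, reg {reg} out of range")
        return lane * NUM_VGPRS + reg

    def read(self, lane, reg):
        return self.values[self._index(lane, reg)]

    def write(self, lane, reg, value):
        self.values[self._index(lane, reg)] = value & MASK32


for rf in (DictVGPR(), ArrayVGPR()):
    rf.write(3, 10, 0x1_0000_0005)  # truncated to 32 bits on write
    print(type(rf).__name__, rf.read(3, 10))  # both print 5
```

Both classes expose the same read/write interface, so swapping one for the other does not touch callers; only the storage and the index computation change.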

Can you run it under gdb and share the Python backtrace (`py-bt`)?

```python
args_st = self.args_struct_t.from_address(kernargs)
for i in range(len(args)):
  args_st.__setattr__(f'f{i}', args[i])
```

Is `kernargs` 0 here? Do you have an integrated GPU? You can manage visible GPUs with https://rocm.docs.amd.com/en/latest/conceptual/gpu-isolation.html#rocr-visible-devices if it...
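A minimal sketch of why a bad `kernargs` only shows up as a segfault: `ctypes` `from_address()` performs no validation, so the crash happens later, inside the field setter. Pointer-sized `c_void_p` fields are written by ctypes' `P_set` setter (the function in the reported trace). The helpers `make_args_struct` and `fill_kernargs` below are hypothetical, not tinygrad's code, and the buffer is ordinary host memory standing in for the real kernel-argument allocation.

```python
import ctypes


def make_args_struct(n):
    # Hypothetical: a struct with n pointer-sized fields named f0..f{n-1},
    # mirroring the f'f{i}' naming in the snippet above.
    fields = [(f"f{i}", ctypes.c_void_p) for i in range(n)]
    return type("ArgsStruct", (ctypes.Structure,), {"_fields_": fields})


def fill_kernargs(args_struct_t, kernargs, args):
    # from_address() trusts the address blindly; fail early and loudly
    # instead of letting the first field write crash the interpreter.
    if not kernargs:
        raise ValueError("kernargs is 0; the kernel-argument buffer was not allocated")
    args_st = args_struct_t.from_address(kernargs)
    for i, value in enumerate(args):
        setattr(args_st, f"f{i}", value)
    return args_st


# Usage: back the struct with a host buffer instead of GPU-visible memory.
ArgsStruct = make_args_struct(2)
buf = ctypes.create_string_buffer(ctypes.sizeof(ArgsStruct))
st = fill_kernargs(ArgsStruct, ctypes.addressof(buf), [0x1000, 42])
print(st.f0, st.f1)  # prints: 4096 42
```

The null check only catches the `kernargs == 0` case; a nonzero but unmapped address would still crash in the setter.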

I am also trying to reproduce it again. For anyone who can reproduce it: what arguments are passed to `P_set` before the segfault (ptr and value), and the value `kernargs...
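One way a reproducer could capture those values is to print each target address and value just before the write, flushing so the last line survives the crash, and to enable the standard-library `faulthandler` so a segfault also dumps the Python traceback (running with `python -X faulthandler` has the same effect). This is a sketch only: `fill_kernargs_logged` and the two-field `Args` layout are hypothetical stand-ins for the real args struct.

```python
import ctypes
import faulthandler
import sys

# On SIGSEGV, dump the Python-level traceback to the real stderr.
faulthandler.enable(file=sys.__stderr__)


class Args(ctypes.Structure):
    # Hypothetical layout: one pointer argument, one 32-bit value.
    _fields_ = [("f0", ctypes.c_void_p), ("f1", ctypes.c_uint32)]


def fill_kernargs_logged(args_struct_t, kernargs, args):
    print(f"kernargs=0x{kernargs:x}", file=sys.stderr, flush=True)
    args_st = args_struct_t.from_address(kernargs)
    for i, value in enumerate(args):
        # Class-level field descriptors expose the byte offset of each field.
        ptr = kernargs + getattr(args_struct_t, f"f{i}").offset
        # Log before writing: if this write segfaults, this is the last line.
        print(f"f{i}: ptr=0x{ptr:x} value={value!r}", file=sys.stderr, flush=True)
        setattr(args_st, f"f{i}", value)
    return args_st


buf = ctypes.create_string_buffer(ctypes.sizeof(Args))
st = fill_kernargs_logged(Args, ctypes.addressof(buf), [0x1000, 7])
```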

notnaton posted a trace where the segfault happened in `P_set`: `P_set (ptr=0x7ffec0a00000, value=, size=) at ./Modules/_ctypes/cfield.c:1462`

Yeah, NV should be a bit better, but it also OOMs. I wrote a custom allocator to make it fit better, but it still OOMs when creating graphs. Is there any way to remove...

Hmm, I need to retest. I recall switching copyin to `cuMemcpyHtoD_v2`, and it still OOMed during GPU buffer allocation.