Lawrence Mitchell
They're not errors at the moment, so I think pinning would be a backward step (and we'd have to go through the rest of rapids and pin there too...)
> I don't think this will ever be an error because it's perfectly valid code. It's just telling you that it might be slower than you anticipated because of acquiring...
> Hey @harrism, > > Can you direct me on how to debug [this](https://gist.github.com/willtryagain/65d36b9b52ccb491bdba1caae1e3f0d0) error please? > > I got this when I ran > > `cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX....
OK, thanks. I have a CTK 12.3 install here, so maybe I can reproduce.
Hmm, I could not reproduce. Here is what I did (I used mamba rather than conda, but everything else is the same), on ce3af2c46b8b: ``` git clean -fdx # there's no magic behind the...
Hmm, that is identical to mine, so I am somewhat at a loss as to what broke.
How much host RAM is available on this system? The CPU log peaks at around 360GiB host RAM usage. Could it be that you're running out of both host and...
> I requested 400GB ram in slurm when submitting this job. That _might_ have been your problem (depending on how slurm manages these allocations). It could be that you got...
Hmm. Nothing obviously looks bad there, but if the GPU (and host) memory usage is always increasing this is _either_ because the RMM pool is so fragmented that you can...
> A quick look at the attached CSV log, sorting by size, all large allocations seem to have matching frees. Looking at them in order, the large allocations (~21GiB) occur...
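For anyone wanting to repeat the "do all allocations have matching frees" check on their own log, here is a minimal sketch in Python. It assumes the CSV has a header row with `Action`, `Pointer` and `Size` columns and that actions are spelled `allocate` and `free`; those names are my assumption about the log layout, so adjust them to match your file. The function name and the sample rows are made up for illustration.

```python
import csv
import io


def summarize_allocations(csv_text):
    """Replay an allocation log and report what is still live at the end.

    Returns (unmatched, peak_bytes): `unmatched` maps pointer -> size for
    allocations with no later free, `peak_bytes` is the largest number of
    bytes live at any point in the log.
    Column names and action spellings are assumptions, see the note above.
    """
    live = {}          # pointer -> size of currently live allocations
    current = 0        # bytes live right now
    peak = 0           # largest value `current` has reached
    for row in csv.DictReader(io.StringIO(csv_text)):
        action = row["Action"].strip().lower()
        pointer = row["Pointer"].strip()
        if action == "allocate":
            size = int(row["Size"])
            live[pointer] = size
            current += size
            peak = max(peak, current)
        elif action == "free":
            # Ignore frees of pointers we never saw being allocated.
            current -= live.pop(pointer, 0)
    return live, peak


# Hypothetical sample data, not taken from the log discussed in this thread.
sample_log = """Thread,Time,Action,Pointer,Size,Stream
1,00:00:01,allocate,0x1000,1024,0x0
1,00:00:02,allocate,0x2000,2048,0x0
1,00:00:03,free,0x1000,1024,0x0
"""

unmatched, peak_bytes = summarize_allocations(sample_log)
print("unmatched allocations:", unmatched)
print("peak live bytes:", peak_bytes)
```

If `unmatched` comes back empty but memory usage still only goes up, that points away from a leak in the logged allocations and towards fragmentation or something outside the log.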