guyueh1
I have the same issue when trying to use faiss on an ARM Linux platform with numpy==1.26.0. I think the problematic import `import numpy.distutils.cpuinfo` only happens on this platform (aarch64,...
@yfw we need to update the torch version in `tools/build-custom-vllm.sh` as well
@butsugiri Hi, sorry for the delay. I built the container based on the Dockerfile you provided and ran the test again; I can still reproduce our results in the blog....
@joyang-nv can you confirm if this is a current limitation of dtensor?
@ZhiyuLi-Nvidia I think this error is caused by a CPU memory leak, not a GPU memory leak. The leak seems to recur every ~300 steps; it is hard to debug with limited...
@zpqiu sorry for the long delay, I have put some comments; could you first merge in main and then address them? I think this solution can be further optimized in...
@zpqiu can you fix the functional test failure? Also I think the L1 functionality is run on Ampere GPUs, maybe you need to conditionally skip for cuda arch before sm_90
> > @zpqiu can you fix the functional test failure? > > Also I think the L1 functionality is run on Ampere GPUs, maybe you need to conditionally skip for...
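For reference, a minimal sketch of the conditional skip suggested above, assuming the tests use pytest and PyTorch. The names `MIN_CAPABILITY`, `detect_capability`, and `should_skip` are hypothetical and not taken from the PR; the sm_90 threshold comes from the comment.

```python
from typing import Optional, Tuple

# Assumed threshold from the comment: skip on anything older than sm_90.
MIN_CAPABILITY = (9, 0)


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return (major, minor) for CUDA device 0, or None if no GPU/PyTorch."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


def should_skip(capability: Optional[Tuple[int, int]]) -> bool:
    """True when the test should be skipped (no GPU, or arch below sm_90)."""
    return capability is None or tuple(capability) < MIN_CAPABILITY


# Hypothetical usage in a test module:
#
#   import pytest
#
#   @pytest.mark.skipif(should_skip(detect_capability()),
#                       reason="requires CUDA arch >= sm_90")
#   def test_fp8_path():
#       ...
```

With this shape, an Ampere runner (capability `(8, 0)` or `(8, 6)`) skips the test, while an sm_90 device runs it.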
@terrykong please review
@terrykong this is the last FP8 functionality we want to merge before v0.5; after this I want to refactor the code to make it cleaner and more structured....