fengding

Results: 23 comments by fengding

Sorry. Maybe next Tuesday, if there is no critical issue.

Note: neither “Windows PowerShell” nor “Anaconda Powershell Prompt” works. If you use a conda prompt, it has to be "Anaconda Prompt".

Hi @raevillena, how can I reproduce your issue?

How do you "Stop + Exit the container"? Does the following command still list the container? `docker ps -a`

> 1. Run the example workflow using the single fp8 checkpoint provided: https://comfyanonymous.github.io/ComfyUI_examples/flux/#flux-dev-1
> 2. Observe that the inference completes
> 3. Stop + Exit the container

Can you observe...

> assert name not in self.metrics.keys(), "registered metric name already exists."

What is the registered metric name? Can you print it? Where is it registered the first time?

Thanks for this report. I can reproduce it on my ARC770 platform.

The OOM comes from the device, not the host. It looks like the ARC770's 16G of memory is not enough for the model "LanguageBind/Video-LLaVA-7B-hf" in float16.
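
A rough back-of-the-envelope check of that claim, as a sketch only: it assumes the model has about 7 billion parameters (read off the "7B" in the name, not measured) and counts weights alone, ignoring activations, caches and any extra vision components. The helper name `weight_memory_gib` is made up for illustration.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: roughly 7e9 parameters, inferred from "7B" in the model name.
ASSUMED_PARAMS = 7e9
# float16 stores each parameter in 2 bytes.
FLOAT16_BYTES = 2

weights_gib = weight_memory_gib(ASSUMED_PARAMS, FLOAT16_BYTES)
print(f"float16 weights alone: about {weights_gib:.1f} GiB")
```

Under these assumptions the weights by themselves already take most of a 16G device, so little is left for everything else needed at inference time.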

You can set HABANA_VISIBLE_DEVICES=0,1,2,3 to specify individual device IDs instead of using all devices. Gaudi docs page: https://docs.habana.ai/en/latest/Orchestration/Multiple_Tenants_on_HPU/Multiple_Dockers_each_with_Single_Workload.html
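
One way to pass that setting to a container is as an environment variable via `docker run -e`. The snippet below is only a sketch that builds that argument pair; the helper name `habana_env_args` is hypothetical, and only the variable name `HABANA_VISIBLE_DEVICES` and the comma-separated id format come from the comment above.

```python
def habana_env_args(device_ids):
    """Build the `-e NAME=VALUE` pair that limits a container to the given device ids."""
    ids = sorted({int(i) for i in device_ids})
    if not ids:
        raise ValueError("at least one device id is required")
    if ids[0] < 0:
        raise ValueError("device ids must be non-negative")
    value = ",".join(str(i) for i in ids)
    return ["-e", f"HABANA_VISIBLE_DEVICES={value}"]


# The pair can be spliced into a `docker run` argument list.
print(" ".join(habana_env_args([0, 1, 2, 3])))
```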

**Note**: [Gaudi doc](https://docs.habana.ai/en/latest/PyTorch/Reference/PyTorch_Support_Matrix.html#pytorch-support-matri) -> Device Management ->

> Sharing 1 device between multiple processes | No | No

That means llm_service and tei embedding have to run on different gaudi...