dataguru

8 issues by dataguru

[root@master01 kubernetes]# kubectl describe pod -n permission-manager Warning FailedCreatePodSandBox 20s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "e3bad963c494ae17bcb007fa366cd47ca6a03026fec11cedc7a2bb17311f334d"...

invalid
question

(deepspeed) [menkeyi@gpu1 DeepSpeed-Chat]$ python train.py --actor-model facebook/opt-13b --reward-model facebook/opt-350m --deployment-type single_node ---=== Running Step 1 ===--- Running: bash /home/menkeyi/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/training_scripts/single_node/run_13b.sh /home/menkeyi/DeepSpeedExamples/applications/DeepSpeed-Chat/output/actor-models/13b GPU usage rate: (deepspeed) [menkeyi@gpu1 DeepSpeed-Chat]$ nvidia-smi Sat Apr 15...

bug
deepspeed-chat

(deepspeed) [menkeyi@gpu1 DeepSpeed-Chat]$ python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --deployment-type single_node (test) [menkeyi@workstation DeepSpeed-Chat]$ tail -f output/actor-models/1.3b/training.log gpu1:1329:1670 [0] NCCL INFO comm 0x43f61e20 rank 0 nranks 8 cudaDev 0 busId...

bug
deepspeed-chat

Machine configuration information (deepspeed) [menkeyi@gpu1 ~]$ df -Th Filesystem Type Size Used Avail Use% Mounted on none overlay 79G 29G 46G 39% / 192.168.100.44@o2ib:/data lustre 98T 4.3T 89T 5% /home...

bug
deepspeed-chat

/etc/mky/opencv_ffmpeg_streaming/streamer/streamer.cpp: In member function ‘int streamer::Streamer::init(const streamer::StreamerConfig&)’: /etc/mky/opencv_ffmpeg_streaming/streamer/streamer.cpp:186:37: error: invalid conversion from ‘const AVCodec*’ to ‘AVCodec*’ [-fpermissive] 186 | out_codec = avcodec_find_encoder(codec_id); | ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~ | | | const AVCodec* make[2]:...

[root@K8S-M1 ~]# kubectl get cs NAME STATUS MESSAGE ERROR scheduler Healthy ok controller-manager Healthy ok etcd-0 Healthy {"health":"true"} etcd-1 Healthy {"health":"true"} etcd-2 Healthy {"health":"true"} [root@K8S-M1 ~]# kubectl get csr NAME...

[root@ctfdeploy contrail-ansible-deployer]# ansible-playbook -i inventory/ -e orchestrator=openstack -e ansible_sudo_pass=abc@123 playbooks/install_openstack.yml TASK [memcached : Copying over config.json files for services] ************************************************* failed: [10.49.252.201] (item=memcached) => {"changed": false, "item": "memcached", "msg": "AnsibleUndefinedVariable:...

(openpose) root@ai:~/YOLOv3-Torch2TRT# python3 detect.py Traceback (most recent call last): File "detect.py", line 113, in model_backbone.load_darknet_weights(opt.weights_path) File "/root/YOLOv3-Torch2TRT/models.py", line 365, in load_darknet_weights conv_w = torch.from_numpy(weights[ptr : ptr + num_w]).view_as(conv_layer.weight) RuntimeError: shape...