Han Liu
The last output lines are: "Downloading builder script: 2.86kB [00:00, 3.19MB/s] Downloading builder script: 2.86kB [00:00, 3.01MB/s] Downloading builder script: 2.86kB [00:00, 3.07MB/s] Downloading builder script: 2.86kB [00:00, 2.45MB/s] Downloading...
Ran train.py with "torchrun --standalone --nproc_per_node=1 train.py --dataset=shakespeare --dtype=float32 --batch_size=8 --compile=True" and got the following error: master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified...
Tried it on a cluster with multiple nodes. For example: 1) ran "torchrun --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr= --master_port=1234 train.py --dataset=shakespeare --dtype=float16 --batch_size=2 --compile=False" and got errors: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to...
Does anyone else have the same issue? Downloading and preparing dataset openwebtext/plain_text to C:/Users/liux3790/Desktop/download/cache/openwebtext/plain_text/1.0.0/85b3ae7051d2d72e7c5fdf6dfb462603aaa26e9ed506202bf3a24d261c6c40a1... Downloading data: 100%|█████████████████████████████████████████████████████████████████████████████████| 12.9G/12.9G [46:33
I never get train.bin; the run always fails after the tiktoken step.
Like LangChain, GPT Index, or the ChatGPT Retrieval Plugin?
Does cuQuantum include efficient functions for implementing arithmetic operations, i.e., addition and subtraction, on two tensor networks?