
My Intel CPU server has a total of 80 CPU cores, but my program in Jupyter can only use 20 of them. How can I use all the CPU cores?

Open a-strong-python opened this issue 1 year ago • 7 comments

root@s099-n016:~$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Address sizes:       46 bits physical, 48 bits virtual
Byte Order:          Little Endian
CPU(s):              80
On-line CPU(s) list: 0-79
Vendor ID:           GenuineIntel
Model name:          Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
CPU family:          6
Model:               85
Thread(s) per core:  2
Core(s) per socket:  20
Socket(s):           2
Stepping:            4
NUMA:
NUMA node(s):        2
NUMA node0 CPU(s):   0-19,40-59
NUMA node1 CPU(s):   20-39,60-79

a-strong-python avatar Feb 20 '24 19:02 a-strong-python

From your lscpu output, we can get the following information:

  1. Hyper-threading is enabled, which can hurt performance for computation-intensive tasks.
  2. One socket has 20 physical cores, so the current best practice is to use the 20 cores of one socket.

You can search for blog posts about NUMA for more information.

If you do want to use all of the cores, you may want to tune the OMP_NUM_THREADS config or use numactl:

export OMP_NUM_THREADS=80
numactl -C 0-79 your_program

If this does not work, can you provide the instructions for reproducing the issue?
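As a quick sanity check (a minimal sketch of my own, not from this thread, Linux only): you can compare the total number of logical CPUs on the machine with the set of CPUs the current process is actually allowed to run on. If the two differ, an affinity mask or cgroup cpuset is restricting the process, and no amount of OMP_NUM_THREADS tuning will get past it.

```python
import os

# Total logical CPUs visible on the machine (may include cores
# the process is not allowed to use).
total = os.cpu_count()

# CPUs the current process is actually allowed to run on; this
# reflects taskset/numactl/cgroup restrictions (Linux only).
allowed = os.sched_getaffinity(0)

print(f"visible CPUs: {total}")
print(f"usable CPUs:  {len(allowed)} -> {sorted(allowed)}")
```

Running this inside the Jupyter kernel shows directly whether the kernel process is pinned to a subset of the 80 cores.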

gc-fu avatar Feb 21 '24 02:02 gc-fu

Here is the output after I executed the command under jupyter lab:

!numactl --show

policy: default
preferred node: current
physcpubind: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59 
cpubind: 0 
nodebind: 0 
membind: 0 1 

!numactl --hardware

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
node 0 size: 127598 MB
node 0 free: 124906 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 1 size: 128960 MB
node 1 free: 127766 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 

!export OMP_NUM_THREADS=80
!numactl -C 0-79 ./main.ipynb

libnuma: Warning: cpu argument 0-79 is out of range

<0-79> is invalid
usage: numactl [--all | -a] [--interleave= | -i <nodes>] [--preferred= | -p <node>]
               [--physcpubind= | -C <cpus>] [--cpunodebind= | -N <nodes>]
               [--membind= | -m <nodes>] [--localalloc | -l] command args ...
       numactl [--show | -s]
       numactl [--hardware | -H]
       numactl [--length | -l <length>] [--offset | -o <offset>] [--shmmode | -M <shmmode>]
               [--strict | -t]
               [--shmid | -I <id>] --shm | -S <shmkeyfile>
               [--shmid | -I <id>] --file | -f <tmpfsfile>
               [--huge | -u] [--touch | -T] 
               memory policy | --dump | -d | --dump-nodes | -D

memory policy is --interleave | -i, --preferred | -p, --membind | -m, --localalloc | -l
<nodes> is a comma delimited list of node numbers or A-B ranges or all.
Instead of a number a node can also be:
  netdev:DEV the node connected to network device DEV
  file:PATH  the node the block device of path is connected to
  ip:HOST    the node of the network device host routes through
  block:PATH the node of block device path
  pci:[seg:]bus:dev[:func] The node of a PCI device
<cpus> is a comma delimited list of cpu numbers or A-B ranges or all
all ranges can be inverted with !
all numbers and ranges can be made cpuset-relative with +
the old --cpubind argument is deprecated.
use --cpunodebind or --physcpubind instead
<length> can have g (GB), m (MB) or k (KB) suffixes
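An aside on the first command above (my own note, not from this thread): !export OMP_NUM_THREADS=80 in a Jupyter cell runs in a throwaway subshell, so the variable never reaches the notebook's Python kernel. A sketch of setting it from within Python instead, which must run before any OpenMP-backed library (numpy, torch, ...) is imported, since OpenMP reads the variable at initialization time:

```python
import os

# Set the thread count in the kernel's own environment so that
# libraries imported afterwards pick it up; a later !export in a
# shell cell would not affect this process.
os.environ["OMP_NUM_THREADS"] = "80"

print(os.environ["OMP_NUM_THREADS"])  # -> 80
```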

a-strong-python avatar Feb 21 '24 05:02 a-strong-python

No matter what I try, !numactl -C 0-79 ./main.ipynb always displays the same error message:

libnuma: Warning: cpu argument xxx is out of range
<xxx> is invalid

This is very confusing to me!

a-strong-python avatar Feb 21 '24 06:02 a-strong-python

Here is the output after I executed the command under jupyter lab:

!numactl --show

policy: default
preferred node: current
physcpubind: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59 
cpubind: 0 
nodebind: 0 
membind: 0 1 

This numactl output indicates that some of your NUMA nodes aren't populated with any memory; all the memory seems to be installed on node 0.

This explains why numactl -C 0-79 ./main.ipynb complains that the core numbers are out of range.

Please check memory installation on that server.

qiyuangong avatar Feb 22 '24 02:02 qiyuangong

Here is the output after I executed the command under jupyter lab: !numactl --show

policy: default
preferred node: current
physcpubind: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59 
cpubind: 0 
nodebind: 0 
membind: 0 1 

This numactl output indicates that some of your NUMA nodes aren't populated with any memory; all the memory seems to be installed on node 0.

This explains why numactl -C 0-79 ./main.ipynb complains that the core numbers are out of range.

Please check memory installation on that server.

I am using the free server resources of Intel Developer Cloud for the Edge, so I cannot inspect how memory is physically installed on the server; I can only look through lscpu and similar commands. What is currently known is that the server has 256 GB of memory, and the 80 CPU cores and 256 GB of memory are evenly allocated across the two NUMA nodes 0 and 1, as follows:

!numactl --hardware

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
node 0 size: 127598 MB
node 0 free: 124906 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 1 size: 128960 MB
node 1 free: 127766 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 

a-strong-python avatar Feb 22 '24 13:02 a-strong-python

If necessary, you can also reproduce the problem quickly on the Intel platform: Developer Cloud for the Edge.

a-strong-python avatar Feb 22 '24 13:02 a-strong-python

If necessary, you can also reproduce the problem quickly on the Intel platform: Developer Cloud for the Edge.

Oh, you are in a jupyter notebook provided by Developer Cloud for the Edge.

I tried some commands in a free jupyter notebook from Developer Cloud for the Edge. It seems the jupyter process is running inside a container or VM.

That means that although we can see all cores and memory with lscpu and numactl --hardware, we can only use the resources actually assigned to the process (i.e., what numactl --show reports, which was fixed when the container/VM was created). In your environment that is physcpubind: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59, i.e., 20 cores.

Another possible reason is that core binding (i.e., numactl ...) was applied when the jupyter notebook was launched.

Please contact Developer Cloud support for resource-related issues.
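Given that restriction, a sketch of my own (not from this thread) of how to build a -C argument that stays inside the allowed cpuset: compress the process's allowed-CPU set into the compact "a-b,c-d" ranges that numactl -C accepts, so the binding never names a core outside the cpuset.

```python
import os

def cpu_ranges(cpus):
    """Compress a set of CPU ids into numactl-style 'a-b,c-d' ranges."""
    runs = []
    run = []
    for cpu in sorted(cpus):
        if run and cpu == run[-1] + 1:
            run.append(cpu)          # extend the current consecutive run
        else:
            if run:
                runs.append(run)     # close the previous run
            run = [cpu]
    if run:
        runs.append(run)
    return ",".join(f"{r[0]}-{r[-1]}" if len(r) > 1 else str(r[0])
                    for r in runs)

# The cpuset reported in this thread: 10-19 plus 50-59.
allowed = set(range(10, 20)) | set(range(50, 60))
print(cpu_ranges(allowed))  # -> 10-19,50-59

# In practice, feed it the live affinity mask (Linux only):
# print(cpu_ranges(os.sched_getaffinity(0)))
```

The resulting string ("10-19,50-59" here) is what a numactl -C invocation inside this container could safely use instead of 0-79.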

qiyuangong avatar Feb 23 '24 00:02 qiyuangong