
Problem running petals on virtual CPU

Open tijszwinkels opened this issue 11 months ago • 3 comments

I ran into this when trying to run: https://github.com/petals-infra/chat.petals.dev

$ flask run --host=0.0.0.0 --port=5000
Floating point exception (core dumped)

However, I believe this is an issue with the Petals library itself, since the following minimal example crashes as well:

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM
import torch

model_name = "enoch/llama-65b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

Running it:

$ python test.py
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
Jul 27 07:53:55.842 [INFO] Make sure you follow the LLaMA's terms of use: https://bit.ly/llama2-license for LLaMA 2, https://bit.ly/llama-license for LLaMA 1
Jul 27 07:53:55.842 [INFO] Using DHT prefix: llama-65b-hf
Floating point exception (core dumped)

It crashes on the last line of the script (the from_pretrained call for the model). Please note that it also crashes without the torch_dtype argument.
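One way to narrow down where such a crash originates is Python's standard-library faulthandler module, which dumps the Python traceback when the process receives a fatal signal such as SIGFPE. The following is a minimal sketch and not part of the original report: it re-runs a script in a child process with faulthandler enabled and describes how the child exited. The helper names run_with_faulthandler and describe_exit are hypothetical.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(python_args):
    """Run the Python interpreter in a child process with faulthandler enabled.

    python_args is the list of arguments passed after the interpreter,
    for example ["test.py"]. Returns (returncode, stderr). On POSIX, a
    negative return code -N means the child was killed by signal N.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", *python_args],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # Negative codes encode the number of the signal that ended the child.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"
```

Calling run_with_faulthandler(["test.py"]) and printing the returned stderr should show which Python frame was active when the signal arrived, which helps tell whether the fault is inside Petals or one of its dependencies.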

These are the capabilities of the virtualized CPU I'm running on:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 61
model name      : Intel Core Processor (Broadwell, IBRS)
stepping        : 2
microcode       : 0x1
cpu MHz         : 3408.010
cache size      : 4096 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap xsaveopt arat md_clear
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds mmio_unknown
bogomips        : 6816.02
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
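To compare a flag list like this against the instruction-set extensions a library might probe for, the flags line can be parsed directly. This is a minimal sketch, assuming a Linux /proc/cpuinfo layout like the dump above; the function names and the default set of flags checked are illustrative choices, not something Petals is known to require.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def check_flags(flags, wanted=("avx", "avx2", "fma", "f16c", "avx512f")):
    """Map each wanted flag name to True if present in flags, else False."""
    return {name: name in flags for name in wanted}


def read_local_cpu_flags(path="/proc/cpuinfo"):
    """Read and parse the flags of the machine this code runs on (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return parse_cpu_flags(handle.read())
```

Applied to the flags listed above, check_flags would report avx, avx2, fma, and f16c as present and avx512f as absent.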

tijszwinkels avatar Jul 27 '23 07:07 tijszwinkels

Hi, thanks for reporting this! Can you try running some PyTorch code that is independent of Petals in your environment? For instance, any example from the transformers library: https://github.com/huggingface/transformers/tree/main/examples/pytorch

mryab avatar Jul 27 '23 08:07 mryab

I tried the multiple-choice example; it runs without issue.

[Screenshot 2023-07-27 at 10:41:23]

tijszwinkels avatar Jul 27 '23 08:07 tijszwinkels

torch_dtype=torch.float32

I'm facing the same error when running the following code:

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)

INITIAL_PEERS = [
    "/ip4/192.168.100.250/tcp/31337/p2p/QmdCqmPMqgxFHqmMbbUxuU8Hm5KwoY9zRj5s5DbiyJoPbK",
]

model = AutoDistributedModelForCausalLM.from_pretrained(model_name, initial_peers=INITIAL_PEERS)

It fails on the last line with the following error:

Aug 03 22:22:04.246 [INFO] Make sure you follow the LLaMA's terms of use: https://bit.ly/llama2-license for LLaMA 2, https://bit.ly/llama-license for LLaMA 1
Aug 03 22:22:04.246 [INFO] Using DHT prefix: Llama-2-7b-chat-hf
Floating point exception (core dumped)

CPU details (lscpu):

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       44 bits physical, 48 bits virtual
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           64
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               47
Model name:          Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz
Stepping:            2
CPU MHz:             2394.006
BogoMIPS:            4788.01
Hypervisor vendor:   Xen
Virtualization type: full
L1d cache:           2 MiB
L1i cache:           2 MiB
L2 cache:            16 MiB
L3 cache:            1.9 GiB
NUMA node0 CPU(s):   0-63

emuchogu avatar Aug 03 '23 22:08 emuchogu