grok-1
grok-1 copied to clipboard
killed by os when running mac m3max and 128G Mem
dan@MacBook-Pro grok-1 % python3.11 run.py INFO:jax._src.xla_bridge:Unable to initialize backend 'cuda': INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: dlopen(libtpu.so, 0x0001): tried: 'libtpu.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibtpu.so' (no such file), '/opt/homebrew/lib/libtpu.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/lib/libtpu.so' (no such file), '/usr/lib/libtpu.so' (no such file, not in dyld cache), 'libtpu.so' (no such file), '/usr/local/lib/libtpu.so' (no such file), '/usr/lib/libtpu.so' (no such file, not in dyld cache) INFO:rank:Initializing mesh for self.local_mesh_config=(1, 1) self.between_hosts_config=(1, 1)... INFO:rank:Detected 1 devices in mesh INFO:rank:partition rules: <bound method LanguageModelConfig.partition_rules of LanguageModelConfig(model=TransformerConfig(emb_size=6144, key_size=128, num_q_heads=48, num_kv_heads=8, num_layers=64, vocab_size=131072, widening_factor=8, attn_output_multiplier=0.08838834764831845, name=None, num_experts=8, capacity_factor=1.0, num_selected_experts=2, init_scale=1.0, shard_activations=True, data_axis='data', model_axis='model'), vocab_size=131072, pad_token=0, eos_token=2, sequence_len=8192, model_size=6144, embedding_init_scale=1.0, embedding_multiplier_scale=78.38367176906169, output_multiplier_scale=0.5773502691896257, name=None, fprop_dtype=<class 'jax.numpy.bfloat16'>, model_type=None, init_scale_override=None, shard_embeddings=True)> INFO:rank:(1, 256, 6144) INFO:rank:(1, 256, 131072) INFO:rank:State sharding type: <class 'model.TrainingState'> INFO:rank:(1, 256, 6144) INFO:rank:(1, 256, 131072) INFO:rank:Loading checkpoint at ./checkpoints/ckpt-0 zsh: killed python3.11 run.py