nanoGPT
Error when I try nanoGPT on GPUs from Kaggle
Ref: https://www.kaggle.com/code/bkowshik/karpathy-nanogpt/
I'm trying to get nanoGPT up and running on the GPUs provided by Kaggle, and I get the following errors:
GPU - P100
$ python train.py config/train_shakespeare_char.py
RuntimeError: Found Tesla P100-PCIE-16GB which is too old to be supported by the triton GPU compiler, which is used as the backend. Triton only supports devices of CUDA Capability >= 7.0, but your device is of CUDA capability 6.0
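This error is expected on Pascal-class cards: the P100 has CUDA compute capability 6.0, and Triton (the backend torch.compile() uses via Inductor) requires 7.0 or newer. A minimal sketch to confirm what the active GPU reports, assuming a CUDA-enabled PyTorch build:

>>> import torch
>>> # Triton needs capability >= (7, 0); a P100 reports (6, 0).
>>> torch.cuda.get_device_capability()
(6, 0)

On such a device the practical option is to skip compilation entirely (nanoGPT exposes a compile flag for this; see below) rather than expect Triton to work.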
GPU - T4 x 2
Dependencies
# Version of PyTorch.
>>> import torch
>>> print(torch.__version__)
2.0.0
# Version of CUDA.
!nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Error
[2023-07-06 05:02:42,006] torch._inductor.graph: [INFO] Using FallbackKernel: torch.ops.aten._scaled_dot_product_flash_attention.default
/usr/bin/ld: cannot find -lcuda: No such file or directory
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmprh0gq_rd/main.c', '-O3', '-I/usr/local/cuda/include', '-I/opt/conda/include/python3.10', '-I/tmp/tmprh0gq_rd', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmprh0gq_rd/triton_.cpython-310-x86_64-linux-gnu.so']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
The full error message is here:
[2023-07-06 08:50:19,371] torch._inductor.graph: [INFO] Using FallbackKernel: torch.ops.aten._scaled_dot_product_flash_attention.default
/usr/bin/ld: cannot find -lcuda: No such file or directory
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 549, in _worker_compile
kernel.precompile(warm_cache_only_with_cc=cc)
File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/triton_ops/autotune.py", line 69, in precompile
self.launchers = [
File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/triton_ops/autotune.py", line 70, in <listcomp>
self._precompile_config(c, warm_cache_only_with_cc)
File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/triton_ops/autotune.py", line 83, in _precompile_config
triton.compile(
File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 1588, in compile
so_path = make_stub(name, signature, constants)
File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 1477, in make_stub
so = _build(name, src_path, tmpdir)
File "/opt/conda/lib/python3.10/site-packages/triton/compiler.py", line 1392, in _build
ret = subprocess.check_call(cc_cmd)
File "/opt/conda/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpma68p_sy/main.c', '-O3', '-I/usr/local/cuda/include', '-I/opt/conda/include/python3.10', '-I/tmp/tmpma68p_sy', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpma68p_sy/triton_.cpython-310-x86_64-linux-gnu.so']' returned non-zero exit status 1.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 670, in call_user_compiler
compiled_fn = compiler_fn(gm, self.fake_example_inputs())
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/debug_utils.py", line 1055, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)
File "/opt/conda/lib/python3.10/site-packages/torch/__init__.py", line 1390, in __call__
return compile_fx(model_, inputs_, config_patches=self.config)
File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 455, in compile_fx
return aot_autograd(
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 48, in compiler_fn
cg = aot_module_simplified(gm, example_inputs, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2805, in aot_module_simplified
compiled_fn = create_aot_dispatcher_function(
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2498, in create_aot_dispatcher_function
compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config)
File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1713, in aot_wrapper_dedupe
return compiler_fn(flat_fn, leaf_flat_args, aot_config)
File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1326, in aot_dispatch_base
compiled_fw = aot_config.fw_compiler(fw_module, flat_args_with_views_handled)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 430, in fw_compiler
return inner_compile(
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/debug_utils.py", line 595, in debug_wrapper
compiled_fn = compiler_fn(gm, example_inputs)
File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/debug.py", line 239, in inner
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 177, in compile_fx_inner
compiled_fn = graph.compile_to_fn()
File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/graph.py", line 586, in compile_to_fn
return self.compile_to_module().call
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/graph.py", line 575, in compile_to_module
mod = PyCodeCache.load(code)
File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 528, in load
exec(code, mod.__dict__, mod.__dict__)
File "/tmp/torchinductor_root/ut/cutluoytkpipzjefaid23t4u36owtuzeuzqqktx6wnp6je4gjbjx.py", line 827, in <module>
async_compile.wait(globals())
File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 715, in wait
scope[key] = result.result()
File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 573, in result
self.future.result()
File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpma68p_sy/main.c', '-O3', '-I/usr/local/cuda/include', '-I/opt/conda/include/python3.10', '-I/tmp/tmpma68p_sy', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpma68p_sy/triton_.cpython-310-x86_64-linux-gnu.so']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/kaggle/working/nanoGPT/train.py", line 261, in <module>
losses = estimate_loss()
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/kaggle/working/nanoGPT/train.py", line 221, in estimate_loss
logits, loss = model(X, Y)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 82, in forward
return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 209, in _fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 337, in catch_errors
return callback(frame, cache_size, hooks)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 404, in _convert_frame
result = inner_convert(frame, cache_size, hooks)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 104, in _fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 262, in _convert_frame_assert
return _compile(
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 324, in _compile
out_code = transform_code_object(code, transform)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 445, in transform_code_object
transformations(instructions, code_options)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 311, in transform
tracer.run()
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1726, in run
super().run()
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
and self.step()
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
getattr(self, inst.opname)(inst)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1792, in RETURN_VALUE
self.output.compile_subgraph(
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 541, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 588, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 675, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: debug_wrapper raised CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpma68p_sy/main.c', '-O3', '-I/usr/local/cuda/include', '-I/opt/conda/include/python3.10', '-I/tmp/tmpma68p_sy', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpma68p_sy/triton_.cpython-310-x86_64-linux-gnu.so']' returned non-zero exit status 1.
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
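The root cause here is that the gcc command Triton spawns passes -lcuda, but the linker cannot find a libcuda.so in its search path (driver images often ship only libcuda.so.1, without the dev symlink). One possible workaround, a sketch assuming the Kaggle image puts the CUDA toolkit's driver stub at the path below, is to point gcc's LIBRARY_PATH at the stubs directory before any compilation is triggered (e.g. at the top of the notebook or train.py):

import os

# Hypothetical workaround, untested on Kaggle: gcc honours the LIBRARY_PATH
# environment variable when resolving -l flags, so prepend the CUDA driver
# stub directory. The stubs path is an assumption about this image; check
# that it exists before relying on it.
stubs = "/usr/local/cuda/lib64/stubs"
if os.path.isdir(stubs):
    os.environ["LIBRARY_PATH"] = stubs + ":" + os.environ.get("LIBRARY_PATH", "")

If the stubs directory isn't present, disabling torch.compile() (as suggested in the reply below) sidesteps the Triton build step altogether.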
Did you fix it? If not, I think it's an error with torch.compile(). Try setting compile = False and try again. I have a simplified, working implementation of nanoGPT on Kaggle, based on the original lecture.
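For reference, nanoGPT's configurator also accepts the flag on the command line, so the run from the top of this issue would become:

$ python train.py config/train_shakespeare_char.py --compile=False

This keeps training in eager mode and avoids both the P100 capability check and the Triton/gcc link step on the T4s, at some cost in speed.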