
[WSL2] nvrtc compilation failed

Open zmactep opened this issue 2 years ago • 5 comments

Trying to use the tool in WSL2 with my RTX 4090. The Windows version doesn't work (see issue #13).

Everything loads fine, but then I see an error:

(SE3nv) pavel@Gigabyte-PC:~/RFdiffusion$ python scripts/run_inference.py inference.model_directory_path=/mnt/d/Models/RFdiffusion 'contigmap.contigs=[150-150]' inference.output_prefix=test_outputs/test inference.num_designs=10
Reading models from /mnt/d/Models/RFdiffusion
[2023-04-06 15:59:06,421][rfdiffusion.inference.model_runners][INFO] - Reading checkpoint from /mnt/d/Models/RFdiffusion/Base_ckpt.pt
This is inf_conf.ckpt_path
/mnt/d/Models/RFdiffusion/Base_ckpt.pt
Assembling -model, -diffuser and -preprocess configs from checkpoint
USING MODEL CONFIG: self._conf[model][n_extra_block] = 4
USING MODEL CONFIG: self._conf[model][n_main_block] = 32
USING MODEL CONFIG: self._conf[model][n_ref_block] = 4
USING MODEL CONFIG: self._conf[model][d_msa] = 256
USING MODEL CONFIG: self._conf[model][d_msa_full] = 64
USING MODEL CONFIG: self._conf[model][d_pair] = 128
USING MODEL CONFIG: self._conf[model][d_templ] = 64
USING MODEL CONFIG: self._conf[model][n_head_msa] = 8
USING MODEL CONFIG: self._conf[model][n_head_pair] = 4
USING MODEL CONFIG: self._conf[model][n_head_templ] = 4
USING MODEL CONFIG: self._conf[model][d_hidden] = 32
USING MODEL CONFIG: self._conf[model][d_hidden_templ] = 32
USING MODEL CONFIG: self._conf[model][p_drop] = 0.15
USING MODEL CONFIG: self._conf[model][SE3_param_full] = {'num_layers': 1, 'num_channels': 32, 'num_degrees': 2, 'n_heads': 4, 'div': 4, 'l0_in_features': 8, 'l0_out_features': 8, 'l1_in_features': 3, 'l1_out_features': 2, 'num_edge_features': 32}
USING MODEL CONFIG: self._conf[model][SE3_param_topk] = {'num_layers': 1, 'num_channels': 32, 'num_degrees': 2, 'n_heads': 4, 'div': 4, 'l0_in_features': 64, 'l0_out_features': 64, 'l1_in_features': 3, 'l1_out_features': 2, 'num_edge_features': 64}
USING MODEL CONFIG: self._conf[model][d_time_emb] = 0
USING MODEL CONFIG: self._conf[model][d_time_emb_proj] = 10
USING MODEL CONFIG: self._conf[model][freeze_track_motif] = False
USING MODEL CONFIG: self._conf[model][use_motif_timestep] = True
USING MODEL CONFIG: self._conf[diffuser][T] = 50
USING MODEL CONFIG: self._conf[diffuser][b_0] = 0.01
USING MODEL CONFIG: self._conf[diffuser][b_T] = 0.07
USING MODEL CONFIG: self._conf[diffuser][schedule_type] = linear
USING MODEL CONFIG: self._conf[diffuser][so3_type] = igso3
USING MODEL CONFIG: self._conf[diffuser][crd_scale] = 0.25
USING MODEL CONFIG: self._conf[diffuser][so3_schedule_type] = linear
USING MODEL CONFIG: self._conf[diffuser][min_b] = 1.5
USING MODEL CONFIG: self._conf[diffuser][max_b] = 2.5
USING MODEL CONFIG: self._conf[diffuser][min_sigma] = 0.02
USING MODEL CONFIG: self._conf[diffuser][max_sigma] = 1.5
USING MODEL CONFIG: self._conf[preprocess][sidechain_input] = False
USING MODEL CONFIG: self._conf[preprocess][motif_sidechain_input] = True
USING MODEL CONFIG: self._conf[preprocess][d_t1d] = 22
USING MODEL CONFIG: self._conf[preprocess][d_t2d] = 44
USING MODEL CONFIG: self._conf[preprocess][prob_self_cond] = 0.5
USING MODEL CONFIG: self._conf[preprocess][str_self_cond] = True
USING MODEL CONFIG: self._conf[preprocess][predict_previous] = False
[2023-04-06 15:59:10,778][rfdiffusion.inference.model_runners][INFO] - Loading checkpoint.
[2023-04-06 15:59:13,459][rfdiffusion.diffusion][INFO] - Calculating IGSO3.
Successful diffuser __init__
[2023-04-06 15:59:17,256][__main__][INFO] - Making design test_outputs/test_0
[2023-04-06 15:59:17,260][rfdiffusion.inference.model_runners][INFO] - Using contig: ['150-150']
With this beta schedule (linear schedule, beta_0 = 0.04, beta_T = 0.28), alpha_bar_T = 0.00013696048699785024
[2023-04-06 15:59:17,271][rfdiffusion.inference.model_runners][INFO] - Sequence init: ------------------------------------------------------------------------------------------------------------------------------------------------------
Error executing job with overrides: ['inference.model_directory_path=/mnt/d/Models/RFdiffusion', 'contigmap.contigs=[150-150]', 'inference.output_prefix=test_outputs/test', 'inference.num_designs=10']
Traceback (most recent call last):
  File "/home/pavel/RFdiffusion/scripts/run_inference.py", line 85, in main
    px0, x_t, seq_t, plddt = sampler.sample_step(
  File "/home/pavel/RFdiffusion/rfdiffusion/inference/model_runners.py", line 665, in sample_step
    msa_prev, pair_prev, px0, state_prev, alpha, logits, plddt = self.model(msa_masked,
  File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pavel/RFdiffusion/rfdiffusion/RoseTTAFoldModel.py", line 114, in forward
    msa, pair, R, T, alpha_s, state = self.simulator(seq, msa_latent, msa_full, pair, xyz[:,:,:3],
  File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pavel/RFdiffusion/rfdiffusion/Track_module.py", line 420, in forward
    msa_full, pair, R_in, T_in, state, alpha = self.extra_block[i_m](msa_full,
  File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pavel/RFdiffusion/rfdiffusion/Track_module.py", line 332, in forward
    R, T, state, alpha = self.str2str(msa, pair, R_in, T_in, xyz, state, idx, motif_mask=motif_mask, top_k=0)
  File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/cuda/amp/autocast_mode.py", line 141, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/pavel/RFdiffusion/rfdiffusion/Track_module.py", line 266, in forward
    shift = self.se3(G, node.reshape(B*L, -1, 1), l1_feats, edge_feats)
  File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pavel/RFdiffusion/rfdiffusion/SE3_network.py", line 83, in forward
    return self.se3(G, node_features, edge_features)
  File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/se3_transformer-1.0.0-py3.9.egg/se3_transformer/model/transformer.py", line 140, in forward
    basis = basis or get_basis(graph.edata['rel_pos'], max_degree=self.max_degree, compute_gradients=False,
  File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/se3_transformer-1.0.0-py3.9.egg/se3_transformer/model/basis.py", line 167, in get_basis
    spherical_harmonics = get_spherical_harmonics(relative_pos, max_degree)
  File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/se3_transformer-1.0.0-py3.9.egg/se3_transformer/model/basis.py", line 58, in get_spherical_harmonics
    sh = o3.spherical_harmonics(all_degrees, relative_pos, normalize=True)
  File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/e3nn/o3/_spherical_harmonics.py", line 180, in spherical_harmonics
    return sh(x)
  File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pavel/.local/share/miniconda3/envs/SE3nv/lib/python3.9/site-packages/e3nn/o3/_spherical_harmonics.py", line 82, in forward
    sh = _spherical_harmonics(self._lmax, x[..., 0], x[..., 1], x[..., 2])
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)


template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_pow_pow_pow_su_9196483836509741110(float* tz_1, float* ty_1, float* tx_1, float* aten_mul, float* aten_mul_1, float* aten_mul_2, float* aten_sub, float* aten_add, float* aten_mul_3, float* aten_pow) {
{
  if (512 * blockIdx.x + threadIdx.x<22350 ? 1 : 0) {
    float ty_1_1 = __ldg(ty_1 + 3 * (512 * blockIdx.x + threadIdx.x));
    aten_pow[512 * blockIdx.x + threadIdx.x] = ty_1_1 * ty_1_1;
    float tz_1_1 = __ldg(tz_1 + 3 * (512 * blockIdx.x + threadIdx.x));
    float tx_1_1 = __ldg(tx_1 + 3 * (512 * blockIdx.x + threadIdx.x));
    aten_mul_3[512 * blockIdx.x + threadIdx.x] = (float)((double)(tz_1_1 * tz_1_1 - tx_1_1 * tx_1_1) * 0.8660254037844386);
    aten_add[512 * blockIdx.x + threadIdx.x] = tx_1_1 * tx_1_1 + tz_1_1 * tz_1_1;
    aten_sub[512 * blockIdx.x + threadIdx.x] = ty_1_1 * ty_1_1 - (float)((double)(tx_1_1 * tx_1_1 + tz_1_1 * tz_1_1) * 0.5);
    aten_mul_2[512 * blockIdx.x + threadIdx.x] = (float)((double)(ty_1_1) * 1.732050807568877) * tz_1_1;
    aten_mul_1[512 * blockIdx.x + threadIdx.x] = (float)((double)(tx_1_1) * 1.732050807568877) * ty_1_1;
    aten_mul[512 * blockIdx.x + threadIdx.x] = (float)((double)(tx_1_1) * 1.732050807568877) * tz_1_1;
  }
}
}


Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

zmactep, Apr 06 '23

Same problem, did you fix it?

xueluofengfei, May 07 '23

Fixed this problem by updating PyTorch with cudatoolkit 11.8.
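
For context, this error usually means the NVRTC bundled with the older PyTorch/CUDA build does not recognize the RTX 4090's compute capability (8.9, sm_89), which is only supported from CUDA 11.8 onward. A minimal check to confirm the mismatch, assuming torch imports inside the SE3nv environment (a rough diagnostic sketch, not part of RFdiffusion):

import torch

print(torch.__version__, torch.version.cuda)   # CUDA version this PyTorch build was compiled against
print(torch.cuda.get_device_capability(0))     # (8, 9) on an RTX 4090
print(torch.cuda.get_arch_list())              # architectures this build can generate code for

If 'sm_89' is missing from the arch list, the JIT fuser hands NVRTC an -arch value it rejects, which matches the "invalid value for --gpu-architecture" failure above.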

xueluofengfei, May 07 '23

Thanks! I've fixed the package for the Windows environment.

zmactep, May 13 '23

Fixed this problem by updating PyTorch with cudatoolkit 11.8.

Hey, I just hit the same problem. Did you mean that I should find a PyTorch version that matches cudatoolkit 11.8 and update PyTorch to that version? I found that PyTorch 2.0.0 matches CUDA 11.8. Should I update it this way: # CUDA 11.8 conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia

Gibo-Chun, Aug 16 '23

Thanks! I uninstalled cudatoolkit and cudnn (conda uninstall cudatoolkit cudnn), then installed cudatoolkit 11.8 and cudnn (conda install cudatoolkit cudnn), and finally ran conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia. That solved the problem.
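
After reinstalling, a quick way to verify the environment before rerunning the script (a sketch; exact versions and device name will differ):

import torch

assert torch.cuda.is_available()
print(torch.__version__, torch.version.cuda)   # should report a CUDA 11.8 build
print(torch.cuda.get_device_name(0))           # e.g. NVIDIA GeForce RTX 4090
print(torch.cuda.get_arch_list())              # should now include 'sm_89'

x = torch.randn(8, 3, device="cuda")
print((x * x).sum())                           # runs a small kernel on the GPU

If that all looks right, run_inference.py should get past the spherical-harmonics step.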

Gibo-Chun, Aug 16 '23