MinkowskiEngine
TensorField.sparse is not deterministic when data is on gpu
Describe the bug
When I use ME.TensorField to build the input for a segmentation model, TensorField.sparse() returns different results across runs for the same input when the data is on the GPU. On the CPU the results appear to be deterministic.
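One possible source of run-to-run differences in any averaging reduction is that floating-point addition is not associative, so the order in which values are accumulated can change the result. Whether this is what happens inside TensorField.sparse() is an assumption; this report does not establish the cause. The sketch below only illustrates the general effect with three float32 values chosen for the demonstration.

```python
import numpy as np

# Three float32 values chosen so that rounding is easy to follow.
a = np.float32(1e8)
b = np.float32(-1e8)
c = np.float32(1.0)

# Same three numbers, two accumulation orders.
left_to_right = (a + b) + c   # a + b is exactly 0, then 0 + 1
right_to_left = a + (b + c)   # b + c rounds back to -1e8 in float32

print(left_to_right, right_to_left)
```

If accumulation order were the only cause, the per-voxel averages would differ by small rounding errors rather than by large amounts.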
To Reproduce
import torch
import numpy as np
import MinkowskiEngine as ME


def set_seed(seed):
    import torch
    import numpy as np
    import random
    import os
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)


def compute_json_md5(json_obj):
    import json
    import hashlib
    json_str = json.dumps(json_obj)
    md5 = hashlib.md5(json_str.encode()).hexdigest()
    return md5


def create_tensor_field_then_sparse(feat, coord, device):
    """
    Create an ME.TensorField from feat + coord on the given device, then call sparse().
    Print the md5 of the input data and of the resulting sparse tensor.
    """
    # device = torch.device("cuda")
    # device = torch.device("cpu")
    a = torch.from_numpy(feat).to(device)
    b = torch.from_numpy(coord).to(device)
    input_data = {
        "f": a.cpu().numpy().tolist(),
        "c": b.cpu().numpy().tolist()
    }
    print("input data md5: ", compute_json_md5(input_data))
    in_field = ME.TensorField(
        features=a,
        coordinates=b,
        quantization_mode=ME.SparseTensorQuantizationMode.UNWEIGHTED_AVERAGE,
        minkowski_algorithm=ME.MinkowskiAlgorithm.SPEED_OPTIMIZED,
        device=device,
    )
    sinput = in_field.sparse()
    sinput_data = {
        "f": sinput.features.detach().cpu().numpy().tolist(),
        "c": sinput.coordinates.detach().cpu().numpy().tolist()
    }
    print("sinput md5: ", compute_json_md5(sinput_data))


def compare():
    feat = np.load("f.npy")
    coord = np.load("c.npy")
    set_seed(123)

    print("## device(cpu) ..")
    device = torch.device("cpu")
    print("run 1st ..")
    create_tensor_field_then_sparse(feat, coord, device)
    print("run 2nd ..")
    create_tensor_field_then_sparse(feat, coord, device)

    print("\n## device(cuda) ..")
    device = torch.device("cuda")
    print("run 1st ..")
    create_tensor_field_then_sparse(feat, coord, device)
    print("run 2nd ..")
    create_tensor_field_then_sparse(feat, coord, device)


if __name__ == "__main__":
    compare()
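The script above hashes the tensors by converting them to nested Python lists and serializing them with json.dumps, which can be slow for large point clouds. A lighter alternative is to hash the raw array bytes together with dtype and shape. The helper name array_md5 below is hypothetical and not part of MinkowskiEngine; this is a minimal sketch, not a drop-in replacement.

```python
import hashlib

import numpy as np


def array_md5(*arrays):
    """Return one MD5 hex digest covering dtype, shape, and raw bytes of each array."""
    md5 = hashlib.md5()
    for arr in arrays:
        # tobytes() needs a well-defined memory layout, so force C-contiguous.
        arr = np.ascontiguousarray(arr)
        # Include dtype and shape so that reshaped or recast data hashes differently.
        md5.update(str(arr.dtype).encode())
        md5.update(str(arr.shape).encode())
        md5.update(arr.tobytes())
    return md5.hexdigest()
```

With the names from the script, the call would look like array_md5(sinput.features.detach().cpu().numpy(), sinput.coordinates.detach().cpu().numpy()). Note that the digest is sensitive to row order, just like the JSON-based one.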
Expected behavior
Actual output:
## device(cpu) ..
run 1st ..
input data md5: 1ff467de68cd7f6c81279fcc338a3cd3
sinput md5: 6f7cb28ee0df4ceda98fd87e25fd828d
run 2nd ..
input data md5: 1ff467de68cd7f6c81279fcc338a3cd3
sinput md5: 6f7cb28ee0df4ceda98fd87e25fd828d
## device(cuda) ..
run 1st ..
input data md5: 1ff467de68cd7f6c81279fcc338a3cd3
sinput md5: a1637e0a876d3f6df738205f91205193
run 2nd ..
input data md5: 1ff467de68cd7f6c81279fcc338a3cd3
sinput md5: 4d6afe5a49335d174adaaacb5e95129b
Expected output: TensorField.sparse() should give identical results across runs on CUDA, as it does on the CPU.
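An md5 mismatch alone does not tell whether the two GPU runs produce different values or the same rows in a different order. The sketch below sorts both outputs by coordinate before comparing, so a pure reordering is reported as equal. The helper names canonicalize and same_up_to_row_order are hypothetical, and the sketch assumes each coordinate row is unique, which should hold after quantization.

```python
import numpy as np


def canonicalize(coords, feats):
    """Sort rows lexicographically by coordinate so row order no longer matters."""
    coords = np.asarray(coords)
    feats = np.asarray(feats)
    # np.lexsort treats the LAST key as the primary key, so reverse the columns
    # to make column 0 the most significant one.
    order = np.lexsort(coords.T[::-1])
    return coords[order], feats[order]


def same_up_to_row_order(coords_a, feats_a, coords_b, feats_b, atol=0.0):
    """Return True if both outputs hold the same rows, ignoring row order."""
    ca, fa = canonicalize(coords_a, feats_a)
    cb, fb = canonicalize(coords_b, feats_b)
    if ca.shape != cb.shape or fa.shape != fb.shape:
        return False
    # Coordinates must match exactly; features may differ by at most atol.
    return bool(np.array_equal(ca, cb) and np.allclose(fa, fb, rtol=0.0, atol=atol))
```

Feeding it sinput.coordinates and sinput.features from two GPU runs (moved to the CPU and converted with .numpy()) would separate three cases: identical up to order, equal within a small atol, or genuinely different.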
Desktop (please complete the following information):
- OS: [e.g. Ubuntu 20.04]
- Python version: [e.g. 3.6.13]
- Pytorch version: [e.g. 1.8.2]
- CUDA version: [e.g. 11.1.74]
- NVIDIA Driver version: [e.g. 510.108.03]
- Minkowski Engine version [e.g. 0.5.4]
- Output of the following command. (If you installed the latest MinkowskiEngine, paste the output of
python -c "import MinkowskiEngine as ME; ME.print_diagnostics()"
Otherwise, paste the output of the following command.)
wget -q https://raw.githubusercontent.com/NVIDIA/MinkowskiEngine/master/MinkowskiEngine/diagnostics.py ; python diagnostics.py
==========System==========
Linux-5.15.0-56-generic-x86_64-with-debian-bullseye-sid
DISTRIB_ID=Kylin
DISTRIB_RELEASE=V10
DISTRIB_CODENAME=kylin
DISTRIB_DESCRIPTION="Kylin V10 SP1"
DISTRIB_KYLIN_RELEASE=V10
DISTRIB_VERSION_TYPE=enterprise
DISTRIB_VERSION_MODE=normal
3.6.13 |Anaconda, Inc.| (default, Jun 4 2021, 14:25:59)
[GCC 7.5.0]
==========Pytorch==========
1.8.2
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 510.108.03
CUDA Version 11.6
VBIOS Version 94.02.85.00.70
Image Version G001.0000.03.03
GSP Firmware Version N/A
==========NVCC==========
/usr/local/cuda-11.3/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11030
CUDART version MinkowskiEngine is compiled: 11030
Additional context
The script may need to be run several times before the md5 difference shows up on the GPU.
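Because the mismatch is intermittent, it can help to repeat the step many times and count how many distinct digests appear instead of comparing only two runs. The sketch below is generic: produce_bytes is a hypothetical zero-argument callable that returns the bytes to hash, for example the raw bytes of the features produced by one call to sparse().

```python
import hashlib
from collections import Counter


def count_distinct_digests(produce_bytes, runs=20):
    """Call produce_bytes() `runs` times and count how often each MD5 digest occurs."""
    counts = Counter()
    for _ in range(runs):
        digest = hashlib.md5(produce_bytes()).hexdigest()
        counts[digest] += 1
    return counts
```

A deterministic step yields a Counter with a single key; more than one key means at least two runs disagreed.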
@chrischoy Could you please look into this? Thanks.