node-red-contrib-tf-model
Problems installing tfjs-node on jetson
Because of a CUDA version mismatch, I tried to recompile TensorFlow 1.15, but ./configure fails with the error below. What should I do next?
jiayq@jiayq-desktop:~/Desktop/tensorflow$ ./configure
WARNING: ignoring LD_PRELOAD in environment.
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.26.1- (@non-git) installed.
Please specify the location of python. [Default is /usr/bin/python]:
Found possible Python library paths:
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]:
XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Do you wish to build TensorFlow with TensorRT support? [y/N]: y
TensorRT support will be enabled for TensorFlow.
Traceback (most recent call last):
  File "./configure.py", line 1602, in <module>
    main()
  File "./configure.py", line 1473, in main
    if validate_cuda_config(environ_cp):
  File "./configure.py", line 1352, in validate_cuda_config
    tuple(line.decode('ascii').rstrip().split(': ')) for line in proc.stdout)
ValueError: dictionary update sequence element #9 has length 1; 2 is required
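For readers hitting the same traceback: the failing expression builds a dict from lines split on `': '`, so any line without that separator yields a 1-element tuple and raises this `ValueError`. Python counts the elements from 0, so element #9 would be the tenth line of the helper's output. The sketch below reproduces the failure mode and shows a more tolerant parse. The sample lines and the names `parse_strict` and `parse_tolerant` are made up for illustration; they are not taken from `configure.py` or from the actual output on this machine.

```python
def parse_strict(lines):
    # Same shape as the expression in the traceback: each line must
    # split into exactly two parts, otherwise dict() raises ValueError.
    return dict(
        tuple(line.decode('ascii').rstrip().split(': ')) for line in lines
    )


def parse_tolerant(lines):
    # Keep only lines that contain the "key: value" separator and
    # skip everything else (e.g. stray warnings printed by a tool).
    config = {}
    for raw in lines:
        line = raw.decode('ascii', errors='replace').rstrip()
        key, sep, value = line.partition(': ')
        if sep:
            config[key] = value
    return config


# Hypothetical helper output: two well-formed lines and one stray line.
sample = [
    b"cuda_version: 10.2\n",
    b"cudnn_version: 8\n",
    b"unexpected line without separator\n",
]

try:
    parse_strict(sample)
except ValueError as err:
    print("strict parse failed:", err)

print(parse_tolerant(sample))
```

Printing the raw lines that the CUDA detection step emits (before they are turned into a dict) is probably the quickest way to see which line lacks the separator in a given environment.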
I haven't checked the NVIDIA JetPack SDK for a while. Is there a new version that uses a newer CUDA version? The error you shared above looks like it was hit while validating CUDA in your environment. Can you share more information about your environment?
Oh, thanks for the reply. So you mean you did not encounter the above error when compiling with the older version? I can try compiling after downgrading. My current environment includes the following libraries:
- CUDA: 10.2.89
- cuDNN: 8.0.0.145
- TensorRT: 7.1.0.16
Nope, I didn't see that error while compiling the shared libs of TensorFlow. I think I need to upgrade the JetPack SDK that I use. Unfortunately, I can't get to my Jetson Nano in my office, since I am working remotely now. On my Jetson Nano, I am still using:
- CUDA: 10.0
- cuDNN: 7.5
- TensorRT: 5.1.6
Here is the output of my git diff:
diff --cc third_party/aws/BUILD.bazel
index 36f7ca2fd3,27bc03264f..0000000000
--- a/third_party/aws/BUILD.bazel
+++ b/third_party/aws/BUILD.bazel
@@@ -27,10 -24,9 +27,12 @@@ cc_library
"@org_tensorflow//tensorflow:raspberry_pi_armeabi": glob([
"aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",
]),
+ "@org_tensorflow//tensorflow:freebsd": glob([
+ "aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",
+ ]),
- "//conditions:default": [],
+ "//conditions:default": glob([
+ "aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",
+ ]),
}) + glob([
"aws-cpp-sdk-core/include/**/*.h",
"aws-cpp-sdk-core/source/*.cpp",
and
* Unmerged path third_party/aws/BUILD.bazel
diff --git a/third_party/nccl/build_defs.bzl.tpl b/third_party/nccl/build_defs.bzl.tpl
index 5719139855..6389cd94cd 100644
--- a/third_party/nccl/build_defs.bzl.tpl
+++ b/third_party/nccl/build_defs.bzl.tpl
@@ -40,7 +40,7 @@ def _rdc_copts():
# The global functions can not have a lower register count than the
# device functions. This is enforced by setting a fixed register count.
# https://github.com/NVIDIA/nccl/blob/f93fe9bfd94884cec2ba711897222e0df5569a53/makefiles/common.mk#L48
- maxrregcount = "-maxrregcount=96"
+ maxrregcount = "-maxrregcount=80"
return cuda_default_copts() + select({
"@local_config_cuda//cuda:using_nvcc": [
You are so nice. I will try to compile it according to your configuration.
Yes, it would be great to have an updated version of libtensorflow (1.15.2 is available). I haven't managed to compile it myself so far. The trouble already starts with building the bazel executable :-(
@jeffrson based on the code here: https://github.com/tensorflow/tfjs/blob/master/tfjs-node/scripts/deps-constants.js#L25, tfjs-node is still using TensorFlow 1.15.0. The tf-model custom node depends on tfjs-node, which is why my instructions only cover TensorFlow v1.15.0. I know the latest version of TensorFlow is even newer, and I can certainly try to build v1.15.2. However, can you explain more about why you need v1.15.2?
Well, I was hoping for bugfixes ;-) My application just needs a bit too much memory for the Jetson, and decodePng produces a strange error message. But maybe 1.15.2 won't help much here.