node-red-contrib-tf-model Problems installing tfjs-node on jetson

Because of a CUDA version mismatch, I tried to recompile TensorFlow 1.15, but ./configure fails with the error below. What should I do next?

jiayq@jiayq-desktop:~/Desktop/tensorflow$ ./configure
WARNING: ignoring LD_PRELOAD in environment.
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.26.1- (@non-git) installed.
Please specify the location of python. [Default is /usr/bin/python]: 


Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: 
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: 
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Do you wish to build TensorFlow with TensorRT support? [y/N]: y
TensorRT support will be enabled for TensorFlow.

Traceback (most recent call last):
  File "./configure.py", line 1602, in <module>
    main()
  File "./configure.py", line 1473, in main
    if validate_cuda_config(environ_cp):
  File "./configure.py", line 1352, in validate_cuda_config
    tuple(line.decode('ascii').rstrip().split(': ')) for line in proc.stdout)
ValueError: dictionary update sequence element #9 has length 1; 2 is required

May 10 '20 10:05 land007

I haven't checked the NVIDIA JetPack SDK for a while. Is there a new version that uses a newer CUDA version? The error you shared above looks like it was raised while validating the CUDA configuration in your environment. Can you share more information about your environment?
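For anyone hitting the same traceback, here is a minimal sketch of why the expression in validate_cuda_config can raise this ValueError. The traceback shows each output line being split on ': ' and turned into a tuple, and the "dictionary update sequence" wording indicates those tuples are fed to dict(). Any line without the ': ' separator then yields a one-element tuple. The sample lines below are hypothetical (the real output of the CUDA detection step is not shown in this thread), and parse_tolerant is an illustrative helper, not part of configure.py.

```python
def parse_strict(lines):
    # Same shape as the expression in the traceback: every line must
    # split into exactly two parts, otherwise dict() raises ValueError.
    return dict(tuple(line.decode('ascii').rstrip().split(': ')) for line in lines)


def parse_tolerant(lines):
    # Hypothetical helper: keep well-formed "key: value" lines and
    # collect everything else so the offending line can be inspected.
    config, skipped = {}, []
    for raw in lines:
        text = raw.decode('ascii', errors='replace').rstrip()
        key, sep, value = text.partition(': ')
        if sep:
            config[key] = value
        else:
            skipped.append(text)
    return config, skipped


# Hypothetical sample; the last line has no ": " separator.
sample = [
    b"cuda_version: 10.2\n",
    b"cudnn_version: 8\n",
    b"tensorrt_version\n",
]

try:
    parse_strict(sample)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

config, skipped = parse_tolerant(sample)
print("strict parse error:", strict_error)
print("parsed:", config)
print("lines without a separator:", skipped)
```

In the traceback above, "element #9" is a zero-based index, so it points at the tenth line of the detection output. Printing that output in a similar way should show which line is malformed in the CUDA 10.2 / cuDNN 8 / TensorRT 7.1 environment.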

May 11 '20 17:05 yhwang

Oh, thanks for the reply. So you did not encounter the above error when compiling with the older versions? I can try compiling after downgrading. My current environment includes the following:

Libraries:
CUDA: 10.2.89
cuDNN: 8.0.0.145
TensorRT: 7.1.0.16

May 12 '20 01:05 land007

Nope, I didn't see that error while compiling the TensorFlow shared libraries. I think I need to upgrade the JetPack SDK that I use. Unfortunately, I can't get to my Jetson Nano in the office, since I am working remotely now. On my Jetson Nano, I am still using:

CUDA: 10.0
cuDNN: 7.5
TensorRT: 5.1.6

Here is the output of my git diff:

diff --cc third_party/aws/BUILD.bazel
index 36f7ca2fd3,27bc03264f..0000000000
--- a/third_party/aws/BUILD.bazel
+++ b/third_party/aws/BUILD.bazel
@@@ -27,10 -24,9 +27,12 @@@ cc_library
          "@org_tensorflow//tensorflow:raspberry_pi_armeabi": glob([
              "aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",
          ]),
 +        "@org_tensorflow//tensorflow:freebsd": glob([
 +            "aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",
 +        ]),
-         "//conditions:default": [],
+       "//conditions:default": glob([
+             "aws-cpp-sdk-core/source/platform/linux-shared/*.cpp",
+         ]),
      }) + glob([
          "aws-cpp-sdk-core/include/**/*.h",
          "aws-cpp-sdk-core/source/*.cpp",

and

* Unmerged path third_party/aws/BUILD.bazel
diff --git a/third_party/nccl/build_defs.bzl.tpl b/third_party/nccl/build_defs.bzl.tpl
index 5719139855..6389cd94cd 100644
--- a/third_party/nccl/build_defs.bzl.tpl
+++ b/third_party/nccl/build_defs.bzl.tpl
@@ -40,7 +40,7 @@ def _rdc_copts():
     # The global functions can not have a lower register count than the
     # device functions. This is enforced by setting a fixed register count.
     # https://github.com/NVIDIA/nccl/blob/f93fe9bfd94884cec2ba711897222e0df5569a53/makefiles/common.mk#L48
-    maxrregcount = "-maxrregcount=96"
+    maxrregcount = "-maxrregcount=80"
 
     return cuda_default_copts() + select({
         "@local_config_cuda//cuda:using_nvcc": [

May 12 '20 06:05 yhwang

You are so nice. I will try to compile it according to your configuration.

May 14 '20 07:05 land007

Yes, it would be great to have an updated version of libtensorflow (1.15.2 is available). I haven't succeeded in compiling it myself so far. The trouble already starts with building the bazel executable :-(

Jun 07 '20 14:06 jeffrson

@jeffrson Based on the code here: https://github.com/tensorflow/tfjs/blob/master/tfjs-node/scripts/deps-constants.js#L25, tfjs-node is still using TensorFlow 1.15.0. The tf-model custom node depends on tfjs-node, and that's why my instructions only cover TensorFlow v1.15.0. I know the latest version of TensorFlow is even newer. I can definitely try to build v1.15.2. However, can you explain more about why you need v1.15.2?

Jun 10 '20 00:06 yhwang

Well, I was hoping for bugfixes ;-) My application just needs a bit too much memory for the Jetson, and decodePng produces a strange error message. But maybe 1.15.2 won't help much here.

Jun 15 '20 19:06 jeffrson
