error: 'val.val' may be used uninitialized in this function with 2.2.0-rc2 using bazel build
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- TensorFlow Serving installed from (source or binary): source
- TensorFlow Serving version: 2.2.0-rc2
Describe the problem
When building the model-server using bazel build with the mkl config, the build fails. I used optimization level O3; when building with any lower level (O2 or O), the build passes. Also, it seems that this issue is coming from the external library "upb"?
I see a similar bug reported here (in Docker): https://github.com/tensorflow/serving/issues/1596, but the workaround provided for that case doesn't work here, since bazel build is used directly.
Exact Steps to Reproduce
bazel build -c opt //tensorflow_serving/model_servers:tensorflow_model_server --progress_report_interval 3600 --logging 0 --verbose_failures
Source code / logs
ERROR: /root/.cache/bazel/_bazel_root/a64d7dba00d3b1f5dfea992b48914cc1/external/upb/BUILD:57:1: C++ compilation of rule '@upb//:upb' failed (Exit 1)
external/upb/upb/table.c: In function 'upb_inttable_pop':
external/upb/upb/table.c:588:10: error: 'val.val' may be used uninitialized in this function [-Werror=maybe-uninitialized]
   return val;
          ^~~
cc1: all warnings being treated as errors
Target //tensorflow_serving/model_servers:tensorflow_model_server failed to build
INFO: Elapsed time: 292.874s, Critical Path: 202.45s
INFO: 3666 processes: 3666 local.
FAILED: Build did NOT complete successfully
@satishpasumarthi This is similar to issue #1596. Please follow the workaround there for now. Thanks!
Hi, as mentioned in the bug description, the workaround provided in https://github.com/tensorflow/serving/issues/1596 doesn't help with bazel build. I've tried that option, but it didn't help.
@gowthamkpr the issue still persists in the 2.2 release. I think the issue pops up when the optimization flag "--copt=-O3" is enabled. Can we expect a fix for this?
I think the problem here is building the external https://github.com/protocolbuffers/upb repo with the described compiler and optimization levels.
@satishpasumarthi it may be faster to resolve this by reporting an issue to https://github.com/protocolbuffers/upb with a sample compiler command line that causes the build to fail. Once upb fixes it in their repo, we can update our TF sources to pick up those changes.
Hi @netfs, I've reported the issue on the upb external repo and they have pushed the changes to master: https://github.com/protocolbuffers/upb/issues/300. Can the TF team pull in the latest changes?
@mihaimaruseac -- do you have recommendations on how we can get the protocolbuffers/upb#300 fixes into TF?
It seems TF Serving gets upb from TF, and TF gets it from grpc (as part of grpc_deps(), AFAICS).
The solution, then, should be to update the grpc dependency in TF's workspace.
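For context, such a bump is roughly a change like the following. This is an illustrative sketch only: the real entry lives in tensorflow/workspace.bzl, uses TF's tf_http_archive macro, and the version and checksum values below are placeholders, not real ones.

```python
# Sketch of a grpc dependency bump in TF's workspace (Starlark).
# All field values are placeholders; the actual rule and fields are
# whatever tensorflow/workspace.bzl currently defines.
tf_http_archive(
    name = "com_github_grpc_grpc",
    sha256 = "<sha256 of the new grpc release tarball>",
    strip_prefix = "grpc-<new-version>",
    urls = [
        "https://github.com/grpc/grpc/archive/v<new-version>.tar.gz",
    ],
)
```

Because TF Serving tracks TF, landing such a change in TF is enough; Serving picks it up when its TF pin advances.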
@netfs @mihaimaruseac Did this fix go in TF 2.3 ?
I do not think TF 2.3 updated its grpc dependency:
https://github.com/tensorflow/tensorflow/blame/88ab72153e13d925f951e9464a121cda47083e3b/tensorflow/workspace.bzl#L687
I think the fastest way to fix this is to send a PR to tensorflow with the relevant grpc release that you want (the one that includes the upb fixes). Once TF is updated, TF Serving HEAD will pick it up in a few hours. Yes, unfortunately, this means this change will go into TF[S] 2.4 (O(month) from now?).
Thanks @netfs. Will create a PR.
Is there a PR created?
Not yet. I can cut one this week. Would that work?
Yes. That's good.
We are on the last mile before the 2.4 release, so it's good to have the PR.
Also, if you want the PR cherry-picked onto the other branches, in case we do patch releases, please create PRs against the other branches too and mention me so I can add the needed tags.
Sure, will do.
Did you create the PR? We just cut the branch for the 2.4 release.
Here it is: https://github.com/tensorflow/tensorflow/pull/44282
Are there supposed to be any performance regressions if I replace --copt=-O3 with O2? I saw a performance regression when using O2 with TensorFlow Serving 2.3.0 (Intel MKL 0.21.3) vs TensorFlow Serving 1.15.0 (Intel MKL 0.20.3). I compiled both with nativeopt.
Or should I just wait for the next release ?
I have found another workaround. Per another reported issue, this error started to appear after a Bazel update, and it looks like the new version uses the "-Werror=maybe-uninitialized" build option by default. This option just turns the warning into an error, and to undo that effect the option "-Wno-error=maybe-uninitialized" can be used.
So by using something like this
./tools/run_in_docker.sh -d tensorflow/serving:"${TAG}"-devel bazel build --config=nativeopt --output_filter="[-]" --copt=-Wno-error=maybe-uninitialized tensorflow_serving/...
I have managed to successfully build all affected versions (2.2.x, 2.3.x, and 2.4.x) with nativeopt, and everything works as expected.
Can you send a PR please?
No code changes are required, so I am not sure what to send as a PR here. It is just a modification of the docker build command from the official documentation; this flag simply needs to be added to the run_in_docker.sh parameters. As this is a workaround, I don't think it should be pushed to the documentation either.
@mihaimaruseac is (I think) asking for a PR that updates TF's grpc dependency (see https://github.com/tensorflow/serving/issues/1642#issuecomment-651945515 above).
There is a PR, tensorflow/tensorflow#52853, that retries updating the grpc dependency.
Assigning to you @mihaimaruseac since you are reviewing the PR.
The grpc version bump is blocked by Windows breakages.
@satishpasumarthi,
https://github.com/tensorflow/tensorflow/commit/84f40925e929d05e72ab9234e53c729224e3af38 updates the grpc dependency. Please let us know if this issue can be closed. Thank you for your contributions.
Closing this due to inactivity. Please take a look at the answers provided above, and feel free to reopen and post your comments if you still have questions. Thank you!