serving icon indicating copy to clipboard operation
serving copied to clipboard

error: 'val.val' may be used uninitialized in this function with 2.2.0-rc2 using bazel build

Open satishpasumarthi opened this issue 5 years ago • 24 comments

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • TensorFlow Serving installed from (source or binary): source
  • TensorFlow Serving version: 2.2.0-rc2

Describe the problem

When building the model-server using bazel build with mkl config, it fails. I used the optimization level O3. When build with any other lower levels O2 or O, the build passes. Also, it seems that this issues is coming from the external library "upb" ?

I see a similar bug reported here (in docker) https://github.com/tensorflow/serving/issues/1596 , but the workaround provided for that case doesn't work here as here bazel build is used

Exact Steps to Reproduce

bazel build -c opt //tensorflow_serving/model_servers:tensorflow_model_server --progress_report_interval 3600 --logging 0 --verbose_failures

Source code / logs

ERROR: /root/.cache/bazel/_bazel_root/a64d7dba00d3b1f5dfea992b48914cc1/external/upb/BUILD:57:1: C++ compilation of rule '@upb//:upb' failed (Exit 1) external/upb/upb/table.c: In function 'upb_inttable_pop': external/upb/upb/table.c:588:10: error: 'val.val' may be used uninitialized in this function [-Werror=maybe-uninitialized] return val; ^~~ cc1: all warnings being treated as errors Target //tensorflow_serving/model_servers:tensorflow_model_server failed to build INFO: Elapsed time: 292.874s, Critical Path: 202.45s INFO: 3666 processes: 3666 local. FAILED: Build did NOT complete successfully FAILED: Build did NOT complete successfully

satishpasumarthi avatar May 26 '20 22:05 satishpasumarthi

@satishpasumarthi This is similar to this issue. Please follow the workaround for now. Thanks!

gowthamkpr avatar May 29 '20 06:05 gowthamkpr

Hi, as mentioned in the bug description the workaround provided in https://github.com/tensorflow/serving/issues/1596 doesn't help with bazel build. I've tried that option but didn't help.

satishpasumarthi avatar May 29 '20 08:05 satishpasumarthi

@gowthamkpr the issue still persists in 2.2 release. I think the issue pops up when the optimization flag "--copt=-O3" is enabled. Can we expect a fix for this?

satishpasumarthi avatar Jun 04 '20 00:06 satishpasumarthi

i think the problem here is building https://github.com/protocolbuffers/upb (external) repo with the described compiler+optimization levels.

@satishpasumarthi it may be faster to resolve this by reporting an issue to https://github.com/protocolbuffers/upb with a sample compiler command line flag, that will cause the build to fail. and once upb fixes it in their repo, we can update our TF sources to pick those changes.

netfs avatar Jun 29 '20 18:06 netfs

Hi @netfs , I've reported the issue on the upb external repo and they have pushed the changes to master. Can the TF team pull the latest changes ? https://github.com/protocolbuffers/upb/issues/300

satishpasumarthi avatar Jun 29 '20 23:06 satishpasumarthi

@mihaimaruseac -- do you have recommendations on how can we get protocolbuffers/upb#300 fixes into TF?

it seems TF serving get upb from TF, and TF gets it from grpc (part of grpc_deps() afaics).

netfs avatar Jun 30 '20 17:06 netfs

The solution then should be to update grpc dependency in TF's workspace.

mihaimaruseac avatar Jun 30 '20 17:06 mihaimaruseac

@netfs @mihaimaruseac Did this fix go in TF 2.3 ?

satishpasumarthi avatar Aug 04 '20 22:08 satishpasumarthi

i do not think TF 2.3 updated their grpc dependency:

https://github.com/tensorflow/tensorflow/blame/88ab72153e13d925f951e9464a121cda47083e3b/tensorflow/workspace.bzl#L687

i think the fastest way to fix this, is to send a PR to tensorflow with the relevant grpc release that you want (the one that includes upb fixes). once TF is updated, TF Serving HEAD will pick it up in few hours. Yes unfortunately, this means this change will go in TF[S] 2.4 (O(month) from now?)

netfs avatar Aug 04 '20 22:08 netfs

Thanks @netfs . Will create a PR

satishpasumarthi avatar Aug 05 '20 01:08 satishpasumarthi

Is there a PR created?

mihaimaruseac avatar Oct 13 '20 16:10 mihaimaruseac

Not yet. I can cut one this week. would that work?

satishpasumarthi avatar Oct 13 '20 16:10 satishpasumarthi

Yes. That's good.

We are on the last mile before 2.4 release, so it's good to have the PR.

Also, if you want the PR cherry-picked on the other branches, in case we do patch releases, please create PRs against the other branches too and mention me to add the needed tags.

mihaimaruseac avatar Oct 13 '20 16:10 mihaimaruseac

sure, will do.

satishpasumarthi avatar Oct 13 '20 16:10 satishpasumarthi

Did you create the PR? We just cut the branch for 2.4 release

mihaimaruseac avatar Oct 24 '20 02:10 mihaimaruseac

Here it is https://github.com/tensorflow/tensorflow/pull/44282

satishpasumarthi avatar Oct 24 '20 06:10 satishpasumarthi

Are there supposed to be any performance regressions if I replace --copt=-O3 and use O2 instead ? I had a regression in performance when using O2 with tensorflow serving 2.3.0(intel mkl 0.21.3) vs tensorflow serving 1.15.0(intel mkl 0.20.3). I compiled both with nativeopt.

Or should I just wait for the next release ?

vigbk avatar Oct 27 '20 09:10 vigbk

I have found another workaround. Per some other reported issue this error started to appear after bazel update, and it looks like new version uses "-Werror=maybe-uninitialized" build option by default. This option actually just transforms warning into error and to discard this effect option "-Wno-error=maybe-uninitialized" can be used.

So by using something like this

./tools/run_in_docker.sh -d tensorflow/serving:"${TAG}"-devel bazel build --config=nativeopt --output_filter="[-]" --copt=-Wno-error=maybe-uninitialized tensorflow_serving/...

I have managed to successfully build with nativeopt all affected versions - 2.2.x, 2.3.x, 2.4.x and everything working as expected.

wargx avatar Feb 01 '21 12:02 wargx

Can you send a PR please?

mihaimaruseac avatar Feb 01 '21 16:02 mihaimaruseac

There are no changes in code required so I am not sure what to PR here. It is just modification of docker build command from official documentation, just this flag needs to be added to run_in_docker.sh parameters. As this is a workaround I don't think it should be pushed to documentation either.

wargx avatar Feb 02 '21 19:02 wargx

@mihaimaruseac is (i think) asking for a PR that updates TF's grpc dependency (see https://github.com/tensorflow/serving/issues/1642#issuecomment-651945515 above)

netfs avatar Feb 02 '21 19:02 netfs

There is a pr tensorflow/tensorflow#52853 that retries updating grpc dependency.

NobodyXu avatar Nov 01 '21 23:11 NobodyXu

Assigning to you @mihaimaruseac since you are reviewing the PR.

lilao avatar Nov 10 '21 17:11 lilao

GRPC update bump is blocked by Windows breakages.

mihaimaruseac avatar Nov 11 '21 15:11 mihaimaruseac

@satishpasumarthi,

https://github.com/tensorflow/tensorflow/commit/84f40925e929d05e72ab9234e53c729224e3af38 updates the grpc dependency. Please let us know if this issue can be closed. Thank you for you contributions.

singhniraj08 avatar Feb 03 '23 09:02 singhniraj08

Closing this due to inactivity. Please take a look into the answers provided above, feel free to reopen and post your comments(if you still have queries on this). Thank you!

singhniraj08 avatar Feb 20 '23 06:02 singhniraj08