build icon indicating copy to clipboard operation
build copied to clipboard

Internal build vs OSS misaligment

Open bhack opened this issue 2 years ago • 7 comments

We had some c++ flags disaligment between c++ and copybara. See: https://github.com/tensorflow/tensorflow/pull/56276#issuecomment-1164500275

We had also many problem about failure reproducibility between OSS and internal test without forcing OSS with this to reproduce internal failures: https://github.com/tensorflow/tensorflow/pull/56276/commits/11dc383374ea0c34ea60033160d7de5f57aa5116

Also we had multiple rollbacks after merge.

/cc @cheshire any other note?

/cc @learning-to-play

bhack avatar Jul 12 '22 22:07 bhack

P.s. NIT we Need to enforce: https://github.com/tensorflow/tensorflow/pull/56276#discussion_r915074019

P.s.s. the linking required many CPU cores at every iteration cause we had >50 targets to link (very resource intensive) for just running a single test after an edit. So we needed to do the linking in parallel with many core and with a speedup of LD Gold (https://github.com/tensorflow/build/issues/110).

Also I needed to keep the PR branch freezed without rebasing/merging for months cause a rebase will eventually invalidate the Bazel cache requiring a monster build time again. This is another problem for PR that are open for weeks or months and you could need to rebase to solve conflicts.

bhack avatar Jul 12 '22 22:07 bhack

We had also a quite tricky issue filtering single test on the development cycle.

You was already notifed at: https://github.com/tensorflow/tensorflow/commit/0f9af911152880e36b2bfa8435e3746b9e329660

bhack avatar Jul 12 '22 22:07 bhack

@mihaimaruseac I don't know if you could add something here for https://github.com/tensorflow/tensorflow/pull/57468#issuecomment-1231803670

Thanks

bhack avatar Aug 31 '22 08:08 bhack

I believe https://github.com/tensorflow/tensorflow/commit/96b26a21e8d437745444a9f4b6a8d8e39689d083 is a huge step in fixing this, fixing most (if not all) such issues.

cheshire avatar Aug 31 '22 09:08 cheshire

@cheshire Thanks, do you have a full list of the currently enabled warnings as error in copybara related jobs?

bhack avatar Aug 31 '22 09:08 bhack

I believe https://github.com/tensorflow/tensorflow/commit/96b26a21e8d437745444a9f4b6a8d8e39689d083 is a huge step in fixing this, fixing most (if not all) such issues.

We are still suppressing all the warnings: https://github.com/tensorflow/tensorflow/blob/master/.bazelrc#L295-L301

# Suppress all C++ compiler warnings, otherwise build logs become 10s of MBs.
build:android --copt=-w
build:ios --copt=-w
build:linux --host_copt=-w
build:macos --copt=-w
build:windows --copt=/W0
build:windows --host_copt=/W0

With the mentioned commit we are just enabling the specific unused-result but if we check this table that flag is one of the default error we have also with -Werror.

So the point here is to understand what flags we have in the copybara builds as if we are using -Werror there it has > 50 warning types.

bhack avatar Aug 31 '22 13:08 bhack

@mihaimaruseac I don't know if you could add something here for tensorflow/tensorflow#57468 (comment)

Sadly, I don't think we can do much here. The issue there was that an internal file with internal code also needed to be updated. Using copybara instead of the internal file would have been too cumbersome.

Though, I also think that this type of breakages is small. It should only occur when you are adding new defines.

mihaimaruseac avatar Aug 31 '22 15:08 mihaimaruseac