
x/build: add LUCI linux-mipsle builder

Open dmitshur opened this issue 1 year ago • 8 comments

There currently isn't a LUCI builder that tests the linux/mipsle port (other than the misc-compile builder, which tests only that the port compiles). This is the tracking issue for it.

The next steps that a builder owner will need to follow to make progress here are documented at https://go.dev/wiki/DashboardBuilders#luci-builders.

CC @golang/mips.

Also CC @nrakovic.

dmitshur avatar May 10 '24 20:05 dmitshur

We were recently in contact with @FiloSottile, and he pointed us to this issue and to the risk that the MIPS Go port might get dropped due to missing LUCI builders. Our company relies heavily on the MIPS Go port, and we would like to provide the infrastructure for several such builders.

This is the first time we are looking into providing a LUCI builder so I have a couple of questions.

  • MIPS is now almost exclusively used in embedded devices and routers, so it's hard to get reasonably powerful hardware for CI. Should those runners be bare metal, or are e.g. QEMU emulated runners fine?
  • We could provide several MT7688A based nodes (this is our current hardware in use), but specs are limited.
    • MIPS24KEc single core 580MHz
    • 128MB Memory
  • We would also be able to get our hands on several MT7621 nodes with the following specs:
    • MIPS1004Kc dual core 880MHz
    • 256MB Memory

To get a feeling for how our current hardware would perform, is there some documentation of what would be run on those runners?

  • Will they build the whole Go tree?
  • Are the test suites cross compiled to MIPS on other runners and only the test suites run?
  • Are there some docs that would let us manually run the tasks to get a feeling for how long the build times would be?

stffabi avatar Nov 29 '24 08:11 stffabi

Thank you for offering to provide the builders, and apologies for the delay in getting back to these questions.

Therefore it's hard to get any decent powerful hardware for CI stuff. Should those runners be bare metal, or are e.g. QEMU emulated runners fine?

The ideal hardware for a builder to use is something that most closely resembles the target port being tested, such that its findings are most likely to be true positives with minimal false negatives. QEMU emulation adds an additional layer of indirection, and may introduce some false positives. But if there isn't too much noise, it's entirely possible that the benefit from the speed-up you're able to get that way outweighs the downsides of adding that layer. Ultimately, it's up to you as the builder owner to decide what builder setup to aim for, based on how well it'll serve and how well you can keep it running.

Note that the builder requirements written down at https://go.dev/wiki/DashboardBuilders#builder-requirements list 512 MB as the minimum, so with only 128 or 256 MB of memory, you'd be below the supported range and may run into trouble.

  • Will they build the whole Go tree?
  • Are the test suites cross compiled to MIPS on other runners and only the test suites run?
  • Are there some docs that would let us manually run the tasks to get a feeling for how long the build times would be?

As I answer these questions below, please keep in mind that what I'm describing is the default builder setup for ports that have high-performance hardware available. It is possible to reduce the scope of testing for certain ports/builders if there aren't sufficient builder resources available, but that reduces test coverage. See some related discussion in https://github.com/golang/go/issues/67105#issuecomment-2266181870.

By default, builders build and test the main Go repository and all golang.org/x repositories on the main branch ("master") and all supported release branches (e.g., "release-branch.go1.22" and "release-branch.go1.23").

In the default case, builders testing a given port are responsible both for building and testing. This increases coverage (proof that the target OS/arch can be used to build successfully) and makes it easier to have cgo enabled. In special cases, like the Plan 9 builders that can't run the LUCI swarming client due to absence of the python3 dependency, it's possible to arrange for a Linux host to do the building and perform test execution remotely. See https://github.com/golang/go/issues/62025#issuecomment-2116456142 for details.

For the main Go repository, the builders run the equivalent of all.bash. For golang.org/x repositories, they run the equivalent of go test -short ./... for all modules. You can run those commands locally to get a good idea of the performance to expect.
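To put a number on the expected runtime, the commands mentioned above can be wrapped in a small timer. The following is only a sketch: `timeCommand` and the `timecmd` program name are hypothetical and not part of the Go or LUCI tooling. It simply runs whatever command it is given and reports the wall-clock time.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// timeCommand runs the named command, passing its output through, and
// returns how long it took. Hypothetical helper for local measurements only.
func timeCommand(name string, args ...string) (time.Duration, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	start := time.Now()
	err := cmd.Run()
	return time.Since(start), err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: timecmd command [args...]")
		return
	}
	elapsed, err := timeCommand(os.Args[1], os.Args[2:]...)
	fmt.Printf("%v took %s\n", os.Args[1:], elapsed.Round(time.Second))
	if err != nil {
		fmt.Fprintln(os.Stderr, "command failed:", err)
		os.Exit(1)
	}
}
```

On the candidate hardware, this could be run as `timecmd ./all.bash` from the src directory of a Go checkout, or as `timecmd go test -short ./...` inside a module of a golang.org/x repository.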

I hope that helps. When you're ready to proceed, see step 2 of https://go.dev/wiki/DashboardBuilders#how-to-set-up-a-builder (step 1 is already done; it's this issue).

dmitshur avatar Dec 17 '24 17:12 dmitshur

Thanks for the information, and sorry for my late reply.

We are still working on this and were fortunately able to find and order some hardware with 512MB of memory.

  • MIPS1004Kc dual core 880MHz
  • 512MB Memory

The order is on its way, and as soon as we have the hardware on site, we are going to set it up for the builders. We will be able to provide 10 machines as LUCI builders.

I will let you know when we have any news.

stffabi avatar Mar 27 '25 10:03 stffabi

@golang/mips The MIPS builder has been consistently failing since go.dev/cl/675235. Could you investigate this? Thanks!

JunyangShao avatar Jun 17 '25 19:06 JunyangShao

Hi @stffabi, may I kindly ask if you have any news on this?


Besides the linux-mips64le-cip-united builder (#67306), CIP United Co. Ltd. can offer another server equipped with a Loongson 3A4000 (4 cores, mips64r5) and 8GiB RAM as a linux-mipsle builder.

Does this hardware fulfill the requirements? If so:

hostname linux-mipsle-cip-united

CSR: linux-mipsle-cip-united.txt

This builder is owned and hosted by CIP United Co. Ltd. (@CIP-United). We prefer the handle to be directly related to our company. If it is mandatory to use my username as the handle, please let us know, and we will regenerate the CSR.

Rongronggg9 avatar Oct 22 '25 03:10 Rongronggg9

@Rongronggg9 Thank you for offering that builder to test the linux/mipsle port. Its hardware should work as long as it provides useful signal. Does all.bash pass on it with GOOS=linux GOARCH=mipsle as is (or prior to CL 675235) or how close is it? These builders will run primarily in post-submit, and there can be multiple builders running with the criteria being that the signal provided should be considered useful by the port maintainers.

That hostname should work well. Here is the corresponding certificate: linux-mipsle-cip-united-1761320084.cert.txt.

dmitshur avatar Oct 24 '25 15:10 dmitshur

The builder is up. It is a systemd-nspawn container.

Rongronggg9 avatar Nov 28 '25 10:11 Rongronggg9

The builder page at https://chromium-swarm.appspot.com/bot?id=linux-mipsle-cip-united--1 shows its current state, and it's visible there that some of the dimensions will need adjustment:

  1. Its "pool" dimension value is currently "unassigned". It's expected to be "luci.golang.shared-workers" instead. I've sent a CL crrev.com/i/8830856 which will resolve this soon.

  2. The "cipd_platform" dimension value is currently "linux-mips". This isn't right for this builder type; we need it to be "linux-mipsle" in order for it to receive appropriately built tools and get assigned to test the GOOS=linux GOARCH=mipsle port. With the current value "linux-mips", it will instead get tools and work for the GOOS=linux GOARCH=mips port. (That particular port also needs a builder, but that's tracked in issue #67303.)

    It's detecting and reporting the cpu dimension as "mips | mips-32 | mips-32-Loongson-3_V0.1__FPU_V0.1". There is a mention of 32 there (in contrast to #67306 which has mips-64) but no "le". The swarming bot itself has the logic to infer its cipd_platform value from the system. It would be optimal to resolve this detection logic inside the swarming bot and get it to report the correct cipd_platform value, if you're able to send a code change to make it properly detect your builder's hardware. (See past examples of changes like this at crrev.com/c/6439429 and crrev.com/c/5178250.)
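As a rough illustration of the distinction being asked for, the sketch below derives a platform string from word size and endianness. It is not the swarming bot's actual detection logic. `cipdPlatformForMIPS` is a hypothetical helper, and it assumes that the platform names mirror Go's GOARCH values (mips, mipsle, mips64, mips64le), of which only "linux-mips" and "linux-mipsle" are confirmed in this thread.

```go
package main

import "fmt"

// cipdPlatformForMIPS is a hypothetical helper that builds a platform string
// for a Linux MIPS host. Assumption: names mirror Go's GOARCH values.
func cipdPlatformForMIPS(bits int, littleEndian bool) (string, error) {
	var arch string
	switch bits {
	case 32:
		arch = "mips"
	case 64:
		arch = "mips64"
	default:
		return "", fmt.Errorf("unsupported word size: %d", bits)
	}
	if littleEndian {
		arch += "le" // the suffix missing from the reported value
	}
	return "linux-" + arch, nil
}

func main() {
	// A 32-bit little-endian userland, as intended for this builder.
	p, err := cipdPlatformForMIPS(32, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(p)
}
```

The point of the sketch is that word size alone cannot separate the two 32-bit ports; the detection also has to take byte order into account.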

dmitshur avatar Dec 09 '25 02:12 dmitshur

Quick update here.

The "Host is stuck rebooting for ..." issue and the constantly occurring "bot missing" events are due to a bug in our downstream kernel patch. The patch is required to switch the clock source to a reliable one (64-bit counter value), instead of the MIPS CP0.Counter. The latter has some SMP synchronization issues on the platform and wraps crazily due to its 32-bit counter value and its unnecessarily high frequency.

The bug is located in o32 vDSO, resulting in userspace programs in the o32 ABI always getting the time of the last system time update (settimeofday(), NTP, etc), with a maximum ~3s offset, rather than the real time. The counter value of the "reliable" clock source wraps crazily if only its lower half can be read -- that's the case of o32 vDSO. n32/n64 vDSO, which runs in 64-bit mode, can read the full value, so they are unaffected. Since the swarming bot polls the time to check if a sleep is finished, it just sleeps forever until the next system time update.
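The effect of seeing only the lower half of the counter can be modeled in a few lines. This is a simplified sketch, not the kernel's vDSO code: `deltaFull` and `deltaLowHalf` are hypothetical names, and the model assumes the elapsed ticks are computed by subtracting a 64-bit baseline from the current counter reading.

```go
package main

import "fmt"

// deltaFull models a reader that sees the whole 64-bit counter
// (the n32/n64 case described above).
func deltaFull(last, now uint64) uint64 {
	return now - last
}

// deltaLowHalf models a reader that only sees the lower 32 bits of the
// current counter value but still subtracts a 64-bit baseline
// (a simplified stand-in for the o32 case).
func deltaLowHalf(last, now uint64) uint64 {
	truncated := uint64(uint32(now)) // upper 32 bits are lost
	return truncated - last          // wraps around once last exceeds 32 bits
}

func main() {
	last := uint64(1)<<32 + 1000 // baseline taken after the counter passed 2^32
	now := last + 500            // 500 ticks later

	fmt.Println("full read:    ", deltaFull(last, now))
	fmt.Println("low-half read:", deltaLowHalf(last, now))
}
```

With the full value, the difference is the 500 ticks that actually elapsed. With only the lower half, the subtraction underflows and yields a huge, meaningless delta, so any time derived from it is unusable.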

We've figured out a fix for the bug (falling back to a syscall in that case).

That being said, we ran into some issues while compiling a new kernel on the same board. There may be some other, unrelated hardware failure. We will probably replace some components or even the board if we confirm that this is the case. This may take some time, so please bear with us.

Rongronggg9 avatar Dec 18 '25 03:12 Rongronggg9

We've successfully built a kernel with the appropriate fix. After applying the kernel fix and other environmental fixes, the issues mentioned in my previous reply are resolved, and cipd can be executed correctly.

However, we found ci/gotip-linux-mipsle [^1] failing constantly for seemingly random reasons. This suggests that there is indeed a hardware failure.

To keep the hardware failure from producing excessive CI noise, we've temporarily stopped the builder. We plan to migrate the builder to another board next week. If everything goes well, we will start the builder up again then.

[^1]: TestEnvTZUsage failed in ci/go1.26-linux-mipsle due to an empty /etc/localtime. We've fixed this environmental misconfiguration.

Rongronggg9 avatar Dec 19 '25 10:12 Rongronggg9