CI instrumented tests

Open depau opened this issue 5 years ago • 16 comments

This introduces QEMU tests on GitHub Actions. GH Actions doesn't support nested virtualization, while Travis does but doesn't support multiple workflows, so I would convert the existing .travis.yml to GH Actions and move this workflow to Travis.

To get over all of the Android image shenanigans such as "does not connect to the emulated network" or "no way to auto-approve USB permissions" I'm pulling in two other projects. I added them to the EtchDroid organization but let me know if you have a better place for them (if you like them at all):

  • https://github.com/EtchDroid/qemu_test_orchestrator
    • The name should be pretty clear: it runs the VM, applies all the workarounds and runs the tests inside it, while approving permission requests
  • https://github.com/EtchDroid/VirtWifiConnector/
    • All "usable" Android-x86 images except for Marshmallow show the emulated ethernet as a Wi-Fi network called "VirtWifi" but do not connect to it automatically. It turns out there's no easy way to connect to Wi-Fi from the command line, so the orchestrator above will take an APK of this small helper and shove it into QEMU over the emulated serial port

I followed this approach in order to be able to use upstream, clean Android-x86 images. I wanted to avoid having to build a purpose-made image. It works on all images for which an RPM package is provided (API 23, 25, 27, 28).

depau avatar Aug 08 '20 06:08 depau

I'm not sure how to test this without merging it into develop, maybe later I'll try making it run for pull requests from all branches.

depau avatar Aug 08 '20 06:08 depau

Codecov Report

Merging #264 into develop will not change coverage. The diff coverage is n/a.

@@            Coverage Diff             @@
##             develop     #264   +/-   ##
==========================================
  Coverage      62.70%   62.70%           
  Complexity       365      365           
==========================================
  Files             49       49           
  Lines           1582     1582           
  Branches         217      217           
==========================================
  Hits             992      992           
  Misses           525      525           
  Partials          65       65           

Last update 9eff61a...d586f07.

codecov[bot] avatar Aug 08 '20 07:08 codecov[bot]

I tried hard to make it work without KVM but as you can see, Android isn't able to start in 5 minutes (look at the screen recording for API 23, "Download artifacts" above the GitHub actions latest workflow log - it doesn't even do mode setting). The entire test suite runs within 5 minutes on my machine and Android boots within 20 seconds.

So, I'd definitely move this to another CI, but I need your intervention in any case:

  • for Travis, I'd need you to move the jacoco report to the GH actions workflow and add the codecov tokens
  • for any other CI, I'd need you to add the repo to it

However, the only CI known to support nested virtualization is Travis, so the easiest option would be to move it there. It looks like macOS jobs on all CIs support virtualization, but that would have other issues such as "I don't have/want a Mac to test it" and "IDK how to use macOS".

So, could you please move the code coverage reports to gh actions? Or point out alternatives if you have any :)

I'll try to implement tests on Travis on my libaums_wrapper repo as a demo, since I want to do additional block device testing anyway using my Stream reader/writer implementations.

depau avatar Aug 09 '20 23:08 depau

I can definitely move the codecov stuff to GH Actions, but how would that solve the problem?

Are we then using GH Actions for unit tests and codecov, and Travis for the QEMU stuff?

magnusja avatar Aug 10 '20 10:08 magnusja

I can definitely move the codecov stuff to GH Actions, but how would that solve the problem?

I'd rather have each workflow do one thing, especially since the QEMU test is quite lengthy, complicated and therefore more prone to break/need tweaking. The codecov thing does just one thing without special requirements, so I'd move it to GH Actions so it can run on its own without risking that changes to the qemu workflow break it.

Are we then using GH Actions for unit tests and codecov, and Travis for the QEMU stuff?

Yep, that's my idea

depau avatar Aug 10 '20 10:08 depau

Sounds great! I just added the codecov stuff to GH actions and it seems to work w/o a token. Seems that the bash script is smart enough to handle that by itself.

magnusja avatar Aug 10 '20 11:08 magnusja

So feel free to remove the unit tests from travis and only do the QEMU stuff there.

magnusja avatar Aug 10 '20 11:08 magnusja

As you can see, it's "sort of" working now. At least it boots and, in some cases, it even runs the tests (the tests fail though :cry:). It still takes a long time: ~10 minutes for the installation, and some Android versions take up to 13 minutes (!!!) to boot to the launcher screen, even with KVM.

I improved the coordinator script: it now connects to the VM's serial port over a Unix socket and, instead of blindly typing stuff and hoping it works, it actually checks the output to some extent. It should therefore avoid wasting time on hardcoded sleeps, but also wait longer if the build servers are overloaded.
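To illustrate the idea of checking the serial output instead of sleeping, here is a minimal Python sketch. It is not the orchestrator's actual code: the function name `wait_for_pattern`, the socket path and the prompt regex are all assumptions made for the example.

```python
import re
import socket
import time


def wait_for_pattern(sock, pattern, timeout, chunk_size=4096):
    """Read from `sock` until the regex `pattern` shows up in the output.

    Returns everything read so far on success and raises TimeoutError if
    the pattern has not appeared after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    regex = re.compile(pattern)
    buffer = ""
    while True:
        if regex.search(buffer):
            return buffer
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"pattern {pattern!r} not seen within {timeout}s")
        # Never block longer than the time left before the deadline.
        sock.settimeout(remaining)
        try:
            data = sock.recv(chunk_size)
        except socket.timeout:
            continue  # the next loop iteration re-checks the deadline
        if not data:
            raise ConnectionError("serial socket closed before the pattern appeared")
        buffer += data.decode("utf-8", errors="replace")


# Hypothetical usage (the socket path and the prompt regex are assumptions):
#   serial = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
#   serial.connect("serial.sock")
#   wait_for_pattern(serial, r"console:/ [#$] $", timeout=900)
```

A fast machine returns as soon as the prompt shows up, while an overloaded build server simply uses more of the generous timeout.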

I'm uploading screen recordings and logcats here, it's publicly viewable: https://objstor.depau.eu/minio/libaums-screenrecs/travis/

Hopefully after some extra troubleshooting it will work reliably.

I will squash all the commits into one before this is ready to merge; for now I'm really just trying stuff to see how it works. On my machine it works very reliably and the whole test suite runs within 30 seconds, so the problem is really just the CI servers.

depau avatar Sep 14 '20 22:09 depau

Hmm I see, do you think it may be worth a shot to reach out to the travis ci support?

Or maybe setting up a dedicated Jenkins instance on a VM somewhere?

magnusja avatar Sep 17 '20 13:09 magnusja

Or maybe setting up a dedicated Jenkins instance on a VM somewhere?

I think getting a VM more powerful than Travis's that also supports nested KVM is gonna be quite expensive, but that could be an option.

The VMs Travis offers for travis-ci.org are really not that bad, with 7.5 GB of RAM and 2 cores at reportedly 2.8 GHz. The problems arise (just guessing) because I think they're running both cores on the same real CPU core (with hyperthreading). This is quite likely, since it is also a mitigation for some of the recent side-channel attack vulnerabilities found in Intel CPUs.

On top of that, I think they also have other load on the VMs.

Hmm I see, do you think it may be worth a shot to reach out to the travis ci support?

I actually narrowed down the issue. The VM was struggling because I was sending the "virtwifi enabler" APK while it was running dex2oat, which uses a lot of CPU.

I couldn't reproduce it on my machine since my CPU clock is quite a bit higher than Travis's and I get no issues whatsoever, so I'm forced to test it on Travis (sorry for the failed jobs spam).

depau avatar Sep 18 '20 00:09 depau

I see that the tests pretty much never fail on Android 9 and 8.

Since this has been lying around for quite a while and I haven't had the chance to do more troubleshooting, I was thinking I could disable tests on other platforms for now and merge it, but then keep an eye on it in case it fails when it shouldn't.

I can then pick it up later and fix the other versions.

What do you think?

depau avatar Nov 28 '20 15:11 depau

Hey,

sorry for the late response. Yes sounds great, feel free to merge :)

magnusja avatar Dec 08 '20 09:12 magnusja

Although it seems that Travis stopped free builds for OSS projects... https://news.ycombinator.com/item?id=25338983

magnusja avatar Dec 08 '20 09:12 magnusja

Although it seems that Travis stopped free builds for OSS projects... https://news.ycombinator.com/item?id=25338983

[reaction GIF]

depau avatar Dec 08 '20 18:12 depau

I'll see if GitLab CI or hosted Drone.io have nested KVM enabled :( requesting the CI credits every month doesn't sound like a good option.

If that's not the case, at some point next year I'm going to put my private CI back online, which will either be Drone once again or I may switch to builds.sr.ht. I can install the runner on my desktop, on which nested KVM is definitely enabled and there definitely won't be any issues with RAM shortage or race conditions.

I don't leave it on constantly, though, so if we can find a decent public CI that would be better (also I pay for electricity :man_shrugging:)

depau avatar Dec 08 '20 18:12 depau

It looks like drone doesn't have KVM loaded, but maybe we can ask them to load the kernel module since otherwise they have some very nice hardware: https://cloud.drone.io/Depau/drone-test/2/2/2

+ virt-host-validate || true
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking if device /dev/kvm exists                                   : FAIL (Check that the 'kvm-intel' or 'kvm-amd' modules are loaded & the BIOS has enabled virtualization)
  QEMU: Checking if device /dev/vhost-net exists                             : WARN (Load the 'vhost_net' module to improve performance of virtio networking)
  QEMU: Checking if device /dev/net/tun exists                               : FAIL (Load the 'tun' module to enable networking for QEMU guests)
  QEMU: Checking for cgroup 'cpu' controller support                         : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
  QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
  QEMU: Checking for cgroup 'memory' controller support                      : PASS
  QEMU: Checking for cgroup 'devices' controller support                     : PASS
  QEMU: Checking for cgroup 'blkio' controller support                       : PASS
  QEMU: Checking for device assignment IOMMU support                         : WARN (No ACPI IVRS table found, IOMMU either disabled in BIOS or not supported by this hardware platform)
  QEMU: Checking for secure guest support                                    : WARN (Unknown if this platform has Secure Guest support)
   LXC: Checking for Linux >= 2.6.26                                         : PASS
   LXC: Checking for namespace ipc                                           : PASS
   LXC: Checking for namespace mnt                                           : PASS
   LXC: Checking for namespace pid                                           : PASS
   LXC: Checking for namespace uts                                           : PASS
   LXC: Checking for namespace net                                           : PASS
   LXC: Checking for namespace user                                          : PASS
   LXC: Checking for cgroup 'cpu' controller support                         : PASS
   LXC: Checking for cgroup 'cpuacct' controller support                     : PASS
   LXC: Checking for cgroup 'cpuset' controller support                      : PASS
   LXC: Checking for cgroup 'memory' controller support                      : PASS
   LXC: Checking for cgroup 'devices' controller support                     : PASS
   LXC: Checking for cgroup 'freezer' controller support                     : PASS
   LXC: Checking for cgroup 'blkio' controller support                       : PASS
   LXC: Checking if device /sys/fs/fuse/connections exists                   : PASS
+ lscpu || true
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          48
On-line CPU(s) list:             0-47
Thread(s) per core:              2
Core(s) per socket:              24
Socket(s):                       1
NUMA node(s):                    4
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           1
Model name:                      AMD EPYC 7401P 24-Core Processor
Stepping:                        2
CPU MHz:                         2791.180
BogoMIPS:                        3992.40
Virtualization:                  AMD-V
L1d cache:                       768 KiB
L1i cache:                       1.5 MiB
L2 cache:                        12 MiB
L3 cache:                        64 MiB
NUMA node0 CPU(s):               0,4,8,12,16,20,24,28,32,36,40,44
NUMA node1 CPU(s):               1,5,9,13,17,21,25,29,33,37,41,45
NUMA node2 CPU(s):               2,6,10,14,18,22,26,30,34,38,42,46
NUMA node3 CPU(s):               3,7,11,15,19,23,27,31,35,39,43,47
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
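
A check like this could be automated before committing to a CI provider. The following is a rough Python sketch that parses `virt-host-validate` output of the shape shown above and reports whether the `/dev/kvm` check passed. The function names and the regular expression are assumptions written for this example and only cover the line format seen in this log.

```python
import re

# Matches lines such as
#   "  QEMU: Checking if device /dev/kvm exists      : FAIL (hint text)"
LINE_RE = re.compile(
    r"^\s*(?P<driver>\w+): (?P<check>.+?)\s*: "
    r"(?P<status>PASS|FAIL|WARN)(?: \((?P<hint>.*)\))?\s*$"
)


def parse_host_validate(output):
    """Return one dict per recognized check line; other lines are skipped."""
    results = []
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:
            results.append(match.groupdict())
    return results


def kvm_usable(results):
    """True only if the QEMU '/dev/kvm exists' check is present and passed."""
    for result in results:
        if result["driver"] == "QEMU" and "/dev/kvm exists" in result["check"]:
            return result["status"] == "PASS"
    return False
```

Fed the log above, `kvm_usable` would return `False`, because the `/dev/kvm` line is marked `FAIL` even though the hardware virtualization check passes.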

depau avatar Dec 18 '20 07:12 depau

Well, I guess we could close this since neither of us has put any effort into working around the issues with the CIs etc., and I'm not really looking forward to resuscitating this effort either.

By the way, I recently found out how to create virtual USB drives with the official emulator as well:

~/Android/Sdk/emulator/emulator -avd Pixel_5_API_33 -qemu \
  -monitor unix:qemu-monitor-socket,server,nowait \
  -usb -device nec-usb-xhci,id=xhci \
  -blockdev node-name=stick,driver=raw,file.driver=file,file.node-name=file,file.filename=/home/depau/usb-storage.img \
  -device usb-storage,bus=xhci.0,drive=stick,id=usbstick

This boots the emulator with a virtual USB drive plugged in, and it also exposes the QEMU monitor socket from which you can disconnect it and reconnect it.

socat -,echo=0,icanon=0 unix-connect:qemu-monitor-socket
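
For scripted tests, the same disconnect/reconnect could probably be driven from code instead of an interactive socat session. The sketch below is an assumption-heavy illustration in Python: it relies on the standard QEMU monitor commands `device_del` and `device_add`, reuses the `usbstick`, `xhci.0` and `stick` names from the command line above, and assumes the `stick` block node stays available after the device is removed. The helper names are made up for this example.

```python
import socket


def replug_commands(device_id="usbstick", bus="xhci.0", drive="stick"):
    """Build the monitor commands that unplug and re-plug the USB drive."""
    unplug = f"device_del {device_id}"
    plug = f"device_add usb-storage,bus={bus},drive={drive},id={device_id}"
    return unplug, plug


def send_monitor_command(socket_path, command, timeout=1.0):
    """Send one command line to the QEMU monitor socket.

    Returns whatever the monitor prints back before `timeout` seconds pass
    without further output.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.settimeout(timeout)
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        try:
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        except socket.timeout:
            pass  # no more output arrived within the timeout
        return b"".join(chunks).decode("utf-8", errors="replace")


# Hypothetical usage, with the emulator started as shown above:
#   unplug, plug = replug_commands()
#   send_monitor_command("qemu-monitor-socket", unplug)
#   send_monitor_command("qemu-monitor-socket", plug)
```

The monitor socket typically serves one client at a time, so any open socat session would need to be closed before running this.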

depau avatar Mar 15 '23 20:03 depau