libaums
CI instrumented tests
This introduces QEMU tests on GitHub Actions. GH Actions doesn't support nested virtualization while Travis does, but Travis doesn't support multiple workflows, so I would convert the existing .travis.yml to GH Actions and move this workflow to Travis.
To get over all of the Android image shenanigans such as "does not connect to the emulated network" or "no way to auto-approve USB permissions" I'm pulling in two other projects. I added them to the EtchDroid organization but let me know if you have a better place for them (if you like them at all):
- https://github.com/EtchDroid/qemu_test_orchestrator
- The name should be pretty clear: it runs the VM, applies all the workarounds and runs the tests inside it, while approving permission requests
- https://github.com/EtchDroid/VirtWifiConnector/
- All "usable" Android-x86 images except for Marshmallow show the emulated ethernet as "VirtWifi" and do not connect to it automatically. It turns out there's no easy way to connect to wifi from the command line, so the orchestrator above takes an APK of this small helper and shoves it into QEMU over the emulated serial port
I followed this approach in order to be able to use upstream, clean Android-x86 images. I wanted to avoid having to build a purpose-made image. It works on all images for which an RPM package is provided (API 23, 25, 27, 28).
I'm not sure how to test this without merging it into develop, maybe later I'll try making it run for pull requests from all branches.
Codecov Report
Merging #264 into develop will not change coverage. The diff coverage is n/a.
@@            Coverage Diff            @@
##             develop     #264   +/-  ##
==========================================
  Coverage      62.70%   62.70%
  Complexity       365      365
==========================================
  Files             49       49
  Lines           1582     1582
  Branches         217      217
==========================================
  Hits             992      992
  Misses           525      525
  Partials          65       65
Last update 9eff61a...d586f07.
I tried hard to make it work without KVM but as you can see, Android isn't able to start in 5 minutes (look at the screen recording for API 23, "Download artifacts" above the GitHub actions latest workflow log - it doesn't even do mode setting). The entire test suite runs within 5 minutes on my machine and Android boots within 20 seconds.
So, I'd definitely move this to another CI, but I need your intervention in any case:
- for Travis, I'd need you to move the jacoco report to the GH actions workflow and add the codecov tokens
- for any other CI, I'd need you to add the repo to it
However, the only CI known to support nested virtualization is Travis, so the easiest option would be to move it there. It looks like macOS jobs on all CIs support virtualization, but that would have other issues such as "I don't have/want a Mac to test it" and "IDK how to use macOS".
So, could you please move the code coverage reports to GH Actions? Or point out alternatives if you have any :)
I'll try to implement tests on Travis in my libaums_wrapper repo as a demo, since I want to do additional block device testing anyway using my Stream Reader/Writer implementations.
I can definitely move the codecov stuff to GH Actions, but how would that solve the problem?
Are we using then GH Actions for unit tests and codecov and travis for qemu stuff?
I can definitely move the codecov stuff to GH Actions, but how would that solve the problem?
I'd rather have each workflow do one thing, especially since the QEMU test is quite lengthy, complicated and therefore more prone to break/need tweaking. The codecov thing does just one thing without special requirements, so I'd move it to GH Actions so it can run on its own without risking that changes to the qemu workflow break it.
Are we using then GH Actions for unit tests and codecov and travis for qemu stuff?
Yep, that's my idea
Sounds great! I just added the codecov stuff to GH actions and it seems to work w/o a token. Seems that the bash script is smart enough to handle that by itself.
So feel free to remove the unit tests from travis and only do the QEMU stuff there.
As you can see, it's "sort of" working now. At least it boots, and in some cases it even runs the tests (the tests fail though :cry:). It still takes a long time: ~10 minutes for the installation, and some Android versions take up to 13 minutes (!!!) to boot to the launcher screen, even with KVM.
I improved the coordinator script: it now connects to the VM's serial port over a Unix socket and, instead of blindly typing stuff and hoping it works, it actually checks the output to some extent. This should avoid wasting time on hardcoded sleeps while still waiting longer if the build servers are overloaded.
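To illustrate the idea of checking serial output instead of sleeping, here is a minimal sketch. It is not the orchestrator's actual code: the function names, the prompt regex and the timeout values are assumptions made for this example.

```python
import re
import socket
import time


def wait_for_output(sock, pattern, timeout=60.0):
    """Read from sock until the bytes regex `pattern` matches or `timeout` expires.

    Returns everything read so far; raises TimeoutError if the pattern never appears.
    """
    deadline = time.monotonic() + timeout
    regex = re.compile(pattern)
    buffer = b""
    while True:
        if regex.search(buffer):
            return buffer
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"pattern {pattern!r} not seen within {timeout}s")
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            continue  # the next iteration sees the expired deadline and raises
        if not chunk:
            raise ConnectionError("serial socket closed by the VM")
        buffer += chunk


def run_command(sock, command, prompt=rb"[#$] $", timeout=60.0):
    """Type a command on the serial console and wait for the next shell prompt.

    The prompt regex is a guess at what an Android shell prints; adjust as needed.
    """
    sock.sendall(command.encode() + b"\n")
    return wait_for_output(sock, prompt, timeout)
```

A caller would create a `socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)`, connect it to whatever path QEMU's serial port was bound to, and then call `run_command` with a generous timeout. On a fast machine the call returns as soon as the prompt shows up; on an overloaded build server it simply waits longer, up to the deadline.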
I'm uploading screen recordings and logcats here, it's publicly viewable: https://objstor.depau.eu/minio/libaums-screenrecs/travis/
Hopefully after some extra troubleshooting it will work reliably.
I will squash all the commits into one before this is ready to merge. I'm really just trying stuff to see how it works. On my machine it works very reliably and the whole test suite runs within 30 seconds; the problem is really just the CI servers.
Hmm I see, do you think it may be worth a shot to reach out to the travis ci support?
Or maybe setting up a dedicated Jenkins instance on a VM somewhere?
Or maybe setting up a dedicated Jenkins instance on a VM somewhere?
I think getting a VM more powerful than Travis's that also supports nested KVM is gonna be quite expensive, but that could be an option.
The VMs Travis offers for travis.org are really not that bad, with 7.5 GB of RAM, 2 cores at reportedly 2.8GHz. The problems arise (just guessing) since I think they're running both cores on the same real CPU core (with hyperthreading). This is quite likely since it is also a mitigation for some of the recent side-channel attack vulnerabilities found in Intel CPUs.
On top of that, I think they also have other load on the VMs.
Hmm I see, do you think it may be worth a shot to reach out to the travis ci support?
I actually narrowed down the issue. The VM was struggling because I was sending the "virtwifi enabler" APK while it was running dex2oat, which uses a lot of CPU.
I couldn't reproduce it on my machine since my CPU clock is considerably higher than Travis's and I get no issues whatsoever, so I'm forced to test it on Travis (sorry for the failed jobs spam).
I see that the tests pretty much never fail on Android 9 and 8.
Since this has been lying around for quite a while and I haven't had the chance to do more troubleshooting, I was thinking I could disable tests on other platforms for now and merge it, but then keep an eye on it in case it fails when it shouldn't.
I can then pick it up later and fix the other versions.
What do you think?
Hey,
sorry for the late response. Yes sounds great, feel free to merge :)
Although it seems that Travis stopped free builds for OSS projects... https://news.ycombinator.com/item?id=25338983
Although it seems that Travis stopped free builds for OSS projects... https://news.ycombinator.com/item?id=25338983
I'll see if GitLab CI or hosted Drone.io have nested KVM enabled :( requesting the CI credits every month doesn't sound like a good option.
If that's not the case, at some point next year I'm going to put my private CI back online, which will either be Drone once again or I may switch to builds.sr.ht. I can install the runner on my desktop, on which nested KVM is definitely enabled, and there definitely won't be any issues with RAM shortage or race conditions.
I don't leave it on constantly, though, so if we can find a decent public CI that would be better (also I pay for electricity :man_shrugging:)
It looks like drone doesn't have KVM loaded, but maybe we can ask them to load the kernel module since otherwise they have some very nice hardware: https://cloud.drone.io/Depau/drone-test/2/2/2
+ virt-host-validate || true
QEMU: Checking for hardware virtualization : PASS
QEMU: Checking if device /dev/kvm exists : FAIL (Check that the 'kvm-intel' or 'kvm-amd' modules are loaded & the BIOS has enabled virtualization)
QEMU: Checking if device /dev/vhost-net exists : WARN (Load the 'vhost_net' module to improve performance of virtio networking)
QEMU: Checking if device /dev/net/tun exists : FAIL (Load the 'tun' module to enable networking for QEMU guests)
QEMU: Checking for cgroup 'cpu' controller support : PASS
QEMU: Checking for cgroup 'cpuacct' controller support : PASS
QEMU: Checking for cgroup 'cpuset' controller support : PASS
QEMU: Checking for cgroup 'memory' controller support : PASS
QEMU: Checking for cgroup 'devices' controller support : PASS
QEMU: Checking for cgroup 'blkio' controller support : PASS
QEMU: Checking for device assignment IOMMU support : WARN (No ACPI IVRS table found, IOMMU either disabled in BIOS or not supported by this hardware platform)
QEMU: Checking for secure guest support : WARN (Unknown if this platform has Secure Guest support)
LXC: Checking for Linux >= 2.6.26 : PASS
LXC: Checking for namespace ipc : PASS
LXC: Checking for namespace mnt : PASS
LXC: Checking for namespace pid : PASS
LXC: Checking for namespace uts : PASS
LXC: Checking for namespace net : PASS
LXC: Checking for namespace user : PASS
LXC: Checking for cgroup 'cpu' controller support : PASS
LXC: Checking for cgroup 'cpuacct' controller support : PASS
LXC: Checking for cgroup 'cpuset' controller support : PASS
LXC: Checking for cgroup 'memory' controller support : PASS
LXC: Checking for cgroup 'devices' controller support : PASS
LXC: Checking for cgroup 'freezer' controller support : PASS
LXC: Checking for cgroup 'blkio' controller support : PASS
LXC: Checking if device /sys/fs/fuse/connections exists : PASS
+ lscpu || true
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
NUMA node(s): 4
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7401P 24-Core Processor
Stepping: 2
CPU MHz: 2791.180
BogoMIPS: 3992.40
Virtualization: AMD-V
L1d cache: 768 KiB
L1i cache: 1.5 MiB
L2 cache: 12 MiB
L3 cache: 64 MiB
NUMA node0 CPU(s): 0,4,8,12,16,20,24,28,32,36,40,44
NUMA node1 CPU(s): 1,5,9,13,17,21,25,29,33,37,41,45
NUMA node2 CPU(s): 2,6,10,14,18,22,26,30,34,38,42,46
NUMA node3 CPU(s): 3,7,11,15,19,23,27,31,35,39,43,47
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
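Given how slow the VM is without KVM, a job could check for the device node up front and fail fast instead of waiting for a boot timeout. The following is only a sketch of such a pre-flight check; the function names and the error message are made up for this example.

```python
import os


def kvm_usable(device="/dev/kvm"):
    """Return True if the KVM device node exists and this user can read and write it."""
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


def require_kvm(device="/dev/kvm"):
    """Raise early with a clear message when hardware acceleration is unavailable."""
    if not kvm_usable(device):
        raise RuntimeError(
            f"{device} is missing or not accessible; "
            "check that the kvm-intel or kvm-amd module is loaded"
        )
```

On the Drone runner above, `require_kvm()` would raise immediately, since `virt-host-validate` reports that /dev/kvm does not exist.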
Well, I guess we could close this, since neither of us put any effort into working around the issues with the CIs etc., and I'm not really looking forward to resuscitating this effort either.
By the way, I recently found out how to create virtual USB drives with the official emulator as well:
~/Android/Sdk/emulator/emulator -avd Pixel_5_API_33 -qemu \
-monitor unix:qemu-monitor-socket,server,nowait \
-usb -device nec-usb-xhci,id=xhci \
-blockdev node-name=stick,driver=raw,file.driver=file,file.node-name=file,file.filename=/home/depau/usb-storage.img \
-device usb-storage,bus=xhci.0,drive=stick,id=usbstick
This boots the emulator with a virtual USB drive plugged in, and it also exposes the QEMU monitor socket from which you can disconnect it and reconnect it.
socat -,echo=0,icanon=0 unix-connect:qemu-monitor-socket
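For scripted tests, the same monitor socket could be driven from code rather than interactively. The sketch below assumes the socket speaks QEMU's human monitor protocol with its usual `(qemu) ` prompt, and reuses the `usbstick`, `stick` and `xhci.0` names from the command line above. The helper names are hypothetical, and I have not verified this against the Android emulator's QEMU build.

```python
import socket

PROMPT = b"(qemu) "


def read_until_prompt(sock):
    """Collect monitor output until it ends with the human monitor prompt."""
    data = b""
    while not data.endswith(PROMPT):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("monitor socket closed")
        data += chunk
    return data


def hmp_command(sock, command):
    """Send one monitor command and return the raw output up to the next prompt."""
    sock.sendall(command.encode() + b"\n")
    return read_until_prompt(sock)


def replug_usb_stick(monitor_path="qemu-monitor-socket"):
    """Unplug and re-plug the virtual drive; assumes the block node stays defined."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(monitor_path)
        read_until_prompt(sock)  # skip the greeting banner
        hmp_command(sock, "device_del usbstick")
        hmp_command(sock, "device_add usb-storage,bus=xhci.0,drive=stick,id=usbstick")
```

`device_del` and `device_add` are standard QEMU monitor commands. Whether the guest needs a pause between the two calls to notice the removal is something to find out by experiment.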