unblob
Build and publish Docker images for ARM platforms
I've seen a few reports from people who wanted to try unblob on AARCH64 targets (mostly Apple M1 or M2) but couldn't, because our Docker images are only built for x86 / x86-64.
We should investigate whether our current Dockerfile can be built into ARM images. If it can, we should edit the GitHub CI workflow to publish images for x86, ARMv6, ARMv7, and AARCH64.
Interesting resources on the subject:
- https://docs.docker.com/desktop/multi-arch/
- https://jitsu.com/blog/multi-platform-docker-builds
- https://itnext.io/building-multi-cpu-architecture-docker-images-for-arm-and-x86-3-building-in-github-action-ci-a382feab5af9
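For reference, a multi-platform build boils down to a single `docker buildx build` call with a comma-separated `--platform` list. The sketch below only assembles that command; it assumes a buildx builder with QEMU emulation (or native ARM nodes) is already configured, the mapping from our target names to Docker platform strings is our own assumption, and the image name is a placeholder.

```python
import shlex

# Assumed mapping from the targets named in this issue to Docker platform strings.
PLATFORMS = {
    "x86": "linux/386",
    "x86-64": "linux/amd64",
    "ARMv6": "linux/arm/v6",
    "ARMv7": "linux/arm/v7",
    "AARCH64": "linux/arm64",
}


def buildx_command(image: str, targets: list[str], push: bool = True) -> list[str]:
    """Return the argv for a single multi-platform `docker buildx build` call."""
    platforms = ",".join(PLATFORMS[t] for t in targets)
    argv = ["docker", "buildx", "build", "--platform", platforms, "--tag", image]
    if push:
        # Multi-platform results are pushed as one manifest list under `image`.
        argv.append("--push")
    argv.append(".")  # build context: the directory containing the Dockerfile
    return argv


if __name__ == "__main__":
    # "example/unblob:latest" is a placeholder image name, not the real one.
    print(shlex.join(buildx_command("example/unblob:latest", list(PLATFORMS))))
```

In GitHub Actions the same thing is usually expressed with `docker/setup-qemu-action`, `docker/setup-buildx-action`, and the `platforms` input of `docker/build-push-action`, rather than by shelling out.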
Of course, I forgot about Hyperscan! The Hyperscan dev team has no plans to port it to platforms other than x86. However, there is a fork called VectorScan that adds support for ARM architectures (see https://github.com/VectorCamp/vectorscan).
I'll have a look at VectorScan (how compatible its API is and what the performance impact would be), but given the amount of work required, it won't be a top-priority task. Let's wait for https://github.com/onekey-sec/unblob/pull/244 to land before investing more time in this.
Vectorscan does not have a Python API, which could complicate things. Another, somewhat slower, solution is to run plain regexes over an mmap'ed file, which should be functionally identical, and to keep using Hyperscan where it is available.
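A minimal sketch of that fallback, assuming Python's built-in `re` module and patterns given as bytes regexes; the function names and the example patterns are hypothetical, not unblob's actual handler definitions. It keeps the file memory-mapped so nothing is read into memory up front, and it makes one pass over the file per pattern, whereas Hyperscan matches all patterns in a single scan.

```python
import mmap
import os
import re

try:
    import hyperscan  # python-hyperscan; only available where Hyperscan builds
    HAVE_HYPERSCAN = True
except ImportError:
    hyperscan = None
    HAVE_HYPERSCAN = False


def backend_name() -> str:
    """Report which engine would be used on this machine."""
    return "hyperscan" if HAVE_HYPERSCAN else "regex"


def regex_scan(path: str, patterns: dict[str, bytes]) -> list[tuple[int, str]]:
    """Fallback scanner: search a memory-mapped file with Python's `re`.

    Returns (offset, pattern_name) pairs sorted by offset. Each pattern is
    a separate pass over the file, unlike Hyperscan's single multi-pattern scan.
    """
    compiled = {name: re.compile(pattern) for name, pattern in patterns.items()}
    hits: list[tuple[int, str]] = []
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return hits  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for name, regex in compiled.items():
                # re accepts any bytes-like object, so the mmap is searched in place.
                for match in regex.finditer(mm):
                    hits.append((match.start(), name))
    return sorted(hits)
```

For example, `regex_scan("firmware.bin", {"gzip": rb"\x1f\x8b", "zip": rb"PK\x03\x04"})` would list every offset where either magic appears (the file name is a placeholder).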
I am a bit hesitant about reimplementing our hand-crafted functionality with the built-in regex engine; I think we'd need multiple passes or would end up with a regex that backtracks too much. As Vectorscan seems to be aiming for API compatibility, we could reuse the existing Hyperscan Python binding and maybe even upstream Vectorscan support into it.
I'm with @vlaci on this one
I agree we shouldn't replace Hyperscan with regexes; even Yara was too slow. If we throw out Hyperscan, unblob would lose a key feature: performance. Sure, it might still be usable for some people even if extremely slow, but I would rather try something else first.
We talked about this with @vlaci and looked a bit into the python-hyperscan package. We could try building it against Vectorscan; if the two are API compatible, it might simply work. The only thing we would need to patch out of python-hyperscan is Chimera support, because Vectorscan doesn't provide it: https://github.com/VectorCamp/vectorscan/blob/master/libhs.pc.in
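A hypothetical sketch of how a build script could compile Chimera out when it is missing. It assumes library discovery goes through pkg-config, that Hyperscan's Chimera ships a `libch` pkg-config module (an assumption), and that the C extension checks a made-up `HS_WITHOUT_CHIMERA` macro; python-hyperscan's actual build system may work quite differently.

```python
import shutil
import subprocess


def pkg_config_exists(module: str) -> bool:
    """True if pkg-config is installed and knows about `module`."""
    if shutil.which("pkg-config") is None:
        return False
    result = subprocess.run(
        ["pkg-config", "--exists", module], capture_output=True, check=False
    )
    return result.returncode == 0


def extension_settings(have_chimera: bool) -> dict[str, list]:
    """Hypothetical build settings for a Hyperscan/Vectorscan C extension."""
    # libhs is provided by both Hyperscan and Vectorscan (see libhs.pc.in).
    modules = ["libhs"]
    macros: list[tuple[str, str]] = []
    if have_chimera:
        # Assumed pkg-config name of Hyperscan's Chimera library.
        modules.append("libch")
    else:
        # Made-up macro the extension would check to compile out Chimera bindings.
        macros.append(("HS_WITHOUT_CHIMERA", "1"))
    return {"pkg_config_modules": modules, "define_macros": macros}


if __name__ == "__main__":
    print(extension_settings(have_chimera=pkg_config_exists("libch")))
```

Probing for Chimera instead of hard-coding a Vectorscan switch would let the same binding build against either library.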