
Build and publish docker images for ARM platforms

Open qkaiser opened this issue 2 years ago • 5 comments

I've seen a few reports of people who wanted to give unblob a try on AARCH64 targets (mostly Apple M1 or M2) but couldn't, because our Docker images are only built for x86 / x86-64.

We should investigate whether our current Dockerfile can be built into ARM images. If it works, we should edit the GitHub CI workflow to publish images for x86, ARMv6, ARMv7, and AARCH64.

Interesting resources on the subject:

  • https://docs.docker.com/desktop/multi-arch/
  • https://jitsu.com/blog/multi-platform-docker-builds
  • https://itnext.io/building-multi-cpu-architecture-docker-images-for-arm-and-x86-3-building-in-github-action-ci-a382feab5af9
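As a rough illustration of the multi-platform publishing step described above, here is a minimal Python sketch that assembles a `docker buildx build` command line. The image name `example/unblob:latest` and the helper `buildx_command` are hypothetical, and the platform list is only one possible mapping of the architectures named above to Docker platform strings. It does not run Docker, and it says nothing about whether the current Dockerfile actually builds on ARM.

```python
import shlex
from typing import List, Sequence

# Assumed mapping of the architectures named in this issue to Docker platform strings.
PLATFORMS = ("linux/amd64", "linux/arm/v6", "linux/arm/v7", "linux/arm64")


def buildx_command(
    image: str,
    platforms: Sequence[str] = PLATFORMS,
    context: str = ".",
    push: bool = True,
) -> List[str]:
    """Return the argv for a multi-platform `docker buildx build` invocation."""
    cmd = [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),  # comma-separated target platforms
        "--tag", image,
    ]
    if push:
        cmd.append("--push")  # push the resulting multi-arch manifest to the registry
    cmd.append(context)  # build context directory
    return cmd


if __name__ == "__main__":
    # "example/unblob:latest" is a placeholder, not the project's real image name.
    print(shlex.join(buildx_command("example/unblob:latest")))
```

Building for foreign architectures this way usually requires QEMU emulation or native ARM runners, so a CI job would need to set that up before running the command.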

qkaiser avatar Aug 09 '22 06:08 qkaiser

Of course I forgot about Hyperscan! The Hyperscan dev team has no plans to port it to platforms other than x86. However, there is a fork called VectorScan that adds support for ARM architectures (see https://github.com/VectorCamp/vectorscan).

I'll have a look at VectorScan (how compatible its APIs are, what the performance impact would be), but given the amount of work required it won't be a top-priority task. Let's wait for https://github.com/onekey-sec/unblob/pull/244 to land before investing more time in this.

qkaiser avatar Aug 09 '22 07:08 qkaiser

Vectorscan does not have a Python API, which could complicate things. Another, somewhat slower, solution is to run a simple regex over the mmapped file, which should be functionally identical, and to use Hyperscan where available.
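To make the regex-on-mmap idea concrete, here is a minimal sketch using only the standard library. The `SIGNATURES` table and the function names are hypothetical and are not unblob's real handler patterns. It shows only the fallback path. Selecting Hyperscan where it is available is left out.

```python
import mmap
import os
import re
from typing import Dict, List, Tuple

# Hypothetical signatures for illustration only.
SIGNATURES: Dict[str, bytes] = {
    "gzip": rb"\x1f\x8b\x08",
    "zip": rb"PK\x03\x04",
}


def compile_signatures(signatures: Dict[str, bytes]) -> "re.Pattern[bytes]":
    """Combine all signatures into one alternation with a named group per signature."""
    parts = [
        b"(?P<" + name.encode() + b">" + pattern + b")"
        for name, pattern in signatures.items()
    ]
    return re.compile(b"|".join(parts), re.DOTALL)


def scan_file(path: str, signatures: Dict[str, bytes] = SIGNATURES) -> List[Tuple[str, int]]:
    """Return (signature name, offset) pairs found in the file at `path`."""
    if os.path.getsize(path) == 0:
        return []  # an empty file cannot be memory-mapped
    combined = compile_signatures(signatures)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # `re` accepts any bytes-like object, so the mapping is searched in place
        # without reading the whole file into memory.
        return [(m.lastgroup, m.start()) for m in combined.finditer(mm)]
```

A single alternation reports at most one signature per position and only non-overlapping matches, so this is not a drop-in equivalent of a multi-pattern matcher. It also does nothing to address performance.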

martonilles avatar Aug 09 '22 08:08 martonilles

I am a bit hesitant about trying to reimplement our hand-crafted functionality using the built-in regex engine. I think we'd need multiple passes, or we would end up with a regex that backtracks too much. As Vectorscan seems to be aiming for API compatibility, we could reuse the existing Python binding for Hyperscan and maybe even upstream Vectorscan support into it.

vlaci avatar Aug 10 '22 12:08 vlaci

I'm with @vlaci on this one

qkaiser avatar Aug 10 '22 13:08 qkaiser

I agree we shouldn't use regex instead of Hyperscan, as even Yara was too slow. If we throw out Hyperscan, unblob would lose a serious feature: performance. Sure, it might still be usable for someone even if it is extremely slow, but I would rather try something else first.

We talked about this with @vlaci and looked a bit into the python-hyperscan package. We could try building it against vectorscan. If they are API compatible, it might simply work. The only thing we need to patch out of python-hyperscan is Chimera support, because vectorscan doesn't provide that: https://github.com/VectorCamp/vectorscan/blob/master/libhs.pc.in
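If such a build were attempted, a quick smoke test along the following lines could show whether the binding still compiles and scans a pattern. This is only a sketch: it assumes the python-hyperscan API as documented in that project's README (`Database`, `compile`, `scan` with a match callback), and the function name `smoke_test` is hypothetical. Whether a vectorscan-linked build behaves the same way is exactly what it would help find out.

```python
from typing import List, Optional, Tuple


def smoke_test(data: bytes = b"xxfooxx") -> Optional[List[Tuple[int, int, int]]]:
    """Compile one pattern and scan `data`; return None if the binding is missing."""
    try:
        import hyperscan  # python-hyperscan, possibly built against vectorscan
    except ImportError:
        return None

    matches: List[Tuple[int, int, int]] = []

    def on_match(pattern_id, start, end, flags, context):
        # Called once per match; returning None lets scanning continue.
        matches.append((pattern_id, start, end))

    db = hyperscan.Database()
    db.compile(
        expressions=[rb"fo+"],
        ids=[0],
        elements=1,
        flags=[hyperscan.HS_FLAG_SOM_LEFTMOST],  # report start offsets as well
    )
    db.scan(data, match_event_handler=on_match)
    return matches


if __name__ == "__main__":
    result = smoke_test()
    print("binding not installed" if result is None else result)
```

Comparing the output of this check on an x86 build linked against Hyperscan with an ARM build linked against vectorscan would give a first hint about API compatibility, before looking at performance.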

kissgyorgy avatar Aug 10 '22 19:08 kissgyorgy