LekKit
Implemented Zbc (interpreter only, probably no JIT planned) in 08094c5. Now the entire Bitmanip family is supported in the interpreter.
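For reference, the three Zbc instructions (`clmul`, `clmulh`, `clmulr`) can be modeled with a bit-serial loop that follows the pseudocode in the RISC-V Bitmanip specification. This is a minimal RV64 sketch with hypothetical function names. It is not the code from 08094c5, which may well be structured differently (a host could also use x86 `PCLMULQDQ`, for example).

```cpp
#include <cstdint>

// clmul: low 64 bits of the 128-bit carry-less product.
constexpr uint64_t clmul(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 64; ++i)
        if ((b >> i) & 1) r ^= a << i;
    return r;
}

// clmulh: high 64 bits of the carry-less product (bits 127..64).
constexpr uint64_t clmulh(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 1; i < 64; ++i)  // i = 0 never contributes to the high half
        if ((b >> i) & 1) r ^= a >> (64 - i);
    return r;
}

// clmulr: bits 126..63 of the carry-less product ("reversed" variant).
constexpr uint64_t clmulr(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 64; ++i)
        if ((b >> i) & 1) r ^= a >> (63 - i);
    return r;
}
```

A handy cross-check when testing: `clmulr(a, b)` equals `(clmulh(a, b) << 1) | (clmul(a, b) >> 63)`.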
Implemented Zcb with partial JIT support in 1f41839. TODO: Test this properly.
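The Zcb unary group is straightforward to decode by hand: in quadrant 1 these instructions share funct6 `100111` and bits [6:5] = `11`, pick the operation with bits [4:2], and use a 3-bit `rd'` field (x8..x15) in bits [9:7]. Below is a minimal interpreter-side sketch with hypothetical names; it does not reproduce 1f41839. Per the Zc spec, `c.sext.b`/`c.zext.h`/`c.sext.h` also require Zbb and `c.zext.w` (RV64 only) requires Zba; those checks are omitted here.

```cpp
#include <cstdint>
#include <optional>

// Index of the rd'/rs1' register (x8..x15) encoded in bits [9:7].
constexpr unsigned zcb_rd(uint16_t insn) { return 8u + ((insn >> 7) & 0x7u); }

// Hypothetical helper for the Zcb unary ops
// (quadrant 1, funct6 = 0b100111, bits [6:5] = 0b11).
// Returns the new value of rd', or nullopt if insn is not one of them.
std::optional<uint64_t> zcb_unary(uint16_t insn, uint64_t val) {
    if ((insn & 0x3) != 0x1) return std::nullopt;        // quadrant 1
    if ((insn >> 10) != 0x27) return std::nullopt;       // funct6 = 100111
    if (((insn >> 5) & 0x3) != 0x3) return std::nullopt; // unary group (0b10 would be c.mul)
    switch ((insn >> 2) & 0x7) {
    case 0: return val & 0xFFu;                                      // c.zext.b
    case 1: return static_cast<uint64_t>(static_cast<int8_t>(val));  // c.sext.b
    case 2: return val & 0xFFFFu;                                    // c.zext.h
    case 3: return static_cast<uint64_t>(static_cast<int16_t>(val)); // c.sext.h
    case 4: return val & 0xFFFFFFFFu;                                // c.zext.w
    case 5: return ~val;                                             // c.not
    default: return std::nullopt;                                    // reserved
    }
}
```

Under this layout `0x9C75` encodes `c.not x8` and `0x9C61` encodes `c.zext.b x8` (hand-assembled, so worth checking against an assembler).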
Implemented Zicond (interpreter only for now) in fc406a9. TODO: Test this properly.
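Zicond adds only two instructions, so their semantics fit in one line each. A minimal sketch (hypothetical function names) of what an interpreter has to compute, plus the branchless select idiom the pair is designed for:

```cpp
#include <cstdint>

// czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1
constexpr uint64_t czero_eqz(uint64_t rs1, uint64_t rs2) { return rs2 == 0 ? 0 : rs1; }

// czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1
constexpr uint64_t czero_nez(uint64_t rs1, uint64_t rs2) { return rs2 != 0 ? 0 : rs1; }

// Branchless select built from the pair:
//   czero.eqz t0, a, cond
//   czero.nez t1, b, cond
//   or        rd, t0, t1    => rd = cond ? a : b
constexpr uint64_t cond_select(uint64_t cond, uint64_t a, uint64_t b) {
    return czero_eqz(a, cond) | czero_nez(b, cond);
}
```

Exactly one of the two `czero` results is non-zero (or both are zero), so the final `or` never mixes bits from `a` and `b`.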
Overview of a possible Zawrs implementation: it is highly similar to the x86 `monitor`/`mwait` instructions; however, those are usually only usable in ring 0. Some AMD chips (starting from Bulldozer?) have `monitorx`/`mwaitx`...
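Independent of `monitor`/`mwait` support, a portable fallback is a bounded poll. The Zawrs spec permits an implementation to end a `WRS.NTO`/`WRS.STO` stall early for any reason, so timing out is legal. This sketch is hypothetical, not RVVM code, and it approximates "reservation set invalidated" as "the reserved value changed", which misses a store that writes back the same value.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical helper modeling WRS.* by polling. The stall ends when the
// reserved word changes, when an interrupt is pending, or after max_polls
// iterations. Returns true if the wait ended because of an observed event.
bool wrs_poll_wait(const std::atomic<uint64_t>& reserved_word,
                   uint64_t reserved_value,
                   const std::atomic<bool>& irq_pending,
                   unsigned max_polls) {
    for (unsigned i = 0; i < max_polls; ++i) {
        if (reserved_word.load(std::memory_order_acquire) != reserved_value)
            return true;
        if (irq_pending.load(std::memory_order_acquire))
            return true;
        std::this_thread::yield(); // let the writing vCPU thread run
    }
    return false; // timed out: the guest simply retires the instruction
}
```

A real implementation would likely swap the `yield` loop for a host wait primitive (futex, `mwaitx`, `wfe`), but the exit conditions stay the same.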
There are scalar crypto extensions that build atop Bitmanip. It might make sense to implement them, although my initial evaluation of their JITability is fairly low unless we just start...
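To illustrate the overlap: Zbkb mostly reuses Zbb operations (rotates, `andn`/`orn`/`xnor`, `rev8`, `pack`) and adds `brev8`, Zbkc is the `clmul`/`clmulh` subset of Zbc, and Zbkx adds the `xperm4`/`xperm8` lookups. A minimal sketch of the two genuinely new-looking ones, `brev8` and `xperm8`, with hypothetical function names:

```cpp
#include <cstdint>

// Zbkb brev8: reverse the bit order inside every byte (bytes stay in place).
constexpr uint64_t brev8(uint64_t x) {
    x = ((x >> 1) & 0x5555555555555555ULL) | ((x & 0x5555555555555555ULL) << 1); // swap adjacent bits
    x = ((x >> 2) & 0x3333333333333333ULL) | ((x & 0x3333333333333333ULL) << 2); // swap bit pairs
    x = ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL) | ((x & 0x0F0F0F0F0F0F0F0FULL) << 4); // swap nibbles
    return x;
}

// Zbkx xperm8: byte-wise table lookup. Each byte of rs2 indexes a byte of
// rs1; indices >= 8 produce 0.
constexpr uint64_t xperm8(uint64_t rs1, uint64_t rs2) {
    uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {
        uint64_t idx = (rs2 >> (i * 8)) & 0xFF;
        if (idx < 8)
            r |= ((rs1 >> (idx * 8)) & 0xFF) << (i * 8);
    }
    return r;
}
```

`xperm8` is essentially a one-register `pshufb`, which hints at why some of these may be JITable on x86 even if the rest are not.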
Implemented Zkr (entropy source CSR).
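The Zkr `seed` CSR (CSR number 0x015) packs a 2-bit status (OPST) in bits [31:30] and 16 bits of entropy in bits [15:0]. A minimal sketch of serving a guest read from host entropy, with hypothetical names. The spec's other rules (the CSR must be accessed with a read-write instruction such as `csrrw`, and S/U-mode access is gated by `mseccfg`) are left out.

```cpp
#include <cstdint>
#include <random>

// OPST values of the Zkr `seed` CSR, bits [31:30].
enum SeedOpst : uint32_t {
    SEED_BIST = 0, // built-in self test in progress
    SEED_WAIT = 1, // no entropy available yet
    SEED_ES16 = 2, // 16 bits of entropy valid in [15:0]
    SEED_DEAD = 3, // unrecoverable failure
};

// Hypothetical emulator hook: always report ES16 with fresh host entropy.
// All bits other than OPST and the entropy field are returned as zero.
uint32_t read_seed_csr(std::random_device& host_rng) {
    uint32_t entropy = static_cast<uint32_t>(host_rng()) & 0xFFFFu;
    return (static_cast<uint32_t>(SEED_ES16) << 30) | entropy;
}
```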
Overview on how new extensions could be JITed: [godbolt link](https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1DIApACYAQuYukl9ZATwDKjdAGFUtAK4sGIM6SuADJ4DJgAcj4ARpjEEgDsAKykAA6oCoRODB7evv6p6ZkCIWGRLDFxXEm2mPaOAkIETMQEOT5%2BAXaYDlmNzQQlEdGxCckKTS1teZ0TA6FD5SNViQCUtqhexMjsHOYAzKHI3lgA1CZ7buP4ggB0COfYJhoAgk/PAPTvJwBaUUxvby8oQIADZJAB9AgnJjodDgrwAdwgQMEYMh0NIJxRBD2ZnRURWAPiVheJzJJ2ImAImwY0LOlhOUGxaIIKwJ5xJr3iABEAS9sbj0QoEFwYehkcDBVCmJiBXioeyXiZiW9yRSqTS6dZGVEzhdzm4TlxCXtOcreUr%2BcCWSdhaLYfCkcyIdLZZL5SdFVzOWrKdTiLSmPSLIymdaXWyVnq3AajSazTy%2Bc85UKEGYxRLBFKMVj3fjCUqVaTyX7NUHtRBdQbY2Z40SLa8raiXba02LHZnQS2Zbmsx6vcqfSWNQGtQyoJ2WZHozW64WG4C81DhXsMynXb2cf2C97VcP/YHgzqZxcTns51yF02u6nVw7EZPu26%2B/miUOyaXR%2BXx2Hm5Dp9Wp7nhy9ZJs6Qq0LQeAduu0I7oOe5kp8JwWEwGSiJBACetqQXgCInAihAIBsy54MADAALSYKoBCMBkAiIeqB5avqp5cBoIHzmB4bogAXtRBDgk6S5wW%2BjGfoGnGXkmyG/FEMlfAAKsRSgnOgAhgBwUIIEwABumDQgw2EKCk3R4GIJyhOMxBeD0AgKCcAgnKoAAcIKLi%2B0oMOgDCdtmPawQORbPL6I6HuYIInAAfvJpqgZayYiSQvmwQFIlBe%2BTFlmc8SGjFUnmtxnnOQwJB%2BR6aXFRl4lhVqiTYNFsUJleiXFUcvG4uV6JMPBwVqngVCMmAYA9Vlo64lJoXMeC4JRECtCOAw4LtRAPUFYmCWwe1YKPt1vWZQNQ0jVGEknGCk37pqM1zXQi3LbQvGQatF6FZtIkOB1ZhddK%2B2MYdEDDaNp0TXFxYfrV13zXdH3PetLVbQQvE7eBP1iWDlmDQDx1jbS52gyFl2jpDt2hMtiNPWt%2BOvY2HxfMoqApGgXiCCcOkOaVJzMI4%2BmWQwCi0k5wCMLEeDIM5bngudLzIYomAOQoqBsARmB4MQ6AnEIACSACy3K2tSVCDU5zQsFLrVbuiyBpCknWpb96OncTC2k9bTOCLDVMbTTW3W8jPGo4WmWO7NUMuwzbsEBTL1e28yHKekBnNAZXgZEYJzICwqC6ZiLBMNh2gp1CKcGULYTEKLJy0KgCIi0YHkW1CueqBAIk9ul9sE%2BDzGrXqDUEnqABidIgJ6cNFQ3JxN14305oFHdTZqPcPJ6UbnEPQYj018XeyJLChC3xVt1V8%2BE7SS%2Bnv3a/D6Pnvw7voTT6lz4T9VDu1efhqX3s68nJvY9vX%2BZcAlZozzWoHGq3dgQuX/P8W%2B48bRKBouCBAoCT5d0XsCLgIIYH/xpshZ4tAFYnE1gAJUMvgCkGhMTEC4JiDQqgB6MIHvXbM/EkEoOxFgvaaNO440ZJw7BrJYHNQUicZSeAHIWDkEEIIQgAASmtFIESIicGgqhMDq2IKgJo9RAyZ2ZlCSkwAJG0WIIyNwQQVg3GsbHOmWi/hRFoNhJmtBNGYGLq2AaBAKJREwhRIxldq612ALzcWkU/jIAANauBYR6LRtBQHP2zK/Xhp0P6xkrPSSK9CuBUBWKvXKjJyx7EeCU0MuITgURXlkk4OSB75NwbE9E8S8CoJ4QvUc8TVqYnYtHFqskogKFEc8BwXgxBOMCTXYgDkYgEFMScIgtpiC0HeAoZZeAjyGHwE0hUAkZ66TEEk/shA0F8IgAc2gvdl5RBOTUrgjTrzJIEq0lGJwLmnLSRcq5ZTJAaFXmYSK9y4EJWQuEJg3MDJRAIAoTEUKyCegIGLKyBAbJ2T5o5Wkrl3LSy%2BOEVA6oUgGG2GwFmxtiCm0kCcai2wUhQn4GYvALAWBvNiPRPmOzPRKAIPsw5m5kknPaafN5FllSGggFwOQJ4xU3KhBFWpqguD1L6ePZJRxiA8toEc/EArwFv2Yl8uVUVxWSsAtKwgNS6kNOBTvKqoRdIaq1QqHVu49WagNfVRkEqpU6nNXKy1yqAE3gVFyl5/thW0A%2BbVA1hTjUyO9b8gNNqg2ejVaGwB4bI36pFQC6Ksagjxr%2BQ882NobkMF0mm5N7zBXoNHO6hqeaC2JtsT8UWAh0CiNlg5DOWcMVhMxMgOwvaTZmy2vxLR4Iwi8USXy7c1a%2BFVm/rU3%2B0Ii2jtiKgcEmAACOU6n4ztfLq1JtUF0/xHhxW%2BHA1i0E4IkXgfgOBaFIKgTgMZLDagVpsbY9I9g8FIAQTQl61iRJAIkKh16OCSDvQBp9nBeAKBAFQ/9D7L2kDgLAJAaAWApDoLEcglBMPYfoHEYA/MmApGFNovgt1WWUCiNB0tzRMKcF/Qx4gmEADyURtDdCQ7%2BzDJKCDsYYE46DWA5rADcOM%2BD3BeBYFzkYcQyHSD4EpD0fS0nH3UW6F4WizHeDAlqNBqCURiCMY8FgaDKLGV6bWFQAwJGABqeBMAInY6Ze9v7%2BCCBEGIdgUgZCCFlmoaDuhaEGCMCgaw1h9B4EGfANYDNdHSYouxswvAs6xHLlgeDkA1hdDRS4byUw/C0OCPMMoFQ9BpAyLo4rVXCi6MGBVpYNQ6i9FmHV2h%2BXdF9BaE14YlRbAdc8O0PQ4x%2Bj9cWJUPLGwth%2BavTeqDSnn0cDCRRMEJxgDIDFhAFFzNIlRggLgQgJBv3Gl4EhrQ%2BTSDAdA/oTgkHSD3sfStuDCG/0Aeu%2BBtLT3oOvY%2B8h67%2BlplZBAJIIAA%3D%3D%3D)
TLDR:
- Zba compiles fairly well to `lea rd, [rs2 + rs1 * 2]` variations on x86 (2 insns are...
Optimized the `orc.b` instruction implementation (used in the interpreter) in f760ee2. This could also be inlined in the JIT. This instruction is heavily used to accelerate string operations, so having a fast implementation...
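`orc.b` sets each result byte to 0xFF if the corresponding source byte is non-zero and to 0x00 otherwise. One branch-free portable way to compute it is the classic SWAR trick sketched below (hypothetical name; not necessarily what f760ee2 does):

```cpp
#include <cstdint>

// Portable SWAR orc.b.
constexpr uint64_t orc_b_swar(uint64_t x) {
    const uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
    const uint64_t msbs = 0x8080808080808080ULL;
    // Adding 0x7F to the low 7 bits of a byte sets its MSB iff any low bit was
    // set; the per-byte sum is at most 0xFE, so no carry crosses byte lanes.
    uint64_t t = ((x & low7) + low7) | x; // also keep the original MSBs
    t &= msbs;                            // one flag bit per non-zero byte
    return (t >> 7) * 0xFF;               // widen each 0x01 flag into 0xFF
}
```

This is why it helps string routines: a word contains a zero byte exactly when `orc_b(word) != ~0ULL`, and the position of the first zero byte can then be found with a count-trailing-zeros on the inverted result.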
Probably the best possible `orc.b` implementation for x86_64: 6a37001. A similar implementation is probably possible on ARM64 using the `vceqq_u8` intrinsic. UPD: ARM64 NEON implementation in 3563cbf ```cpp static inline uint64_t bit_orc_b(uint64_t...
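For comparison, a minimal sketch of the compare-against-zero idea. On x86-64, SSE2 `_mm_cmpeq_epi8` marks zero bytes with 0xFF and an XOR with all-ones inverts that into the `orc.b` result. On AArch64, the 64-bit `vceq_u8` (the D-register form of the `vceqq_u8` comparison) followed by `vmvn_u8` does the same. The function name is hypothetical, and this is not the code from 6a37001 or 3563cbf, whose bodies are not shown here.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>
#elif defined(__aarch64__)
#include <arm_neon.h>
#endif

static inline uint64_t orc_b_simd(uint64_t x) {
#if defined(__x86_64__) || defined(_M_X64)
    __m128i v = _mm_cvtsi64_si128(static_cast<long long>(x));
    __m128i zero_bytes = _mm_cmpeq_epi8(v, _mm_setzero_si128()); // 0xFF where byte == 0
    __m128i result = _mm_xor_si128(zero_bytes, _mm_set1_epi8(-1)); // invert: 0xFF where byte != 0
    return static_cast<uint64_t>(_mm_cvtsi128_si64(result));       // low 64 bits only
#elif defined(__aarch64__)
    uint8x8_t v = vcreate_u8(x);
    uint8x8_t zero_bytes = vceq_u8(v, vdup_n_u8(0));               // 0xFF where byte == 0
    return vget_lane_u64(vreinterpret_u64_u8(vmvn_u8(zero_bytes)), 0);
#else
    // Portable fallback: test each byte individually.
    uint64_t r = 0;
    for (int i = 0; i < 64; i += 8)
        if ((x >> i) & 0xFF) r |= 0xFFULL << i;
    return r;
#endif
}
```

SSE2 is part of the x86-64 baseline and NEON of AArch64, so neither path needs runtime CPU feature detection.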
AFAIK `pstore` is a CFI flash device, which I've looked into but never finished; I'll check on that. Did you try building and running a QEMU virt firmware?