PatternLanguage
PatternLanguage copied to clipboard
Crash when calculating addressof(self)
I have a recursive data structure which contains (among other things) a field for the size of the embedded array; reading that array consists of reading elements from the start address of the array until the index pointer matches the data size. Here’s a minimized example of what seemed like the natural way to achieve this:
struct Element {
u8 child_len;
Element children[while($ != addressof(children) + child_len)];
};
Element root @0;
However, with the example data 05 01 00 02 00 00
, this reliably crashes in ImHex 0.33.1, and also seems to crash (at least it produces no data) on the current version of pl.werwolv.net.
Workaround: use addressof+sizeof
the previous field, but this looks clunky and is hard to keep track of when there are several fields in play:
struct Element {
u8 child_len;
Element children[while($ != addressof(child_len) + sizeof(child_len) + child_len)];
};
The main topic of this issue is the crash of ImHex which should never happen. ImHex 1.33.0 had some stability issues that have been fixed in 1.33.1. You stated that you tested 0.33.1 which looks like a typo but I can't tell which version you meant. To test the latest code download a nightly build from werwolv.net (at the bottom).
Although not the main topic, it seems like you also believe that the simplified code you posted should work and it doesn't. The problem is not the size of the array, but the fact that your recursion never stops. even if the size of the array is zero the recursion will not stop, so the workaround you posted will not stop recursion from reaching its safety limit.. In order to stop the recursion you need to use flow control statements in the recursive pattern, that is, if statements with bodies containing break, continue, ...
In general writing recursive patterns is not easy so I usually create an explicit recursion level variable which is printed along relevant data. I wrote a small example based on your post that limits recursion in two ways. If the size condition is met then no more elements are added to the array and recursion is stopped. If child_len is zero then the array creation is skipped but recursion is allowed to continue.
The recursion must first be initialized which usually means writing initial values when recursion equals 1. The Element placed at address 0 has no parent, so we only check the parent data if recursion is greater than one. To effectively stop recursion, the recursive statement must be located inside an if block containing the flow control statements and occur after the control statements.
This code is just a suggestion of how recursion can be used. I don't know the details of your usage case so I can't write code that you can just copy, but it should help check possible variations
#pragma example 05 04 03 02 01 00 00 00 00 00 00 00 00 00 00 00
#include "std/io.pat"
#include "std/mem.pat"
u32 recursion=0;
struct Element {
recursion +=1;
u8 child_len;
u64 save = $;
if (recursion == 1)
Element children[while(true)];
else if (recursion > 1) {
std::print("Save: {:#x},Len: {},$: {:#x},Recursion: {}",parent.save,parent.child_len,$,recursion);
if ($>parent.save+parent.child_len)
break;
if (child_len==0)
continue;
Element children[while(true)];
}
recursion -=1;
};
Element element@0;
it produces the following output on the example input
I: Save: 0x1,Len: 5,$: 0x2,Recursion: 2
I: Save: 0x2,Len: 4,$: 0x3,Recursion: 3
I: Save: 0x3,Len: 3,$: 0x4,Recursion: 4
I: Save: 0x4,Len: 2,$: 0x5,Recursion: 5
I: Save: 0x5,Len: 1,$: 0x6,Recursion: 6
I: Save: 0x5,Len: 1,$: 0x7,Recursion: 6
I: Save: 0x4,Len: 2,$: 0x8,Recursion: 5
I: Save: 0x3,Len: 3,$: 0x9,Recursion: 4
I: Save: 0x2,Len: 4,$: 0xa,Recursion: 3
I: Save: 0x1,Len: 5,$: 0xb,Recursion: 2
I: Pattern exited with code: 0
I: Evaluation took 0.0072071s
The main topic of this issue is the crash of ImHex which should never happen.
Agreed, it would be nice if addressof(children)
worked in this expression, but I can understand why it wouldn't given that the field hasn't been fully declared yet. But even if it doesn't I'd still hope for an error message rather than a crash.
You stated that you tested 0.33.1 which looks like a typo but I can't tell which version you meant.
Whoops, yes, it was 1.33.1
that I'm using. (The Ubuntu 22.04 DEB, for completeness.)
To test the latest code download a nightly build from werwolv.net (at the bottom).
I just tried downloading the nightly but that actually claims to be older:
$ zipinfo ~/Downloads/Ubuntu\ 22.04\ DEB\ x86_64.zip
Archive: /home/wolf/Downloads/Ubuntu 22.04 DEB x86_64.zip
Zip file size: 132817991 bytes, number of entries: 1
-rw-r--r-- 4.5 unx 134121976 bl defN 24-Mar-03 15:48 imhex-1.33.0-Ubuntu-22.04-x86_64.deb
1 file, 134121976 bytes uncompressed, 132817805 bytes compressed: 1.0%
it seems like you also believe that the simplified code you posted should work and it doesn't.
Erm, what? The workaround I posted above seems to work fine, you can run it in ImHex yourself, it reads the tree as 5(1(0()) 2(0() 0())
exactly the way I'd expect:
your recursion never stops. even if the size of the array is zero the recursion will not stop
As seen above, it seems to stop fine. In particular, in the case of size zero, $ == base + 0
is immediately (and vacuously) true.
In order to stop the recursion you need to use flow control statements in the recursive pattern, that is, if statements with bodies containing break, continue, ...
From reading the array type docs, I was under the impression that breaking at the right point while iterating over data was exactly what the loop-sized array syntax was for. Is that not true, and if so, what is it for?
a small example based on your post that limits recursion in two ways
You sound like you know the language better than I do, but the example code you provide seems to be (a) very complicated, and (b) buggy, or at least doing something different than I want: when I run it on my example data I get back a different structure than the one my code produces, though I haven't dug into why. (Possibly worth noting to clarify in case you got confused: the child_len
here is the size in bytes of the array of children, not the expected number of elements. The format is designed so that readers who don't care about the children of a specific element can skip over it with a pointer bump and not have to iterate through the elements to find the end of the array, though I don't care about doing that for the purposes of this script.)
Hm, it seems like my original crash is likely a symptom of a more general problem with referencing (currently) unknown variables/fields inside of a loop-sized array expression; the following even-more-minimized example also reliably crashes ImHex 1.33.1:
u8 foo[while(bar)] @0;
Interestingly, I also just thought to try it on https://web.imhex.werwolv.net/ and that actually shows a reasonable error message, though I also see some weirdly varying binary-ish garbage in the output.
well from my poking around in gdb & valgrind it seems like the cause is the errors location (as in where it occurred) being partially allocated, causing a use after free crash.
valgrind reports the location being free'd when a ASTNodeMathematicalExpression
instance goes out of scope in pl::core::ast::ASTNodeArrayVariableDecl::createStaticArray(pl::core::Evaluator*) const
(ast_node_array_variable_decl.cpp:197
)
I debugged this error and found that the way location is being obtained in pattern_language.cpp and evaluator.cpp ends up using corrupted or invalid values. pattern_language.cpp uses getCallStack() and evaluator.cpp uses getUserData() and both fail to obtain a valid location. Unless both are fixed the error still occurs. In both cases I found that using the ast it is possible to obtain the error location. In evaluator.cpp we assume that the error occurred in the last node created. Then the following code work well there.
//auto node = e.getUserData(); <======================================= DELETE this line
//const auto location = node == nullptr ? Location::Empty() : node->getLocation(); <==== DELETE this line
const auto location = ast.back()->getLocation(); // <=========================== ADD this line instead
this->getConsole().setHardError(err::PatternLanguageError(e.format(location),
location.line, location.column));
On pattern_language.cpp we have the error in m_currError so use the following code to find the right node.
auto line = m_currError->line;
auto column = m_currError->column;
console.log(core::LogConsole::Level::Error, "[ Stack Trace ]");
u32 lastLine = 0;
for (const auto &node : m_currAST) {
if (node->getLocation().line == line && node->getLocation().column == column) {
auto location = node->getLocation();
if (lastLine == location.line)
continue;
console.log(core::LogConsole::Level::Error, core::err::impl::formatLocation(location));
console.log(core::LogConsole::Level::Error, core::err::impl::formatLines(location));
lastLine = location.line;
}
}
Using both changes and running one of the examples avoids crashing and generates valid errors as shown next
I ran the original code that started this issue and it didn't crash as seen next
@paxcut, why don't you open a PR with the proposed code changes?
Based on other discussions on discord it seems that GetUserData
should work in this case and the fact that it doesn't points to an even deeper problem that I am unable to track or attempt to fix. It is also possible that even tough the crashes are avoided using the last node of ast may not give the same results as using GatUserData
in which case the proposed changes would introduces more bugs. I normally would post PRs if I feel confident about knowing what caused the error and that the changes will fix it without introducing further errors which was not the case here so I left the code as a guide to where to start looking instead.