vstruct

[feature] Backreferences in format strings

Open ToxicFrog opened this issue 12 years ago • 4 comments

It should be possible to refer to already-read values easily, rather than needing to make multiple calls to unpack. For example, length-prefixed strings currently have to be read like this:

local length = unpack("u4", true)
local str = unpack("s" .. length, true)

Could instead be:

local str = unpack("length:u4 s$length", true)

The current idea for backref syntax is that $name is a backref and can be used anywhere a number is allowed, so "$count * { x:u4 y:u4 z:u4 }" is also legal. The name can be a number ($1 being the first non-named value read), a simple name ($length), or a dot-separated identifier ($toc.count).
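To make the three name forms concrete, here is a minimal sketch in plain Lua of how a backref could be looked up and substituted. Nothing here is vstruct's actual API: `resolve_backref`, `expand_backrefs`, and the `positional`/`named` tables are hypothetical names invented for illustration.

```lua
-- Hypothetical helper, not part of vstruct: look up a backreference name
-- against the values read so far.
--   positional: array of unnamed values, in read order
--   named:      table of named values, possibly nested
local function resolve_backref(name, positional, named)
  local index = tonumber(name)
  if index then
    return positional[index]            -- $1, $2, ...
  end
  local value = named
  for key in name:gmatch("[^.]+") do    -- $length, $toc.count
    if type(value) ~= "table" then return nil end
    value = value[key]
  end
  return value
end

-- Replace every $name in a format string with the value it refers to.
local function expand_backrefs(fmt, positional, named)
  return (fmt:gsub("%$([%w_.]+)", function(name)
    local value = resolve_backref(name, positional, named)
    if value == nil then
      error("unresolved backreference $" .. name)
    end
    return tostring(value)
  end))
end

print(expand_backrefs("s$length", {}, { length = 5 }))  --> s5
```

A real implementation would presumably have to resolve each backref as reading proceeds, since the referenced value does not exist until the earlier field has been read; expanding the whole string up front only works in this toy setting.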

ToxicFrog, Mar 06 '13 20:03

Unanswered questions:

How do we handle backrefs in tables? Does the $1 in "u4 { u4 s$1 }" refer to the first u4 (scope is the entire read session) or the second (scope is the containing table)? Whichever it is, how do you refer to the other one?

How do we throw away values that were once useful but don't need to be returned? E.g. if you have a prefixed string like "u4 s$1", how do you express that you only want the second value? You can do this in code:

local _,str = unpack("u4 s$1", true)
local str = unpack("u4 s$1")[2]

Or by assigning names and then returning values rather than a table:

local str = unpack("length:u4 s$length", true)

But the former is ugly and the latter misleading. Is there a way to express in the format string itself which values we are, or aren't, interested in?

ToxicFrog, Mar 06 '13 20:03

> Does the $1 in "u4 { u4 s$1 }" refer to the first u4 (scope is the entire read session) or the second (scope is the containing table)?

Second.

> Whichever it is, how do you refer to the other one?

By name: "first:u4 { u4 s$1 }"

> how do you express that you only want the second value?

Additional flag: "u4:skip s$1" ?

DangerPie, Nov 03 '13 17:11

I'm guessing that by the second one you meant

first:u4 { u4 s$first }

? If so, yeah, that's probably the way to do it - numeric backreferences are local, all other backreferences are global.
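The "numeric is local, named is global" rule can be sketched with a stack of positional scopes plus one shared name table. This is only an illustration in plain Lua; `new_session`, `enter_table`, `leave_table`, `record`, and `lookup` are hypothetical names, not vstruct internals, and dotted names are left out for brevity.

```lua
-- Hypothetical scope model. Each { ... } group gets its own positional
-- list; names live in one table shared by the whole read session.
local function new_session()
  return { names = {}, stack = { {} } }
end

local function enter_table(session)
  table.insert(session.stack, {})
end

local function leave_table(session)
  table.remove(session.stack)
end

-- Record a value that was just read; name is nil for unnamed values.
local function record(session, value, name)
  if name then
    session.names[name] = value
  else
    local scope = session.stack[#session.stack]
    scope[#scope + 1] = value
  end
end

local function lookup(session, ref)
  local index = tonumber(ref)
  if index then
    return session.stack[#session.stack][index]  -- innermost table only
  end
  return session.names[ref]                      -- visible everywhere
end

-- Simulates reading "first:u4 { u4 s$first }" with values 10 and 3.
local s = new_session()
record(s, 10, "first")
enter_table(s)
record(s, 3)
print(lookup(s, "1"), lookup(s, "first"))  --> 3	10
leave_table(s)
```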

:skip is really ugly and I don't like it, though. I still haven't come up with a good way to mark don't-care values.

ToxicFrog, Nov 05 '13 01:11

Thinking on it more, I really like the idea of a leading . to indicate temp values:

local str = unpack(".length:u4 s$.length")

It's concise, easy to parse, familiar to anyone who's worked with hidden files, and consistent with the way . is already used for table indexing: foo.bar stores bar in table foo, and .bar stores bar in the "non-table".
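One way the leading-dot convention might behave is as a filter on the result: values are kept under their dotted names while reading, so backrefs can still find them, and dropped just before returning. A rough sketch in plain Lua, where `strip_temps` is a hypothetical helper and not an existing vstruct function:

```lua
-- Hypothetical post-processing step: drop every field whose name starts
-- with "." so that temporaries never reach the caller.
local function strip_temps(result)
  local out = {}
  for key, value in pairs(result) do
    local is_temp = type(key) == "string" and key:sub(1, 1) == "."
    if not is_temp then
      if type(value) == "table" then
        value = strip_temps(value)  -- nested tables may hold temps too
      end
      out[key] = value
    end
  end
  return out
end

-- What ".length:u4 s$.length" might hold internally, and what it returns.
local raw = { [".length"] = 5, "hello" }
local cleaned = strip_temps(raw)
print(cleaned[1], cleaned[".length"])  --> hello	nil
```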

It also occurs to me that this violates the principle of symmetry, i.e. pack(unpack(...)) and unpack(pack(...)) are no longer equivalent to the identity unless the packing code can be made much smarter.
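To illustrate what "smarter" would mean for the length-prefixed string case, here is a hand-written sketch in plain Lua that does not use vstruct at all. The helper names are hypothetical and little-endian byte order for u4 is an assumption. Because unpacking returns only the string, the packing side has to derive the length on its own.

```lua
-- Encode n as a 4-byte unsigned integer (little-endian is an assumption).
local function pack_u4(n)
  local bytes = {}
  for i = 1, 4 do
    bytes[i] = string.char(n % 256)
    n = math.floor(n / 256)
  end
  return table.concat(bytes)
end

-- Decode a 4-byte little-endian unsigned integer starting at pos.
local function unpack_u4(data, pos)
  local b1, b2, b3, b4 = data:byte(pos, pos + 3)
  return b1 + b2 * 256 + b3 * 65536 + b4 * 16777216
end

-- Roughly what ".length:u4 s$.length" would return: only the string.
local function unpack_prefixed(data)
  local length = unpack_u4(data, 1)
  return data:sub(5, 4 + length)
end

-- The inverse is given no length, so it must compute it from str.
local function pack_prefixed(str)
  return pack_u4(#str) .. str
end

local packed = pack_prefixed("hello")
print(unpack_prefixed(packed))  --> hello
```

Here the round trip works only because `pack_prefixed` knows by construction that the hidden field is the string's length; a generic packer would need to infer that relationship from the format string.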

ToxicFrog, Nov 05 '13 17:11