compiler icon indicating copy to clipboard operation
compiler copied to clipboard

Tag mismatch

Open Misiur opened this issue 6 years ago • 24 comments

#include <a_samp>

enum E_TEST {
	E_WTF[32]
}

main () {
	new test[E_TEST];
	(test[E_WTF])[0] = EOS; // Tag mismatch
	(_:test[E_WTF])[0] = EOS; // Still tag mismatch
	(test[E_WTF])[E_TEST:0] = EOS; // No tag mismatch
}

Is the compiler right here?

This is the exact place which throws the warning

Misiur avatar Sep 04 '17 07:09 Misiur

I can confirm this is happening on version v3.10.6 of the compiler.

I am not sure whether this is a bug, or just a wired kink in the PAWN language, but I would support changing it so that (test[E_WTF])[0] = EOS; is the correct one. It would make little sense to use a member of the enum there. What would something like (test[E_WTF])[W_WTF] ever achieve?

samphunter avatar Jan 06 '18 15:01 samphunter

I don't think the question whether the compiler is right or wrong can be answered because the PAWN Language Guide (as much as I could check) doesn't mention about enclosing arrays within parenthesis.

YashasSamaga avatar Jan 06 '18 15:01 YashasSamaga

I feel like this is correct, the enumerator declares a set of symbols that implicitly have a tag (if the enumerator is labelled). So E_TEST:0 feels like the correct one here, because E_WTF == E_TEST:0, therefore, test[E_TEST:0] is just as valid as test[E_WTF].

Even though the expression before indexing is enclosed in brackets, I wouldn't expect the tag rules to still apply. But I may have misunderstood the issue or info provided as I don't see a problem here!

Edit: yes, I completely missed the extra dimension! Ignore me...

Southclaws avatar Jan 06 '18 17:01 Southclaws

You missed one dimension. It's test[E_WTF][E_TEST:0]

Misiur avatar Jan 06 '18 17:01 Misiur

But from my experience, this is how enclosing arrays in parenthesis work:

Note that (arr[1])[1] is not the same as arr[1][1]. The parenthesis causes arr[1] to evaluate to a new array whose base address is the address of arr[1].

For the compiler, (test[E_WTF]) is NOT the string. It is still the enum array whose base address starts from test[E_WTF].

The correct way (by my definition of what () are supposed to do) to access the string is: (test[E_WTF])[E_WTF][0] = 0;

The language guide does not say anything. Hence, we can't really tell what is correct.

This is quite useful as it allows some neat pointer tricks. For example, for my array library (which behaves like std::array), I create an array to store the elements and some extra cells to store useful information. To allow regular array access, I do (arr[4])[index]. The first 4 cells store meta information about the array and the above syntax seamlessly allows the user to access the contents of the array as if no meta information was being stored.

Researching it:

For example,

new arr[100]; //note that we have just one dimension

(arr[10])[5] = 123; //is this two dimensions or one dimension? the assembly answers it.

arr[15] = 132;

would give something like

addr.pri fffffe48
add.c 28+c 
move.alt
const.pri 37
stor.i

addr.pri fffffe48
add.c 34
move.alt
const.pri 7b
stor.i

This can be extended to multiple levels:

((arr[4])[5])[6] = 0;

would give

addr.pri fffffe48
add.c 10+14+18

2D Arrays: arr[4] resolves into a 1D array and the rest would behave as if arr[4] was a 1D array.

new arr[100][40];
((arr[4])[6])[6] = 0;

3D arrays: arr[4] resolves into a 2D array. The next [6] operates on the 2D array and resolves to a 1D array. The rest behaves like a normal 1D array.

new arr[100][40][123];
(((arr[4])[6])[6])[5] = 0;

Conclusion: if the array is multi-dimensional, each subscript keeps reducing one dimension just like a normal array access but once it boils down to a single dimension, the magic starts happening.

Compile-time error checking: For multi-dimensional arrays, the subscript values are checked if it exceeds the dimension size until you have a 1D array.

For 1D array, each subscript is checked if it exceeds the minor dimension size but all subscripts put together can go OOB.

YashasSamaga avatar Jan 06 '18 17:01 YashasSamaga

I don't know if this is related, but I was just doing this:

stock GetOpcodeInstructionName(Opcode:opcode) {
	new
		ret[OPCODE_MAX_INSN_NAME] = "none";
	if (OP_NONE < opcode < Opcode:NUM_OPCODES) {
		ret = insn_table[_:opcode][OpcodeInsnInfo_name];
	}
	return ret;
}

Which is a fix for this function:

https://github.com/Zeex/amx_assembly/blob/c390cbebfa6e88425205ec21577ed09e34e7229b/opcode.inc#L845-L851

And I'm getting a strange "index tag mismatch". Same problem? (though without the warning I don't think I can do that anyway, which is a shame).

Y-Less avatar Jan 06 '18 18:01 Y-Less

@Y-Less I made the compiler spit out information on why it triggered the warning:

_ OpcodeInsnInfo 0 1073741827

It is expecting _ but found OpcodeInsnInfo .

Warning is being triggered by: https://github.com/Southclaws/pawn/blob/master/source/compiler/sc3.c#L1008

else if (lval3.ident != iARRAYCELL && !matchtag(lval3.sym->x.tags.index, idxtag, TRUE)) {
      constvalue *tagsym;
      char *formal_tagname,*actual_tagname;
      tagsym=find_tag_byval(lval3.sym->x.tags.index);
      formal_tagname=(tagsym!=NULL) ? tagsym->name : "-unknown-";
      tagsym=find_tag_byval(idxtag);
      actual_tagname=(tagsym!=NULL) ? tagsym->name : "-unknown-";
      printf("%s %s %d %d\n", formal_tagname, actual_tagname, lval3.sym->x.tags.index, idxtag);
      error(229, (lval2.sym != NULL) ? lval2.sym->name : lval3.sym->name); /* index tag mismatch */
    }

YashasSamaga avatar Jan 06 '18 18:01 YashasSamaga

@Y-Less

Asking the compiler to spit out more information (by printing lval3.sym->name and lval2.sym->name which respectively gave ret and insn_table; note idxtag=lval2.sym->x.tags.index;) revealed that the compiler is expecting the index tags of the two arrays, ret and insn_table[_:opcode][OpcodeInsnInfo_name];, to match.

Fix (doesn't make sense but gets rid of the warning): new ret[OpcodeInsnInfo:OPCODE_MAX_INSN_NAME] = "none";

I think #233 would more useful if it takes care of index tag mismatch warnings too.

YashasSamaga avatar Jan 06 '18 18:01 YashasSamaga

That makes no sense... But at least it is a fix, thanks!

Y-Less avatar Jan 06 '18 19:01 Y-Less

Even though not many people will fiddle with parentheses and arrays, I think the info from https://github.com/Southclaws/pawn/issues/182#issuecomment-355760647 should go into wiki as a compiler feature documentation (or a warning for people like me who expect different output).

Misiur avatar Jan 06 '18 20:01 Misiur

The fake arrays inside a enum behave like enclosing the array in parenthesis as mentioned in https://github.com/Southclaws/pawn/issues/237#issuecomment-355750193

It produces assembly similar to the enclosed arrays.

This is mostly

test[E_WTF][5] = 123;

assembly-wise equivalent to

(test[E_WTF])[5] = 123

(and test[E_TEST:0])[E_WTF][5] = 123; too but this will cause a cycle in my argument)

The distinct property of these arrays enclosed in parenthesis is that they produce two add.c instructions in the assembly while calculating the address: the first to compute the base address and the second to add the offset. Similar code is produced when the fake arrays (I don't know what to call these kind of arrays and at the same time I don't want to be wrong by saying they are just arrays) of enums are accessed.

The compiler optimizes (click to check an interesting comment about enums) the two consecutive add.c to give one which ends up showing as add.c const1+const2 in the assembly.

@Misiur whatever I wrote is guesswork from observations. I could be completely wrong with "why it behaves like it does" explanation (which I feel like after investigating enum arrays).

I have seen people using this: Slice's pointers include

YashasSamaga avatar Jan 07 '18 02:01 YashasSamaga

Well, I feel we are a bit off-topic here. The issue is about whether the additional offset, that is added to the first one should use the tag of the enum or _.

As was stated by @YashasSamaga, this may not be defined in the pawn language, but from practical point of view, I feel it should accept _.

When we write something like (test[E_WTF])[1] = EOS;, we want the 1 to be additional offset added to it, but it makes little sense to use a constant from the enum. The array we use it on is no longer the original array test, but a different array (different base address), so the enum members will not correspond to anything meaningful.

samphunter avatar Jan 07 '18 11:01 samphunter

This: (test[E_WTF])[1] is basically the same as test[E_WTF + 1] - you want an offset from the original base. Adding numbers to enums has never been allowed without tags, at least not strong tags. We shouldn't weaken strong tags in this case when there are already weak tags that can be used when you don't want to specify them.

Y-Less avatar Apr 08 '18 17:04 Y-Less

Easiest way out in case of macros (and not caring about losing one dimension) which created this situation in the first place: (test[E_TEST:0])[E_WTF][0]

Misiur avatar Apr 08 '18 18:04 Misiur

This essentially boils down to whether you interpret (test[E_WTF])[1] as test[E_WTF + 1] or whether you interpret it as *(&(test[E_WTF]) + 1) in C syntax. In the first case, it should stay as it is, in the second case, test[E_WTF] would evaluate to a value, so it would loose its tag. I don't think either way of looking at it is 100% correct, it just depends on what the PAWN team decides to go with.

The first way seems to be more intuitive for PAWN, as there are no pointers and stuff like that, the second way seems to be more intuitive for people used to working with C. The second way on the other hand is more practical, as I mentiond earlier. The tag remaining is never a desired behaviour, it will always just be an obstacle in this context.

samphunter avatar Apr 08 '18 21:04 samphunter

Whose to say the tag remaining is never desired? As I said - if you don't want a tag, don't use a strong tag. I think that's quite explicit as to the behaviour you want, and if you want different behaviour you can be explicit about that as well.

Y-Less avatar Apr 08 '18 23:04 Y-Less

I just checked, and using an int where a weak tag is expected is still a mismatch - it's only the other way around that's allowed.

Y-Less avatar Apr 08 '18 23:04 Y-Less

Whose to say the tag remaining is never desired?

Ok, fair. I can't be sure it is never desired. I just can't even begin to imagine where chainig offsets in such a way could be even remoetly usefull. What would you ever gain by somethin like test[E_WTF + E_WTF]? On the other hand, doing test[E_WTF+2] is demonstrably usefull in relatively common practical usecases. That is what I was tring to say.

samphunter avatar Apr 09 '18 06:04 samphunter

I thought I learned about the (x[a])[b] syntax very recently when reviewing some of the code for @Misiur (as in, shortly before the first post here recently). Turns out, I was wrong:

http://forum.sa-mp.com/showpost.php?p=1633287&postcount=12

I just forgot again - I wish I hadn't, there are some interesting uses for this.

Y-Less avatar Apr 10 '18 14:04 Y-Less

No, all enum constants are globally scoped.

Y-Less avatar Nov 23 '20 11:11 Y-Less

I thought that had been fixed in the latest version, but apparently not. Thanks.

Y-Less avatar Nov 23 '20 12:11 Y-Less

Could you explain little bit about why only unnamed tag definition appears error (i mean i guess there's shouldn't be any error, no? Coz each enum have own fields, and one of them is global, which doesn't redefine any of them due they are named)??

Named enumerations are a special case, they are allowed to have constants with same names; unnamed enums aren't.

Daniel-Cortez avatar Nov 23 '20 12:11 Daniel-Cortez

I hate squash and merge. Merge commits are a single commit that shows you were a feature was added, like a squash commit, but with the development history maintained in the merged branch. Just treat the merge commit as a squash commit and its basically the same. Maybe the merge commits themselves could have a better description though.

Y-Less avatar Nov 23 '20 23:11 Y-Less

They're not variables, no, they're constants (so much so that Pawn 4 renamed enum to const). This:

enum ENUM
{
    A,
    B,
}

Is equivalent to:

const ENUM:A = 0;
const ENUM:B = 1;

Thus this:

enum ENUM_1
{
    A,
    B,
}

enum ENUM_2
{
    A = 2,
    B,
}

Is equivalent to:

const ENUM_1:A = 0;
const ENUM_1:B = 1;

const ENUM_2:A = 2;
const ENUM_2:B = 3;

What's the tag and value of A? As it happens, this is handled specially but is currently the only example of tag-specific symbol lookup; there are a few other issues where we're discussing other instances of tag-specific lookups however. As for which is correct, here is another circumstance to consider:

enum a
{
	first_elem,
	second_elem
}

enum
{
	Tagged:first_elem = 2,
	second_elem
}

main()
{
	new Tagged:c[10];
	c[1] = first_elem;
	return 0;
}

That's an ambiguous symbol error if anonymous enums follow enum name resolution. The only way to resolve that would be with the _: tag::

enum a
{
	first_elem,
	second_elem
}

enum
{
	Tagged:first_elem = 2,
	second_elem
}

main()
{
	new Tagged:c[10];
	c[1] = _:first_elem;
	return 0;
}

Now you've got a tag warning, unless you add a second tag-override to it:

enum a
{
	first_elem = 0,
	second_elem = 1
}

enum b
{
	Tagged:first_elem = 0,
	second_elem = 1
}

main()
{
	new Tagged:c[10];
	c[1] = Tagged:a:first_elem;
	return 0;
}

But then you're likely to miss the bug added to the second one, How do you add a tag override to resolve the ambiguous constant when there is no tag to add? There was another recent PR (#521) which actually further separated the uses of named and anonymous enums, making the latter even more just syntactic sugar for declaring global constants, and I think they should stay separate (there are even more questions if the globals/enums scopes are entirely separated).

Y-Less avatar Nov 23 '20 23:11 Y-Less