capstone tms320c64x: use more descriptive types in public header

The new types:

Fix issues where the field was unsigned but was assigned negative values
Make it easier to determine the semantics of fields
Make it easier to maintain language bindings

Also flatten anonymous typed structs.

Size savings:

cs_tms320c64x: 284 -> 240
tms320c64x_op_mem: 28 -> 24

The following were unsigned ints but assigned negative values:

cs_tms320c64x.funit.crosspath
cs_tms320c64x.parallel

I still need help determining the intended semantics, particularly on:

[ ] cs_tms320c64x.funit_crosspath
- Given values -1, 0, 1, 2 but explicitly checked to have value 1 in test_tms320c64x.c
[ ] tms320c64x_op_mem.disp
- Should this be signed?
- Should this be a tag value & union since TMS320C64xInstPrinter.c:printMemOperand() switches on mode?
[ ] cs_tms320c64x.parallel
- treated as a bool in test_tms320c64x.c/cstool_tms320c64x.c

Mar 17 '19 20:03 tmfink

this will break all bindings.

@fotisl please ack.

Mar 18 '19 03:03 aquynh

this will break all bindings.

Yes. I didn't clarify, but this should not be merged to any other branch. Since the API needed to be updated, I figured I'd go all the way.

Mar 18 '19 04:03 tmfink

I don't agree with some of the changes. For example, all information on the functional unit where the instruction is run and if it uses the crosspath, or the information on whether the instruction is conditional or not, should be in one struct. I don't think that it actually makes it easier to determine the semantics of the fields, but in contrast it is more difficult. Furthermore, are language bindings easier to maintain because some structs where removed? I don't think that this applies either. Maybe just a type review would be better. In general, I don't think that making those changes is worth breaking the API.

Mar 19 '19 10:03 fotisl

@fotisl thanks for the comments, my responses are inline:

I don't agree with some of the changes. For example, all information on the functional unit where the instruction is run and if it uses the crosspath, or the information on whether the instruction is conditional or not, should be in one struct.

How does leaving the funit information in one struct make things more clear? This adds an unnecessary struct that we need to handle in the bindings. Also, the C compilers potentially need to add extra alignment bytes, which increases size.

I don't think that it actually makes it easier to determine the semantics of the fields, but in contrast it is more difficult. Furthermore, are language bindings easier to maintain because some structs where removed? I don't think that this applies either.

I personally find needing to describe fewer types is simpler/less error-prone.

Also, the anonymous structs declared in the header, have no name, which means

C code cannot instantiate types like cs_tms320c64x.condition or cs_tms320c64x.funit.
Binding authors need to "invent" names for these types, which means the bindings types do not map directly to the C header.

Python example: https://github.com/aquynh/capstone/blob/10f5ecc92f9d23bd4372f9dce413456dadd9a082/bindings/python/capstone/tms320c64x.py#L25-L29

Rust example: https://github.com/capstone-rust/capstone-rs/blob/5044ace8022b1651d1b7c93d41ad56993e0225f5/capstone-sys/pre_generated/capstone.rs#L9333-L9348

Maybe just a type review would be better. In general, I don't think that making those changes is worth breaking the API.

I agree that "flattening" the structs is less important than assigning more descriptive types to the existing unsigned int types.

If there is consensus, then I'll revert the "flattening" changes and just keep the new types for the existing fields. However, I still need someone more familiar with the code/tms320c64x to determine if the types I used are correct (see initial comment in PR).

Mar 24 '19 17:03 tmfink

@fotisl thanks for the comments, my responses are inline:

I don't agree with some of the changes. For example, all information on the functional unit where the instruction is run and if it uses the crosspath, or the information on whether the instruction is conditional or not, should be in one struct.

How does leaving the funit information in one struct make things more clear? This adds an unnecessary struct that we need to handle in the bindings. Also, the C compilers potentially need to add extra alignment bytes, which increases size.

Please check the MCInst struct. All operands are different MCOperand structs in this struct. I consider it the same thing, keep all functional unit information together.

I don't think that it actually makes it easier to determine the semantics of the fields, but in contrast it is more difficult. Furthermore, are language bindings easier to maintain because some structs where removed? I don't think that this applies either.

I personally find needing to describe fewer types is simpler/less error-prone.

Well, note my example above :) Some of the most integral structs of capstone already hold information like this together and not split it in a flat structure.

Also, the anonymous structs declared in the header, have no name, which means

C code cannot instantiate types like cs_tms320c64x.condition or cs_tms320c64x.funit.

Where do you need to instantiate these types?

Binding authors need to "invent" names for these types, which means the bindings types do not map directly to the C header.

Python example: capstone/bindings/python/capstone/tms320c64x.py

Lines 25 to 29 in 10f5ecc

class TMS320C64xCondition(ctypes.Structure): fields = ( ('reg', ctypes.c_uint), ('zero', ctypes.c_uint), ) Rust example: https://github.com/capstone-rust/capstone-rs/blob/5044ace8022b1651d1b7c93d41ad56993e0225f5/capstone-sys/pre_generated/capstone.rs#L9333-L9348

I don't get these arguments. For example, cs_mips has cs_mips_op structures in it, which in turn have mips_op_type, mips_reg and mips_op_mem structures. Same applies to many other structures. Why does this specific structure has to be flattened?

Maybe just a type review would be better. In general, I don't think that making those changes is worth breaking the API.

I agree that "flattening" the structs is less important than assigning more descriptive types to the existing unsigned int types.

If there is consensus, then I'll revert the "flattening" changes and just keep the new types for the existing fields. However, I still need someone more familiar with the code/tms320c64x to determine if the types I used are correct (see initial comment in PR).

Can you please make a table with all the proposed changes so that I can review it? Old type -> new type -> reason (e.g. possible values in the code)

Mar 26 '19 11:03 fotisl

Can you please make a table with all the proposed changes so that I can review it? Old type -> new type -> reason (e.g. possible values in the code)

Sure thing (a list was easier).

`struct tms320c64x_op_mem`

all types were unsigned int
tms320c64x_reg base
- printed as register in test_tms320c64x.c

uint16_t disp

Perhaps this should be another union type:

// Depends on value of tms320c64x_op_mem.disptype field
union tms320c64x_mem_disp {
    uint16_t constant,
    tms320c64x_reg reg,
}

// Rename enum tms320c64x_mem_disp to tms320c64x_mem_disp_type

Printed as unsigned or as register in test_tms320c64x.c depending on disptype neighbor field

uint8_t unit
- Assigned values like TMS320C64X_FUNIT_D in TMS320C64xInstPrinter.c
bool scaled
- TMS320C64xDisassembler.c: only assigns 0 or 1 (because of & 1)
- TMS320C64xInstPrinter.c: checks if (scaled)
tms320c64x_mem_disp disptype
- TMS320C64xInstPrinter.c assigns values TMS320C64X_MEM_DISP_*
tms320c64x_mem_dir direction
- TMS320C64xInstPrinter.c assigns values TMS320C64X_MEM_DIR_*
tms320c64x_mem_mod modify
- TMS320C64xInstPrinter.c assigns values TMS320C64X_MEM_MOD_*

`struct cs_tms320c64x_op`

all types were unsigned int
tms320c64x_reg reg
- Register type
- Printed as register in TMS320C64xInstPrinter.c

`struct cs_tms320c64x`

all types were unsigned int
tms320c64x_reg condition.reg
- Register type
- Printed as register in TMS320C64xInstPrinter.c
bool condition.zero
- TMS320C64xDisassembler.c: only assigned zero or one
- Tested for whether it is equal to 1 with TMS320C64xInstPrinter.c/test_tms320c64x.c
tms320c64x_funit funit.unit
- TMS320C64xInstPrinter.c assigns values TMS320C64X_FUNIT_*
uint8_t funit.side TMS320C64xInstPrinter.c only assigns values 0, 1, 2
int8_t funit.crosspath
- TMS320C64xDisassembler.c only assigns values -1, 0, 1, 2
- However, test_tms320c64x.c checks if tms320c64x->funit.crosspath == 1
  - Is this a bug in the test program?
int8_t funit.parallel
- test_tms320c64x.c checks if parallel is 1, printing "true" or "false"
- TMS320C64xDisassembler.c only assigns -1, 0, 1
- Is there a bug in test_tms320c64x.c? Should it check if != 0 instead?

Mar 30 '19 17:03 tmfink

I'll update the PR to only change the types as listed above.

It will be easier to "flatten" types later.

Mar 30 '19 17:03 tmfink

@fotisl I changed the PR to only change the types. Does that look okay to you? There are still the unresolved questions in the original comment. Thanks!

Mar 31 '19 02:03 tmfink

@fotisl Any update?

Apr 17 '19 02:04 tmfink

@aquynh if this looks good to you, could you merge?

May 02 '19 00:05 tmfink

@fotisl, can you ack?

May 03 '19 15:05 aquynh

Thank you for the PR! I closed it because it is out of date. With the new auto-sync update for v6 we made many changes to some main architectures and will do also to others. This also changed the requirements we have now for new PRs.

If you still want to merge the changes, please rebase your fix onto the newest next branch and open a new PR.

Mar 20 '24 09:03 Rot127