cv32e40p Some of the Xpulp instructions use opcodes that are either reserved opcodes or standard opcodes

Some of the Xpulp opcodes have been chosen such that upstreaming of the related tool changes will become impossible unless we change some opcodes. Ideally only 'custom opcodes' are used for Xpulp extensions.

First question to answer:

[ ] Are there enough custom opcodes to host all the Xpulp extensions that are now spread in the reserved and standard opcodes?

Craig Blackmore's original message:

As per the actions of the minutes above, here is a summary of opcode issues followed by a breakdown of opcodes used by each CORE-V extension.

The issues fall into 4 categories:

Use of standard opcode (e.g. OP, LOAD, BRANCH). Any CORE-V extension that uses these opcodes may conflict with a future RISC-V standard extension.
Use of reserved opcode. Any CORE-V extension that uses a reserved opcode may conflict with a future RISC-V standard extension.
Use of custom- opcode. The RISC-V Foundation does not allow any standard extension to use the custom opcodes. Therefore, any CORE-V extension that only uses custom opcodes is guaranteed not to conflict with future standard extensions. However, any custom opcodes must be changed before a CORE-V extension can be accepted as a standard extension.
Incorrect use of standard opcode. The bitmanip immediate instructions (p.extract, p.extractu, p.insert, p.bclr, p.bset) use OP, but they should use OP-IMM (if we choose to continue using standard opcodes).

Any CORE-V extension that uses standard or reserved opcodes risks conflicting with future standard extensions. Using custom opcodes eliminates that risk, but those opcodes must be changed if the extension is to be standardized. I suspect we would also need a compelling reason to use a reserved opcode rather than fitting into the existing ones. Crucially, whatever encodings we choose now are subject to change during standardization.

Here is a summary of opcodes used by each CORE-V extension:

PULP bitmanip All instructions use OP. Immediate instructions (p.extract, p.extractu, p.insert, p.bclr, p.bset) should use OP-IMM.
PULP SIMD - reserved opcode.
Post-increment load/store - custom-0 opcode.
PULP general ALU - Add-shift instructions (p.add[u][R]N[r] and p.sub[u][R]N[r]) use custom-3 opcode. Multiply-accumulate instructions use custom-2 opcode. Remaining instructions use OP opcode.
PULP immediate branching - BRANCH opcode.
PULP HW loop - custom-3 opcode.
Xpulpcluster - LOAD opcode.

Aug 19 '20 11:08 Silabs-ArjanB

I haven't looked too close at how the branch is encoded but if you exclude it you might be able to squeeze it. It might be a tight fit and may make some encoding choices a bit odd due to tightly packing the space. If you add the branch back I'm not sure. There are a few instructions that are grouped together where some have an immediate and others have this immediate tied to 0. This is wasteful and they can be moved to a different encoding type but will no longer be logically grouped together. We may need to try remapping them and proposing something to see if some instructions need to be cut.

Spreadsheet I used for my rough estimation: pulp_encoding_blocks.xlsx

Aug 19 '20 22:08 jm4rtin

Thanks a lot @Silabs-ArjanB and @jm4rtin ! Very useful thread. In my understanding, we should try to see whether we can squeeze all the RESERVED and STANDARD instructions into the custom space.

It's a big task :) However, if there is not enough space, we can think of getting rid of some "less-useful" instructions, and/or reducing the immediate where possible. I am not saying it makes sense, but for sure we know that at this stage they won't get accepted to the standard GCC.

As this is a long term task (I don't think we will remap all of them over the next month), we need to be conscius that products and prototypes being developed now must rely on the PULP GCC, that is not maintained by OpenHW. In addition, as we will modify the CV32E40P spec, this also means that OpenHW won't maintain the current ISA of the CV32E40P (with the current encoding etc)

Just a note

Aug 20 '20 09:08 davideschiavone

It should be possible to remap them all. Here is my first attempt at remapping them: pulp_encoding_blocks.xlsx

Some notes:

Decoding would start with the block (custom-0 ... custom-3)
Next would be the funct3 field
After that it depends on the block/plane:
- For custom-0 it maps to an instruction
- For custom-1 it either maps to an instruction or goes to Plane A or Plane B
- For custom-2 it looks at f3 to determine the instruction
- For custom-3 it uses a combination of funct5 and F to determine the instruction (this is the SIMD instructions moved directly to custom-3)
- For Plane A it looks at funct7 to determine the instruction
- For Plane B it looks at funct4 to determine the instruction (this plane is only used by the hardware loop right now)
If you exclude SIMD a vendor could re-purpose custom-3 for their own use
A grouping of instructions (e.g. bit manipulation) may no longer reside in a single block
A couple encodings for instructions have moved the bits around slightly
This may add a bit of complexity (read: longer combinatorial path) to a few instructions
Some of the instructions may need to be moved around in the encoding space a bit to simplify the mapping of bits in the instruction to control signals for the data path

Thoughts?

Aug 20 '20 21:08 jm4rtin

Hi @jm4rtin Thank you for this great proposal. Of course it is difficult to foresee if timing issues would arise from this; it would probably be easiest to just try it out to figure that out. I like that SIMD is only in custom-3 and that no other instructions are in that group. It would be nice if such a property was true for bit manipulation as well (as we know that a standard for that will be ratified), but I do not see that as a requirement. You could even argue that we should not focus on SIMD and bit manipulation too much and use those instructions to 'fill the holes' and that the focus should be on having a clean encoding for all the other instructions.

Aug 21 '20 06:08 Silabs-ArjanB

@jm4rtin

Nice work. One thing we don't have to worry about is SIMD and bitmanip, since we know these are going to be replaced with the official RISC-V versions in the future for CORE-V. That frees up all custom3 and some more of the space. Does this make the job easier?

I've taken your spreadsheet, adding some color coding for each of the instruction set extension groups. I've also looked how the instruction encodings map on to the standard R-/I-/S-/B-/U-/J-types defined in the standard. In a number of places we would need variants decoding, which may be a headache for the hardware engineers. Of perhaps we already have it for PULP anyway pulp_encoding_blocks-jpb.xlsx

I'll get @craigblackmore to look and comment.

Aug 21 '20 16:08 jeremybennett

Well, the cv32e40p has bitmanipulation and simd and will support them. So it's better to move all the extensions in the the custom space. With less priority sure, but still, maybe someone will extend the official GCC with these instructions too. Especially the SIMD onces, which are further than bitman to be standardized.

Aug 21 '20 16:08 davideschiavone

The encoding isn't really a headache for the hardware. Even before they got swapped around there was some other encodings for some of these instructions. It's more of a headache for the toolchain developer to create the custom relocation types.

One thing we don't have to worry about is SIMD and bitmanip, since we know these are going to be replaced with the official RISC-V versions in the future for CORE-V. That frees up all custom3 and some more of the space. Does this make the job easier?

In my opinion it doesn't really make it easier. As you can see from the color coding you provided the bit manipulation (yellow) is not off by itself. I wouldn't suggest spreading the other instructions out into the custom-3 block either. It may be better to move it to the end of the list so that when it is removed there isn't a large gap (Arjan's suggestion).

As for the removal of bit manipulation and SIMD, yeah that is planned for CV32E40. What isn't clear yet is what to do with CV32E40P. This instruction encoding topic appears to be on the docket for the TWG meeting at the end of the month.

Aug 21 '20 17:08 jm4rtin

Well, the cv32e40p has bitmanipulation and simd and will support them. So it's better to move all the extensions in the the custom space. With less priority sure, but still, maybe someone will extend the official GCC with these instructions too. Especially the SIMD onces, which are further than bitman to be standardized.

Hi @davideschiavone

The tool chain for CORE-V won't be supporting them, so from an upstreaming perspective, where they are encoded is not an issue for us. They can stay where they are or be moved, whichever is less painful for the hardware team.

Aug 21 '20 17:08 jeremybennett

The encoding isn't really a headache for the hardware. Even before they got swapped around there was some other encodings for some of these instructions. It's more of a headache for the toolchain developer to create the custom relocation types.

I'm not a hardware engineer, so I know just enough to know it is a problem, but without detailed insight.

One thing we don't have to worry about is SIMD and bitmanip, since we know these are going to be replaced with the official RISC-V versions in the future for CORE-V. That frees up all custom3 and some more of the space. Does this make the job easier?

In my opinion it doesn't really make it easier. As you can see from the color coding you provided the bit manipulation (yellow) is not off by itself. I wouldn't suggest spreading the other instructions out into the custom-3 block either. It may be better to move it to the end of the list so that when it is removed there isn't a large gap (Arjan's suggestion).

As for the removal of bit manipulation and SIMD, yeah that is planned for CV32E40. What isn't clear yet is what to do with CV32E40P. This instruction encoding topic appears to be on the docket for the TWG meeting at the end of the month.

OK - as I noted to @davideschiavone from the perspective of upstreaming, the only opcodes which concern us are those which are to be supported in the upstreamed compiler tool chain. Anything else is a matter of making things as easy as possible for the hardware team.

Aug 21 '20 17:08 jeremybennett

Well, the cv32e40p has bitmanipulation and simd and will support them. So it's better to move all the extensions in the the custom space. With less priority sure, but still, maybe someone will extend the official GCC with these instructions too. Especially the SIMD onces, which are further than bitman to be standardized.

Hi @davideschiavone

The tool chain for CORE-V won't be supporting them, so from an upstreaming perspective, where they are encoded is not an issue for us. They can stay where they are or be moved, whichever is less painful for the hardware team.

Hi @jeremybennett

Well they are not supported for now as long as I understood :) you never now if some contributor wants to make that effort, but in anycase, better to have them in the custom space so that the eventual contributor can invest in upstreaming them

Aug 21 '20 17:08 davideschiavone

The encoding isn't really a headache for the hardware. Even before they got swapped around there was some other encodings for some of these instructions. It's more of a headache for the toolchain developer to create the custom relocation types.

I'm not a hardware engineer, so I know just enough to know it is a problem, but without detailed insight.

One thing we don't have to worry about is SIMD and bitmanip, since we know these are going to be replaced with the official RISC-V versions in the future for CORE-V. That frees up all custom3 and some more of the space. Does this make the job easier?

In my opinion it doesn't really make it easier. As you can see from the color coding you provided the bit manipulation (yellow) is not off by itself. I wouldn't suggest spreading the other instructions out into the custom-3 block either. It may be better to move it to the end of the list so that when it is removed there isn't a large gap (Arjan's suggestion). As for the removal of bit manipulation and SIMD, yeah that is planned for CV32E40. What isn't clear yet is what to do with CV32E40P. This instruction encoding topic appears to be on the docket for the TWG meeting at the end of the month.

OK - as I noted to @davideschiavone from the perspective of upstreaming, the only opcodes which concern us are those which are to be supported in the upstreamed compiler tool chain. Anything else is a matter of making things as easy as possible for the hardware team.

Hi @jm4rtin ,

from "writing" the HW is not a problem, but if the encoding becomes very complicated, the HW complexity increases and the area and timing get worse. But this has less priority in my opinion than moving them to the custom space (but @Silabs-ArjanB can add more)

Aug 21 '20 17:08 davideschiavone

from "writing" the HW is not a problem, but if the encoding becomes very complicated, the HW complexity increases and the area and timing get worse. But this has less priority in my opinion than moving them to the custom space (but @Silabs-ArjanB can add more)

@davideschiavone Yes, of course. It should only be as complex as it needs to be and no more. The statement was a bit of a simplification. For example it would be a bad idea to move the destination register (rd) around randomly for different instructions. If you have some new field that is only used by the extension instructions it is more flexible and doesn't really cause a penalty. If there is more bits used to distinguish which instruction is which that also comes at a cost. There will be some impact on the hardware but yes it doesn't really make the work for a HW engineer much harder.

@jeremybennett If you have a proposal to reduce the number of encoding types / work needed for the toolchain, feel free to mention it.

Aug 21 '20 17:08 jm4rtin

@jm4rtin The number of encoding types/work for the tool chain is not a significant problem for the software tool chain. It's just rows in a table and potentially some extra relocations (which would almost certainly be needed irrespective of the encoding). All the hard work is above that level, and with limited resources available we've concentrated on those extensions which will require long-term support in the tools.

Aug 21 '20 17:08 jeremybennett

Thanks @jm4rtin this spreadsheet is really helpful.

Another thing to consider is custom-2 and custom-3 are custom for rv32/rv64 but reserved for rv128, so putting everything except SIMD/bitmanip in custom-0/custom-1 would make extending their support to cv128 easier. If SIMD and bitmanip do need to be moved, perhaps they could go in custom-2/custom-3.

Aug 21 '20 17:08 craigblackmore

@craigblackmore Most of custom-0 and custom-1 are in use by the post-increment load/store and the immediate branches. Some of the bit manipulation, general ALU and MAC instructions need quite a few bits for various registers/immediates and take up most of custom-2. I don't think it's possible to move those out of custom-2. I think an rv128 core would look very different than what CV32E40P is today.

Aug 21 '20 18:08 jm4rtin

Once this issue has been resolved the assertions from #457 can be checked again to see if they pass.

Sep 02 '20 13:09 Silabs-ArjanB

Recently, when I transplanted cv32e40p on the cluster side of Pulp, I found some problems. In addition to the interface mismatch, some internal logic is also different, including the fetching and decoding stages. In this regard, I would like to ask how to implement Pulp's cluster kernel to cv32e40p. Thank you.

Jan 21 '21 09:01 Suxiaojie-iot

Hi @Suxiaojie-iot, the items you mention are deliberate changes, not problems. Please refer to the cv32e40p user manual.

Also, in the future, please do not mix multiple topics in a single GitHub issue. This issue is for Xpulp instruction opcodes. To discuss integration of "Pulp's cluster kernel to cv32e40p" you should open an new issue.

Jan 21 '21 13:01 MikeOpenHWGroup

Letter received. Sorry for doing so. This is the first time, I will pay attention next time, thank you.

------------------ 原始邮件 ------------------ 发件人: "Mike Thompson"<[email protected]>; 发送时间: 2021年1月21日(星期四) 晚上9:42 收件人: "openhwgroup/cv32e40p"<[email protected]>; 抄送: "小杰"<[email protected]>; "Mention"<[email protected]>; 主题: Re: [openhwgroup/cv32e40p] Some of the Xpulp instructions use opcodes that are either reserved opcodes or standard opcodes (#452)

Hi @Suxiaojie-iot, the items you mention are deliberate changes, not problems. Please refer to the cv32e40p user manual.

Also, in the future, please do not mix multiple topics in a single GitHub issue. This issue is for Xpulp instruction opcodes. To discuss integration of "Pulp's cluster kernel to cv32e40p" you should open an new issue.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

Jan 22 '21 01:01 Suxiaojie-iot

Hi,

Starting from the proposal made by @jm4rtin and amended by @jeremybennett in pulp_encoding_blocks-jpb.xlsx, here is a new PULP re-encoding proposal for CV32E40Pv2 pulp_encoding_blocks-OPHW-jm-jpb-pgo.xlsx.

It contains few corrections and modifications to improve instructions decoding. Original proposal are commented in sheets renamed JM, final proposal are not renamed and PGO renamed ones.

Feel free to review and comment using this thread.

Best regards, Pascal.

Feb 24 '22 11:02 pascalgouedo

@jm4rtin , @jeremybennett any comment on this?

Mar 08 '22 10:03 davideschiavone

Hi, Here is a new version pulp_encoding_blocks-OPHW-jm-jpb-pgo-2022-04-06.xlsx with X's removal and some bit manip instructions grouping in custom-3.

Apr 06 '22 15:04 pascalgouedo

Last update on pack instructions pulp_encoding_blocks-OPHW-jm-jpb-pgo-2022-04-26.xlsx

Apr 26 '22 16:04 pascalgouedo

It looks good to me. Thank you for taking the time to optimize the encodings. Will PR #700 include the RTL changes eventually or is it just for the documentation updates?

Apr 26 '22 17:04 jm4rtin

Thanks John. #700 will only be documentation updates. I will submit another PR with RTL changes.

Apr 27 '22 06:04 pascalgouedo

#700 replaced by #704.

Apr 28 '22 13:04 pascalgouedo

Final version of the document with removal of old JM encoding proposals pulp_encoding_blocks-OPHW-final-2022-06-22.xlsx

Jun 23 '22 13:06 pascalgouedo

Hi As always when starting to implement specification and verifying it, you find some problems you didn't anticipate. Here things are that some ALU control signals are not generated in decoder file but in id one and were not analysed/taken into account in previous versions of re-encoding tables (at least by me).

Here is a new pulp_encoding_blocks-OPHW-2022-09-26.xlsx version taking into account ALU control signals. First tab lists modifications with respect to older versions and each tab has a comment for each changed instructions.

Best regards, Pascal.

Sep 30 '22 11:09 pascalgouedo

Hi, The very last version pulp_encoding_blocks-OPHW-2022-10-07.xlsx with final encoding of the 2 new Hardware Loops instructions.

Nov 07 '22 08:11 pascalgouedo

Added missing cv.dotsp.sc.b in pulp_encoding_blocks-OPHW-2022-11-14.xlsx

Nov 14 '22 13:11 pascalgouedo