open-register-design-tool

Is there an option to enable or disable the flopping of inputs and/or outputs to the RTL generated?

Open • neenuprince opened this issue 6 years ago • 1 comment

We have noticed that latency is higher with ORDT-generated RTL, and that the inputs and outputs are flopped. Is there a way to configure ORDT not to flop them?

The ORDT-generated RTL does not start the read or write until one clk cycle after the nsg_axirdl_mdb_shm starts the read or write, because the inputs to the ordtreg_regsregs_jrdl_decode.sv file are flopped:

    //------- reg assigns for pio i/f
    always_ff @ (posedge clk or negedge sig_ordtreg_rst_n) begin
      if (! sig_ordtreg_rst_n) begin
        pio_write_active <= #1 1'b0;
        pio_read_active <= #1 1'b0;
      end
      else begin
        pio_write_active <= #1 pio_write_active ? pio_no_acks : pio_activate_write;
        pio_read_active <= #1 pio_read_active ? pio_no_acks : pio_activate_read;
        pio_dec_address_d1 <= #1 pio_dec_address;
        pio_dec_write_data_d1 <= #1 pio_dec_write_data;
      end
    end

The ORDT-generated RTL does not return access_complete until one clk cycle after starting the read or write, because the outputs for the ack/nack signals are also flopped:

    //------- reg assigns for pio ack/nack
    always_ff @ (posedge clk or negedge sig_ordtreg_rst_n) begin
      if (! sig_ordtreg_rst_n) begin
        dec_pio_ack <= #1 1'b0;
        dec_pio_nack <= #1 1'b0;
        pio_external_ack <= #1 1'b0;
        pio_external_nack <= #1 1'b0;
      end
      else begin
        dec_pio_ack <= #1 dec_pio_ack ? 1'b0 : dec_pio_ack_next;
        dec_pio_nack <= #1 dec_pio_nack ? 1'b0 : dec_pio_nack_next;
        pio_external_ack <= #1 pio_external_ack_next;
        pio_external_nack <= #1 pio_external_nack_next;
      end
    end

neenuprince, Oct 18 '19 15:10

There is currently no switch to disable flopping of the primary decoder inputs/outputs, so the minimum transaction delay is 2 cycles. Note that this minimum applies only when all registers are internal to the root decode/logic module; if cascaded decoders, external registers, or a non-parallel processor interface are used, latency will be greater. For the simple parallel decode case, I do think it would be possible to shave a cycle from the latency, at the expense of some additional timing risk and logic complexity in back-to-back cases.

sdnellen, Oct 20 '19 17:10
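
To make the 2-cycle figure in this thread concrete, here is a minimal cycle-level sketch in Python. It is not ORDT code and not generated RTL. It is a simplified model that assumes an internal register responds combinationally once the request is active, and it ignores the one-cycle ack pulse and the `pio_no_acks` handshake. The function `ack_latency` and its `flop_inputs`/`flop_outputs` parameters are hypothetical names used only for illustration; as noted above, ORDT has no such switch.

```python
def ack_latency(flop_inputs: bool = True, flop_outputs: bool = True,
                max_cycles: int = 8) -> int:
    """Count clock edges from request assertion until the ack is visible.

    Hypothetical model: active_q stands in for pio_write_active and
    ack_q stands in for dec_pio_ack in the snippets quoted in the thread.
    """
    active_q = False  # registered copy of the request (input flop)
    ack_q = False     # registered copy of the ack (output flop)
    request = True    # asserted before the first clock edge and held

    for edge in range(max_cycles + 1):
        # Combinational values seen after `edge` clock edges.
        active = active_q if flop_inputs else request
        ack_next = active  # assumed: internal register answers in the same cycle
        ack = ack_q if flop_outputs else ack_next
        if ack:
            return edge
        # Clock edge: both registers capture their inputs.
        active_q = request
        ack_q = ack_next

    raise RuntimeError("ack never observed within max_cycles")


if __name__ == "__main__":
    print("both flopped:       ", ack_latency(True, True))
    print("inputs unflopped:   ", ack_latency(False, True))
    print("outputs unflopped:  ", ack_latency(True, False))
```

In this model each register stage adds exactly one clock edge, so removing either stage leaves a single cycle of latency. That matches the suggestion that one cycle could be shaved in the simple parallel decode case, although the model says nothing about the timing risk or back-to-back complexity mentioned in the reply.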