calyx
Tracker for realistic queue benchmarking harness
Overview
At a high level, our Shared Testing Harness works by processing a workload of pushes and pops as quickly as possible.
Benefits of the current setup.
- Verifying correctness is simple: check that the hardware output matches that of the oracle.
- Benchmarking our queues is straightforward:
  - run synthesis and compute cycle counts
  - compute `total_time = cycle_count * (1000/(7 - worst_slack))` to estimate the total time spent on our workload. A smaller `total_time` roughly means a faster queue!
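As a sketch of the oracle check above, the oracle can simply replay the workload on a software queue, and the test then compares its pops against the hardware's `ans_mem`. This assumes a plain FIFO and a hypothetical command encoding (`0` = pop, `2` = push); `fifo_oracle` and `matches_oracle` are illustrative names rather than the harness's actual scripts, and skipping overflowing pushes and underflowing pops is an assumed policy.

```python
from collections import deque

# Hypothetical command encoding, assumed for this sketch only.
POP = 0
PUSH = 2


def fifo_oracle(commands, values, capacity):
    """Replay a push/pop workload on a bounded software FIFO.

    Returns the popped values in order. Pushes to a full queue and pops
    from an empty queue are skipped (an assumed overflow/underflow policy).
    """
    queue = deque()
    popped = []
    for cmd, val in zip(commands, values):
        if cmd == PUSH:
            if len(queue) < capacity:
                queue.append(val)
        elif cmd == POP:
            if queue:
                popped.append(queue.popleft())
        else:
            raise ValueError(f"unknown command {cmd}")
    return popped


def matches_oracle(ans_mem, commands, values, capacity):
    """Check that the hardware's answer memory starts with the oracle's pops."""
    expected = fifo_oracle(commands, values, capacity)
    return list(ans_mem[: len(expected)]) == expected


if __name__ == "__main__":
    cmds = [PUSH, PUSH, POP, PUSH, POP, POP]
    vals = [10, 20, 0, 30, 0, 0]
    print(fifo_oracle(cmds, vals, capacity=4))  # [10, 20, 30]
```

Because the whole workload is known up front and processed back to back, the expected output is fully determined, which is exactly what makes this check easy in the current setup.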
Drawbacks of the current setup.
- This is an unrealistic depiction of the way switches process packets.
  - IRL, our queues can't look into the future and know the entire workload of `push`es and `pop`s at the start.
  - IRL, there may be times when our queue does nothing (when packets are in flight and it's not yet time to call `pop`). However, since our test harness tries to process all `push`es and `pop`s as fast as possible, our tests have no idle time!
We remedy this by building a benchmarking harness that more closely models real packet traces (PCAPs). Broadly, we wish to do the following:
- Fix a specific clock period for our queues.
- Determine the rate at which we call `pop`.
- For each `push` in our workload, keep track of an "arrival time" for the associated packet.
- Actually `push` a packet only once its arrival time has passed.
  - The hardware can do this by counting cycles, since we've fixed the clock period.
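The steps above can be sketched as a cycle-level software model. It assumes arrival times are given in nanoseconds relative to the first packet, that a `pop` is attempted once every `pop_interval` cycles, and that every packet whose arrival cycle has been reached is pushed in that same cycle; `to_cycles` and `run_timed_harness` are hypothetical names, and this is a model of the idea rather than the Calyx design.

```python
from collections import deque


def to_cycles(arrival_ns, clock_period_ns):
    """Convert arrival times (ns, relative to the first packet) into cycle numbers."""
    return [t // clock_period_ns for t in arrival_ns]


def run_timed_harness(arrival_cycles, values, pop_interval, capacity, total_cycles):
    """Cycle-by-cycle model of a harness that pushes packets only after they arrive.

    Assumes arrival_cycles is sorted. Returns (popped values, queue
    occupancy at the end of each cycle).
    """
    queue = deque()
    popped = []
    occupancy = []
    next_pkt = 0
    for cycle in range(total_cycles):
        # Push every packet whose arrival cycle has been reached.
        while next_pkt < len(arrival_cycles) and arrival_cycles[next_pkt] <= cycle:
            if len(queue) < capacity:
                queue.append(values[next_pkt])  # otherwise the packet is dropped
            next_pkt += 1
        # Pop at a fixed rate; on other cycles the queue may sit idle.
        if cycle % pop_interval == 0 and queue:
            popped.append(queue.popleft())
        occupancy.append(len(queue))
    return popped, occupancy


if __name__ == "__main__":
    arrivals = to_cycles([0, 0, 21, 70], clock_period_ns=7)  # [0, 0, 3, 10]
    popped, occ = run_timed_harness(arrivals, [1, 2, 3, 4], pop_interval=4, capacity=8, total_cycles=12)
    print(popped)  # [1, 2, 3]
```

Unlike the current harness, the queue here has idle cycles, and the last packet is still queued when the run ends; the per-cycle `occupancy` list is the kind of trace one could plot.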
Challenges with the new setup.
- Benchmarking our queues becomes trickier: there's no longer a single number (`total_time`) we can use to compare designs. Instead, we might consider some subset of the following:
  - generate graphs similar to those produced by our simulator
  - keep track of how often overflow/underflow occurs

  Perhaps we can qualitatively compare queues with the help of these statistics.
- We can no longer use this setup to check the correctness of our hardware.
  - The number of cycles spent to `push` and `pop` now influences the order in which packets are popped.
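One way to collect the overflow/underflow statistics mentioned above is to wrap the queue in counters. This is a sketch: `CountingQueue` is a hypothetical helper that models a plain FIFO rather than any particular queue from our library, and it assumes an overflowing push is simply dropped.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CountingQueue:
    """Bounded FIFO that counts overflow and underflow events."""

    capacity: int
    items: deque = field(default_factory=deque)
    overflows: int = 0   # pushes rejected because the queue was full
    underflows: int = 0  # pops attempted while the queue was empty

    def push(self, value):
        if len(self.items) >= self.capacity:
            self.overflows += 1
            return False
        self.items.append(value)
        return True

    def pop(self):
        if not self.items:
            self.underflows += 1
            return None
        return self.items.popleft()


if __name__ == "__main__":
    q = CountingQueue(capacity=2)
    for v in (1, 2, 3):
        q.push(v)
    out = [q.pop() for _ in range(3)]
    print(out, q.overflows, q.underflows)  # [1, 2, None] 1 1
```

Dividing `overflows` by the number of pushes and `underflows` by the number of pops gives rates that can be compared across queue designs for the same trace.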
Plan
- [x] Write a script to parse PCAPs and generate a `.data` file. The data file should include the following memories:
  - `commands`, `values`, `ans_mem` as usual
  - `arrival_cycles`, to keep track of the packet's arrival time for each `push`
  - `mac_addrs`, to keep track of the packet's source for each `push`; we'll use this for flow inference
- [ ] Make a Calyx component similar to `queue_call.py` to repeatedly invoke our queue.
- [ ] Generate graphs for our queues in the style of Formal Abstractions and our simulator.
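A sketch of how the `arrival_cycles` and `mac_addrs` memories could be derived from a capture, using only the standard library. It assumes classic libpcap files (not pcapng) with Ethernet framing, packets stored in arrival order, arrival times measured from the first packet and divided by a fixed clock period, and a Calyx `.data` JSON layout of the form `{"data": [...], "format": {...}}`; `read_pcap` and `pcap_to_memories` are hypothetical names, and generating `commands`, `values`, and `ans_mem` is left out.

```python
import struct

# Classic libpcap magic numbers -> (byte order, ns per timestamp-fraction unit).
_PCAP_MAGIC = {
    b"\xd4\xc3\xb2\xa1": ("<", 1000),  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": (">", 1000),  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": ("<", 1),     # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": (">", 1),     # big-endian, nanosecond timestamps
}


def read_pcap(data):
    """Yield (timestamp_ns, frame) for each record in a classic pcap file."""
    try:
        endian, frac_ns = _PCAP_MAGIC[bytes(data[:4])]
    except KeyError:
        raise ValueError("not a classic pcap file") from None
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        yield ts_sec * 1_000_000_000 + ts_frac * frac_ns, data[offset:offset + incl_len]
        offset += incl_len


def _memory(values, width):
    # Assumed Calyx .data layout: unsigned bit-vector memory of the given width.
    return {"data": values, "format": {"numeric_type": "bitnum", "is_signed": False, "width": width}}


def pcap_to_memories(data, clock_period_ns, cycle_width=32):
    """Build arrival_cycles and mac_addrs memories, one entry per packet (one push each)."""
    packets = list(read_pcap(data))
    if not packets:
        return {"arrival_cycles": _memory([], cycle_width), "mac_addrs": _memory([], 48)}
    t0 = packets[0][0]
    # Arrival time relative to the first packet, rounded down to a whole cycle.
    arrival_cycles = [(t - t0) // clock_period_ns for t, _ in packets]
    # Ethernet source MAC: bytes 6..12 of the frame, read as a 48-bit integer.
    mac_addrs = [int.from_bytes(frame[6:12], "big") for _, frame in packets]
    return {
        "arrival_cycles": _memory(arrival_cycles, cycle_width),
        "mac_addrs": _memory(mac_addrs, 48),
    }
```

The returned dict could then be merged with the usual `commands`, `values`, and `ans_mem` memories and written out with `json.dump`.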
Hooray, looks great! And yes, just bake in some simple flow inference for now.
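For the simple flow inference, one possible policy, assumed here purely for illustration, is to number distinct source MACs in the order they first appear and wrap that numbering modulo the number of flows the queue supports. `infer_flows` is a hypothetical helper that operates on the contents of the `mac_addrs` memory.

```python
def infer_flows(mac_addrs, num_flows):
    """Assign each packet a flow ID in [0, num_flows) based on its source MAC.

    Distinct MACs are numbered in first-seen order and wrapped modulo
    num_flows, so packets from the same source always share a flow.
    """
    flow_of = {}
    flows = []
    for mac in mac_addrs:
        if mac not in flow_of:
            flow_of[mac] = len(flow_of) % num_flows
        flows.append(flow_of[mac])
    return flows
```

For example, `infer_flows([5, 7, 5, 9, 11], 2)` yields `[0, 1, 0, 0, 1]`: sources `5` and `9` share flow 0, while `7` and `11` share flow 1.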