3D_NeuroSim_V1.0
different result
I am trying to reproduce the results for the 2D 7nm SRAM case. I use an 8-bit VGG-8 network on the CIFAR-10 dataset. The VGG-8 network model is from DNN_NeuroSim_V1.4.
I set memcelltype = 1, novelMapping = true, SARADC = true, validated = false, synchronous = false, pipeline = false, M3D = false, technode = 7, featuresize = 18e-9, wireWidth = 1, levelOutput = 16, cellBit = 1, heightInFeatureSizeSRAM = 16, widthInFeatureSizeSRAM = 34.43, widthSRAMCellNMOS = 1, numColMuxed = 8
However, I get a chip total readDynamicEnergy of 9.62642e+07 pJ (about 96 uJ). This differs from the result reported in 'Benchmarking Monolithic 3D Integration for Compute-in-Memory Accelerators: Overcoming ADC Bottlenecks and Maintaining Scalability to 7nm or Beyond', which is: Area: 8.36 mm^2, TOPS/W: 30.30, TOPS: 1.95, Power Density: 7.72e-03 W/mm^2, latency: 600 us, dynamic energy: 35 uJ.
Do you have any suggestions to help me get the results similar to those in the paper?
My result is here.
------------------------------ Summary --------------------------------
ChipArea : 9.46458e+06um^2
Chip total CIM array : 3.52389e+06um^2
Total IC Area on chip (Global and Tile/PE local): 931046um^2
Total ADC (or S/As and precharger for SRAM) Area on chip : 2.04312e+06um^2
Total Accumulation Circuits (subarray level: adders, shiftAdds; PE/Tile/Global level: accumulation units) on chip : 1.80574e+06um^2
Other Peripheries (e.g. decoders, mux, switchmatrix, buffers, pooling and activation units) : 1.16078e+06um^2
Chip layer-by-layer readLatency (per image) is: 603729ns
Chip total readDynamicEnergy is: 9.62642e+07pJ
Chip total leakage Energy is: 6.02362e+06pJ
Chip total leakage Power is: 7531.8uW
Chip buffer readLatency is: 314434ns
Chip buffer readDynamicEnergy is: 236904pJ
Chip ic readLatency is: 65154.7ns
Chip ic readDynamicEnergy is: 3.45468e+06pJ
************************ Breakdown of Latency and Dynamic Energy *************************
----------- ADC (or S/As and precharger for SRAM) readLatency is : 173409ns
----------- Accumulation Circuits (subarray level: adders, shiftAdds; PE/Tile/Global level: accumulation units) readLatency is : 10241.2ns
----------- Other Peripheries (e.g. decoders, mux, switchmatrix, buffers, IC, pooling and activation units) readLatency is : 420079ns
----------- ADC (or S/As and precharger for SRAM) readDynamicEnergy is : 8.11379e+07pJ
----------- Accumulation Circuits (subarray level: adders, shiftAdds; PE/Tile/Global level: accumulation units) readDynamicEnergy is : 8.23443e+06pJ
----------- Other Peripheries (e.g. decoders, mux, switchmatrix, buffers, IC, pooling and activation units) readDynamicEnergy is : 6.8919e+06pJ
************************ Breakdown of Latency and Dynamic Energy *************************
----------------------------- Performance -------------------------------
Chip Operation Temperature (K): 313
Energy Efficiency TOPS/W (Layer-by-Layer Process): 12.0428
Throughput TOPS (Layer-by-Layer Process): 2.04038
Throughput FPS (Layer-by-Layer Process): 1656.37
Compute efficiency TOPS/mm^2 (Layer-by-Layer Process): 0.21558
Power Density W/mm^2 (Layer-by-Layer Process): 0.0179011
-------------------------------------- Hardware Performance Done --------------------------------------
My 'Param.cpp' is here.
Param::Param() {
/***************************************** user defined design options and parameters *****************************************/
operationmode = 2; // 1: conventionalSequential (Use several multi-bit RRAM as one synapse)
// 2: conventionalParallel (Use several multi-bit RRAM as one synapse)
memcelltype = 1; // 1: cell.memCellType = Type::SRAM
// 2: cell.memCellType = Type::RRAM
// 3: cell.memCellType = Type::FeFET
accesstype = 1; // 1: cell.accessType = CMOS_access
// 2: cell.accessType = BJT_access
// 3: cell.accessType = diode_access
// 4: cell.accessType = none_access (Crossbar Array)
transistortype = 1; // 1: inputParameter.transistorType = conventional
deviceroadmap = 2; // 1: inputParameter.deviceRoadmap = HP
// 2: inputParameter.deviceRoadmap = LSTP
globalBufferType = false; // false: register file
// true: SRAM
globalBufferCoreSizeRow = 128;
globalBufferCoreSizeCol = 128;
tileBufferType = false; // false: register file
// true: SRAM
tileBufferCoreSizeRow = 32;
tileBufferCoreSizeCol = 32;
peBufferType = false; // false: register file
// true: SRAM
chipActivation = true; // false: activation (reLu/sigmoid) inside Tile
// true: activation outside Tile
reLu = true; // false: sigmoid
// true: reLu
novelMapping = true; // false: conventional mapping
// true: novel mapping
SARADC = true; // false: MLSA
// true: sar ADC
currentMode = true; // false: MLSA use VSA
// true: MLSA use CSA
pipeline = false; // false: layer-by-layer process --> huge leakage energy in HP
// true: pipeline process
speedUpDegree = 8; // 1 = no speed up --> original speed
// 2 and more : speed up ratio, the higher, the faster
// A speed-up degree upper bound: when there is no idle period during each layer --> no need to further fold the system clock
// This idle period is defined by IFM sizes and data flow, the actual process latency of each layer may be different due to extra peripheries
validated = false; // false: no calibration factors
// true: validated by silicon data (wiring area in layout, gate switching activity, post-layout performance drop...)
synchronous = false; // false: asynchronous
// true: synchronous, clkFreq will be decided by sensing delay
M3D = false; // false: run 2D simulation
// true: run M3D simulation
/*** algorithm weight range, the default wrapper (based on WAGE) has fixed weight range of (-1, 1) ***/
algoWeightMax = 1;
algoWeightMin = -1;
/*** conventional hardware design options ***/
clkFreq = 1e9; // Clock frequency
temp = 300; // Temperature (K)
// technode: 130 --> wireWidth: 175
// technode: 90 --> wireWidth: 110
// technode: 65 --> wireWidth: 105
// technode: 45 --> wireWidth: 80
// technode: 32 --> wireWidth: 56
// technode: 22 --> wireWidth: 40
// technode: 14 --> wireWidth: 25
// technode: 10, 7 --> wireWidth: 18
technode = 7; // Technology
featuresize = 18e-9; // Wire width for subArray simulation
wireWidth = 18; // wireWidth of the cell for Accuracy calculation
globalBusDelayTolerance = 0.1; // to relax bus delay for global H-Tree (chip level: communication among tiles), if tolerance is 0.1, the latency will be relaxed to (1+0.1)*optimalLatency (trade-off with energy)
localBusDelayTolerance = 0.1; // to relax bus delay for local H-Tree (tile level: communication among PEs), if tolerance is 0.1, the latency will be relaxed to (1+0.1)*optimalLatency (trade-off with energy)
treeFoldedRatio = 4; // the H-Tree is assumed to be foldable in layout (saves area)
maxGlobalBusWidth = 2048; // the max bus width allowed on chip level (just an upper bound, the actual bus width is defined according to the auto floorplan)
// NOTE: Carefully choose this number!!!
// e.g. when use pipeline with high speedUpDegree, i.e. high throughput, need to increase the global bus width (interface of global buffer) --> guarantee global buffer speed
numRowSubArray = 128; // # of rows in single subArray
numColSubArray = 128; // # of columns in single subArray
/*** option to relax subArray layout ***/
relaxArrayCellHeight = 0; // relax ArrayCellHeight or not
relaxArrayCellWidth = 0; // relax ArrayCellWidth or not
numColMuxed = 8; // How many columns share 1 ADC (for eNVM and FeFET) or parallel SRAM
levelOutput = 16; // # of levels of the multilevelSenseAmp output, should be in 2^N forms; e.g. 32 levels --> 5-bit ADC
cellBit = 1; // precision of memory device
/*** parameters for SRAM ***/
// due to scaling, suggested SRAM cell size above 22nm: 160F^2
// SRAM cell size at 14nm: 300F^2
// SRAM cell size at 10nm: 400F^2
// SRAM cell size at 7nm: 600F^2
heightInFeatureSizeSRAM = 16; // SRAM Cell height in feature size
widthInFeatureSizeSRAM = 34.43; // SRAM Cell width in feature size
widthSRAMCellNMOS = 1;
widthSRAMCellPMOS = 1;
widthAccessCMOS = 1;
minSenseVoltage = 0.1;
/*** parameters for analog synaptic devices ***/
heightInFeatureSize1T1R = 4; // 1T1R Cell height in feature size
widthInFeatureSize1T1R = 12; // 1T1R Cell width in feature size
heightInFeatureSizeCrossbar = 2; // Crossbar Cell height in feature size
widthInFeatureSizeCrossbar = 2; // Crossbar Cell width in feature size
resistanceOn = 6e3; // Ron resistance at Vr in the reported measurement data (need to recalculate below if considering the nonlinearity)
resistanceOff = 6e3*150; // Roff resistance at Vr in the reported measurement data (need to recalculate below if considering the nonlinearity)
maxConductance = (double) 1/resistanceOn;
minConductance = (double) 1/resistanceOff;
readVoltage = 0.5; // On-chip read voltage for memory cell
readPulseWidth = 10e-9; // read pulse width in sec
accessVoltage = 1.1; // Gate voltage for the transistor in 1T1R
resistanceAccess = resistanceOn*IR_DROP_TOLERANCE; // resistance of access CMOS in 1T1R
writeVoltage = 2; // Enable level shifter if writeVoltage > 1.5V
/*** Calibration parameters ***/
if(validated){
alpha = 1.44; // wiring area of level shifter
beta = 1.4; // latency factor of sensing cycle
gamma = 0.5; // switching activity of DFF in shifter-add and accumulator
delta = 0.15; // switching activity of adder
epsilon = 0.05; // switching activity of control circuits
zeta = 1.22; // post-layout energy increase
}
/***************************************** user defined design options and parameters *****************************************/
/***************************************** Initialization of parameters NO need to modify *****************************************/
if (memcelltype == 1) {
cellBit = 1; // force cellBit = 1 for all SRAM cases
}
/*** initialize operationMode as default ***/
conventionalParallel = 0;
conventionalSequential = 0;
BNNparallelMode = 0;
BNNsequentialMode = 0;
XNORsequentialMode = 0;
XNORparallelMode = 0;
switch(operationmode) {
case 6: XNORparallelMode = 1; break;
case 5: XNORsequentialMode = 1; break;
case 4: BNNparallelMode = 1; break;
case 3: BNNsequentialMode = 1; break;
case 2: conventionalParallel = 1; break;
case 1: conventionalSequential = 1; break;
default: printf("operationmode ERROR\n"); exit(-1);
}
/*** parallel read ***/
parallelRead = 0;
if(conventionalParallel || BNNparallelMode || XNORparallelMode) {
parallelRead = 1;
} else {
parallelRead = 0;
}
/*** Initialize interconnect wires ***/
switch(wireWidth) {
case 175: AR = 1.60; Rho = 2.20e-8; break; // for technode: 130
case 110: AR = 1.60; Rho = 2.52e-8; break; // for technode: 90
case 105: AR = 1.70; Rho = 2.68e-8; break; // for technode: 65
case 80: AR = 1.70; Rho = 3.31e-8; break; // for technode: 45
case 56: AR = 1.80; Rho = 3.70e-8; break; // for technode: 32
case 40: AR = 1.90; Rho = 4.03e-8; break; // for technode: 22
case 25: AR = 2.00; Rho = 5.08e-8; break; // for technode: 14
case 18: AR = 2.00; Rho = 6.35e-8; break; // for technode: 7, 10
case -1: break; // Ignore wire resistance or user define
default: puts("Wire width out of range"); exit(-1); // print the message before exiting (it was unreachable after exit)
}
if (memcelltype == 1) {
wireLengthRow = wireWidth * 1e-9 * heightInFeatureSizeSRAM;
wireLengthCol = wireWidth * 1e-9 * widthInFeatureSizeSRAM;
} else {
if (accesstype == 1) {
wireLengthRow = wireWidth * 1e-9 * heightInFeatureSize1T1R;
wireLengthCol = wireWidth * 1e-9 * widthInFeatureSize1T1R;
} else {
wireLengthRow = wireWidth * 1e-9 * heightInFeatureSizeCrossbar;
wireLengthCol = wireWidth * 1e-9 * widthInFeatureSizeCrossbar;
}
}
Rho *= (1+0.00451*abs(temp-300));
if (wireWidth == -1) {
unitLengthWireResistance = 1.0; // Use a small number to prevent numerical error for NeuroSim
wireResistanceRow = 0;
wireResistanceCol = 0;
} else {
unitLengthWireResistance = Rho / ( wireWidth*1e-9 * wireWidth*1e-9 * AR );
wireResistanceRow = unitLengthWireResistance * wireLengthRow;
wireResistanceCol = unitLengthWireResistance * wireLengthCol;
}
/***************************************** Initialization of parameters NO need to modify *****************************************/
}