Suggest using iterators instead of indexing
Using iterators instead of indexing is another possible optimization. At the end of the speedup refactoring: https://rustwasm.github.io/docs/book/game-of-life/time-profiling.html
At this point, the next lowest hanging fruit for speeding up Universe::tick is removing the allocation and free.
Rust does extra bound checks when indexing arrays. I think in simple cases the compiler can optimize it away (e.g. simple loop with a global .len() check) but that's unlikely with the convolution-like operator needed for GoL. Iterators might also lead to better memory access patterns although it's hard to verify.
I played a bit around and got universe.tick() to run ~2x faster than the version without the modulo. Caveat: measured with criterion, not as wasm, only measured for a 1024x768 grid.
I moved this logic to Cell:
impl Cell {
fn evolution(self, live_neighbors: u8) -> Cell {
match (self, live_neighbors) {
// Rule 1: Any live cell with fewer than two live neighbours
// dies, as if caused by underpopulation.
(Cell::Alive, x) if x < 2 => Cell::Dead,
// Rule 2: Any live cell with two or three live neighbours
// lives on to the next generation.
(Cell::Alive, 2) | (Cell::Alive, 3) => Cell::Alive,
// Rule 3: Any live cell with more than three live
// neighbours dies, as if by overpopulation.
(Cell::Alive, x) if x > 3 => Cell::Dead,
// Rule 4: Any dead cell with exactly three live neighbours
// becomes a live cell, as if by reproduction.
(Cell::Dead, 3) => Cell::Alive,
// All other cells remain in the same state.
(otherwise, _) => otherwise,
}
}
}
and replaced tick by:
pub fn tick(&mut self) {
let mut next = self.cells.clone();
let width = self.width as usize;
let height = self.height as usize;
let next_rows = next.chunks_exact_mut(width);
let rows_before = self.cells.chunks_exact(width).cycle().skip(height - 1);
let rows = self.cells.chunks_exact(width);
let rows_after = self.cells.chunks_exact(width).cycle().skip(1);
for (next_row, row_b, row, row_a) in izip!(next_rows, rows_before, rows, rows_after) {
// manually count active nb for first cell in row
let first_count = row_b[0] as u8
+ row_b[1] as u8
+ row_b[row_b.len() - 1] as u8
+ row[1] as u8
+ row[row.len() - 1] as u8
+ row_a[0] as u8
+ row_a[1] as u8
+ row_a[row_a.len() - 1] as u8;
next_row[0] = row[0].evolution(first_count);
// 3 sliding windows (size=3)
let next_cells = next_row.iter_mut().skip(1);
let window_before = row_b.windows(3);
let window = row.windows(3);
let window_after = row_a.windows(3);
for (next_cell, w_b, w, w_a) in izip!(next_cells, window_before, window, window_after) {
let count = w_b.iter().map(|&v| v as u8).sum::<u8>()
+ w[0] as u8
+ w[2] as u8
+ w_a.iter().map(|&v| v as u8).sum::<u8>();
*next_cell = w[1].evolution(count);
}
// manually count active nb for last cell in row
let last_count = row_b[0] as u8
+ row_b[row_b.len() - 2] as u8
+ row_b[row_b.len() - 1] as u8
+ row[0] as u8
+ row[row.len() - 2] as u8
+ row_a[0] as u8
+ row_a[row_a.len() - 2] as u8
+ row_a[row_a.len() - 1] as u8;
next_row[next_row.len() - 1] = row[row.len() - 1].evolution(last_count);
}
self.cells = next;
}
Granted it's not the most readable and there is probably a nicer way but it might be good to remind that Rust array indexing is not zero-cost.
(it's using izip from the itertools crate but could nest calls to zip to avoid extra dependency)