activitysim
Prototyping / Research on using Rust for future versions
Shifting ActivitySim’s core functionality to Rust could address some of the scalability and maintainability challenges of Python while keeping the user-facing API familiar. Here’s why this could be a good idea, with specifics tied to ActivitySim’s needs:
1. Performance and Parallelism
- Zero-cost abstractions: Rust compiles to highly optimized machine code without the overhead of Python’s interpreter or NumPy’s dispatch layers. For simulation runs with millions of households, tours, and trips, this means faster execution out of the box.
- Safe concurrency: Rust’s ownership and borrowing system makes it much easier to safely use multithreading without race conditions. That’s particularly relevant for ActivitySim, which needs to apply similar choice models across many individuals, households, and zones—a “pleasingly parallel” workload.
- SIMD and GPU friendliness: Rust has strong support for low-level SIMD intrinsics and emerging GPU compute bindings (e.g., wgpu). Core choice model kernels could benefit directly.
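As a rough illustration of the safe-concurrency point, here is a minimal sketch that computes multinomial logit probabilities for many decision-makers in parallel using only the standard library. The function names, the row-major utility layout, and the chunking scheme are assumptions made for this example, not ActivitySim's actual design.

```rust
use std::thread;

/// Multinomial logit probabilities for one decision-maker's utilities.
/// Subtracting the maximum utility keeps exp() from overflowing.
fn logit_probabilities(utilities: &[f64], out: &mut [f64]) {
    let max = utilities.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let mut total = 0.0;
    for (o, u) in out.iter_mut().zip(utilities) {
        *o = (u - max).exp();
        total += *o;
    }
    for o in out.iter_mut() {
        *o /= total;
    }
}

/// Compute probabilities for a row-major utility table (one row per
/// decision-maker, `n_alts` columns), splitting the rows across threads.
/// Hypothetical helper for illustration only.
fn parallel_probabilities(utilities: &[f64], n_alts: usize, n_threads: usize) -> Vec<f64> {
    assert!(n_alts > 0 && utilities.len() % n_alts == 0);
    let n_rows = utilities.len() / n_alts;
    let mut probs = vec![0.0; utilities.len()];

    // Rows per thread, rounded up; at least one so the chunk size is never zero.
    let threads = n_threads.max(1);
    let rows_per_chunk = ((n_rows + threads - 1) / threads).max(1);
    let chunk_len = rows_per_chunk * n_alts;

    thread::scope(|s| {
        // chunks_mut hands each thread a disjoint mutable slice, so the
        // compiler can verify that no two threads write to the same row.
        for (u_chunk, p_chunk) in utilities.chunks(chunk_len).zip(probs.chunks_mut(chunk_len)) {
            s.spawn(move || {
                for (u_row, p_row) in u_chunk.chunks(n_alts).zip(p_chunk.chunks_mut(n_alts)) {
                    logit_probabilities(u_row, p_row);
                }
            });
        }
    });
    probs
}
```

Calling `parallel_probabilities(&utilities, 2, 4)` on a table with two alternatives per row returns a vector of the same length in which each row sums to 1. A production version would more likely use a work-stealing thread pool, but the borrow-checking argument is the same.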
2. Memory Efficiency
- Fine-grained control: Unlike Python’s object-heavy structures, Rust allows direct control over memory layout. For ActivitySim’s large arrays (households × persons × tours × trips), a well-structured Rust implementation could significantly reduce memory footprint.
- Avoids garbage collection pauses: Rust doesn’t have a garbage collector. This eliminates unpredictable pauses during long runs and helps scale simulations more smoothly.
- Data locality: Rust lets you design cache-friendly, columnar data layouts similar to what ActivitySim currently gets via pandas/NumPy, but with fewer layers of abstraction.
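To make the memory-layout point concrete, the sketch below stores a person table as one contiguous vector per field (a "struct of arrays"). The `PersonTable` type and its field names and widths are hypothetical choices for this example; a real schema would follow ActivitySim's actual tables.

```rust
use std::mem::size_of;

/// Columnar person table: one contiguous Vec per field.
/// Field names and integer widths are illustrative assumptions.
struct PersonTable {
    household_id: Vec<u32>,
    age: Vec<u8>,
    is_worker: Vec<bool>,
    income: Vec<f32>,
}

impl PersonTable {
    fn with_capacity(n: usize) -> Self {
        PersonTable {
            household_id: Vec::with_capacity(n),
            age: Vec::with_capacity(n),
            is_worker: Vec::with_capacity(n),
            income: Vec::with_capacity(n),
        }
    }

    fn push(&mut self, household_id: u32, age: u8, is_worker: bool, income: f32) {
        self.household_id.push(household_id);
        self.age.push(age);
        self.is_worker.push(is_worker);
        self.income.push(income);
    }

    fn len(&self) -> usize {
        self.household_id.len()
    }

    /// Payload bytes stored per person across all columns: 4 + 1 + 1 + 4.
    fn payload_bytes_per_person() -> usize {
        size_of::<u32>() + size_of::<u8>() + size_of::<bool>() + size_of::<f32>()
    }

    /// Mean income of workers. Only two columns are scanned, so the other
    /// fields never enter the cache during this pass.
    fn mean_worker_income(&self) -> Option<f32> {
        let mut total = 0.0f32;
        let mut count = 0u32;
        for (&worker, &income) in self.is_worker.iter().zip(&self.income) {
            if worker {
                total += income;
                count += 1;
            }
        }
        if count == 0 {
            None
        } else {
            Some(total / count as f32)
        }
    }
}
```

With these widths each person costs 10 bytes of payload, and the element types are fixed at compile time rather than inferred per column at run time.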
3. Reliability and Maintainability
- Compile-time guarantees: Rust’s strict type system and ownership rules catch a class of state-handling mistakes at compile time. For example, once a person’s tour schedule has been handed off to the next model step, the compiler rejects accidental reuse or mutation elsewhere unless you explicitly allow it.
- No nulls / safer handling of missing data: Rust’s Option<T> forces you to explicitly handle missing or special-case values, reducing the risk of subtle bugs that creep into large modeling frameworks.
- Error handling: The Result type enforces handling of possible errors, which is critical for long simulations where silent failures could invalidate results.
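The three points above can be sketched in a few lines. Everything here (`Person`, `ModelError`, `income_utility`, `DraftSchedule`, `FinalSchedule`) is a hypothetical name invented for the example, and the utility formula is a placeholder rather than an ActivitySim specification.

```rust
/// Hypothetical person record; `income` is None when the value is missing.
struct Person {
    id: u32,
    income: Option<f64>,
}

#[derive(Debug, PartialEq)]
enum ModelError {
    MissingIncome { person_id: u32 },
    NegativeIncome { person_id: u32 },
}

/// Income term of a utility function (placeholder formula). The caller
/// cannot read the income without deciding what a missing value means.
fn income_utility(person: &Person, coefficient: f64) -> Result<f64, ModelError> {
    let income = person
        .income
        .ok_or(ModelError::MissingIncome { person_id: person.id })?;
    if income < 0.0 {
        return Err(ModelError::NegativeIncome { person_id: person.id });
    }
    Ok(coefficient * income.ln_1p())
}

/// A schedule that is still being built.
struct DraftSchedule {
    tour_start_hours: Vec<u8>,
}

/// A schedule that later steps may read but can no longer extend.
struct FinalSchedule {
    tour_start_hours: Vec<u8>,
}

impl DraftSchedule {
    /// Takes `self` by value: the draft is moved into this call, so any
    /// later use of the old variable is rejected by the compiler.
    fn finalize(mut self) -> FinalSchedule {
        self.tour_start_hours.sort();
        FinalSchedule {
            tour_start_hours: self.tour_start_hours,
        }
    }
}
```

After `let done = draft.finalize();`, a line such as `draft.tour_start_hours.push(9)` fails to compile because `draft` has been moved. That is the kind of state-transition mistake the type system can rule out.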
4. Ecosystem and Interoperability
- Python bindings via PyO3 and maturin: PyO3 provides the Rust-to-Python bindings and maturin builds and packages them, so you can keep the Python-facing API intact for users while re-implementing core computational kernels in Rust. This preserves ActivitySim’s accessibility for planners while delivering speed and stability under the hood.
- Data interchange: Rust integrates well with Arrow, Parquet, and HDF5, which are already common in transport modeling workflows. That aligns with ActivitySim’s need for big data handling.
- Testing and reproducibility: Rust’s built-in testing framework encourages modular, tested code—a good fit for ActivitySim’s open-source collaborative model.
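As a small sketch of the testing point, the function below picks an alternative from a probability vector given a uniform draw, with unit tests living next to the code. The name `choose_alternative` and the convention of passing the random draw in as an argument are assumptions for this example. Taking the draw as input keeps the function deterministic, which makes results reproducible and easy to test.

```rust
/// Pick an alternative by inverse-CDF sampling: return the first index whose
/// cumulative probability exceeds `draw`, where `draw` is uniform in [0, 1).
/// Hypothetical helper for illustration only.
fn choose_alternative(probs: &[f64], draw: f64) -> Option<usize> {
    let mut cumulative = 0.0;
    for (i, p) in probs.iter().enumerate() {
        cumulative += p;
        if draw < cumulative {
            return Some(i);
        }
    }
    // Rounding can leave the cumulative sum just under 1.0; fall back to the
    // last alternative, or None when there are no alternatives at all.
    probs.len().checked_sub(1)
}

#[cfg(test)]
mod tests {
    use super::choose_alternative;

    #[test]
    fn picks_by_cumulative_probability() {
        let probs = [0.2, 0.5, 0.3];
        assert_eq!(choose_alternative(&probs, 0.0), Some(0));
        assert_eq!(choose_alternative(&probs, 0.2), Some(1));
        assert_eq!(choose_alternative(&probs, 0.95), Some(2));
    }

    #[test]
    fn empty_choice_set_yields_none() {
        assert_eq!(choose_alternative(&[], 0.5), None);
    }
}
```

Running `cargo test` compiles the `tests` module and executes both cases; a normal build leaves the module out entirely.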
5. Future-Proofing
- Scaling to distributed computing: Rust’s safety guarantees extend naturally into distributed systems. If ActivitySim evolves toward cluster-scale or cloud-native execution, Rust provides a strong foundation.
- Longevity: Python APIs change quickly, and pandas/numpy internals are evolving toward Arrow. Rust’s stability promises a longer shelf life for the core engine, which matters given ActivitySim’s role as a community standard.
Downsides
Rust has a lot of strengths, but there are real trade-offs to consider before building a Rust-based ActivitySim. These are the downsides that matter most in this context:
1. Developer Learning Curve
- Borrow checker complexity: Rust’s ownership and lifetime rules are powerful, but they can be tough for new developers (especially those used to Python’s flexibility). This could slow down onboarding of planners, researchers, or consultants who want to contribute code.
- Ecosystem familiarity: The ActivitySim developer community is comfortable with Python/pandas/numpy. Switching to Rust may alienate contributors or shrink the pool of collaborators unless there’s a clear Python-facing layer.
2. Ecosystem Maturity for Scientific Computing
- Sparse scientific libraries: While Rust has growing crates for math, linear algebra (ndarray), Arrow, and parallelism, it doesn’t yet match Python’s rich ecosystem (NumPy, Pandas, SciPy, statsmodels). Many specialized econometrics/discrete choice tools may have to be re-implemented or wrapped.
- Limited statistical packages: ActivitySim relies heavily on discrete choice models and simulation logic. Rust has less out-of-the-box support for these compared to Python/R.
3. Development Speed
- Boilerplate and verbosity: Rust code tends to be longer and more explicit than Python. A quick exploratory feature in Python might take much longer to prototype in Rust.
- Slower iteration cycles: Compilation and strict type checking make Rust reliable, but slower to iterate on compared to Python’s quick edit–run loop.
4. Interoperability Costs
- Python ↔ Rust boundary: If ActivitySim keeps a Python API (likely), every crossing between Python and Rust introduces a small overhead. While usually minor, it matters for fine-grained calls.
- Packaging & distribution: Delivering Rust-based binaries is more complex than pure Python (though tools like maturin help). Agencies used to pip install activitysim might encounter new hurdles.
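One common way to limit the boundary overhead is to expose coarse-grained, batch-oriented kernels, so that a single call from Python processes a whole column rather than one value. The sketch below shows only the Rust side of such a design, with the binding layer left out. `distance_decay` and `distance_decay_batch` are hypothetical names, and the exponential decay form is a placeholder.

```rust
/// Per-element kernel (placeholder formula). Cheap inside Rust, but exposing
/// it directly to Python would cost one boundary crossing per value.
fn distance_decay(distance_km: f64, beta: f64) -> f64 {
    (-beta * distance_km).exp()
}

/// Batch kernel: one call handles an entire column of distances, so the
/// per-call overhead is paid once instead of once per element.
fn distance_decay_batch(distances_km: &[f64], beta: f64) -> Vec<f64> {
    distances_km
        .iter()
        .map(|&d| distance_decay(d, beta))
        .collect()
}
```

A binding layer would then wrap only `distance_decay_batch`, keeping the per-element function private to the Rust crate.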
5. Community and Governance Risks
- Fragmentation: A Rust rewrite risks splitting the community — some might prefer the old Python base because it’s easier to hack on.
- Contributor barrier: Agencies or academics who occasionally patch ActivitySim may find Rust intimidating, reducing collaborative contributions.
- Support longevity: Python skills are widespread; Rust is growing fast but still niche. Ensuring a large enough pool of future maintainers is a strategic consideration.
6. Model Transparency
- Accessibility for planners: One of ActivitySim’s strengths is that model logic is relatively transparent to non-CS experts (because it’s Python + pandas). Rust code is less approachable, which could make the modeling “black boxier” for planners or reviewers.