Feature Request: Add NEAT (Nash-Equilibrium Adaptive Training) Optimizer
Description: Neural network optimization for billion-parameter models faces critical gradient conflict issues where parameter updates across different layers interfere destructively, leading to slower convergence, higher variance, and resource inefficiency. NEAT (Nash-Equilibrium Adaptive Training) addresses this by modeling neural network optimization as a multi-agent game governed by Nash equilibrium principles, treating each layer as a rational agent. In the paper's reported experiments, this game-theoretic optimizer converges substantially faster than Adam, trains more stably, and yields corresponding GPU-hour, cost, and carbon savings.
Key Contributions (from 2025 TJAS research paper by Goutham Ronanki):
- Nash Gradient Equilibrium (NGE): Each layer acts as a rational player; gradients are projected onto the Nash equilibrium manifold using the network's graph Laplacian, reducing destructive gradient interference (see the toy sketch after this list).
- NG-Adam: Integrates NGE with Adam by adding equilibrium correction to momentum estimation.
- Nash Step Allocation (NSA): Layer-wise adaptive learning rates that increase for well-aligned gradients and decrease for high-conflict layers.
- Empirical Results:
  - 28% faster convergence (32,400 vs. 45,000 steps against the Adam baseline).
  - 20% reduction in GPU hours, with proportional cost and carbon savings (8–10 metric tons CO₂ per run).
  - Dramatic reduction in layer gradient conflicts (mean cosine similarity: Adam -0.12 → NEAT +0.08).
  - Benefits scale with model size (improvement grows from 16% at 50M to 31% at 1.2B parameters).
  - All results statistically significant (p < 0.001, Cohen's d > 0.8).
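As a toy illustration of the NGE projection above (not the paper's implementation), the sketch below stacks three equally sized, randomly generated layer gradients, builds a simple chain-graph Laplacian, applies the (I - mu*L) correction, and prints the mean pairwise cosine similarity between layer gradients before and after. The layer count, gradient shapes, chain coupling, and mu value are assumptions for illustration only.

```python
# Toy sketch of the Nash Gradient Equilibrium (NGE) projection.
# Assumptions (not from the paper): 3 layers with equal-sized flattened
# gradients, a chain (path-graph) Laplacian, and mu = 0.1.
import numpy as np

def path_graph_laplacian(n_layers):
    """Laplacian L = D - A of a chain graph linking consecutive layers."""
    A = np.zeros((n_layers, n_layers))
    for i in range(n_layers - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    D = np.diag(A.sum(axis=1))
    return D - A

def nge_project(G, L, mu=0.1):
    """Apply the (I - mu*L) equilibrium correction to stacked layer gradients."""
    I = np.eye(L.shape[0])
    return (I - mu * L) @ G

def mean_pairwise_cosine(M):
    """Mean cosine similarity over all pairs of rows of M."""
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    C = Mn @ Mn.T
    return C[np.triu_indices_from(C, k=1)].mean()

rng = np.random.default_rng(0)
G = rng.normal(size=(3, 5))     # 3 layers, 5-dim flattened gradients (toy shapes)
L = path_graph_laplacian(3)
G_eq = nge_project(G, L, mu=0.1)

print("mean pairwise cosine before:", mean_pairwise_cosine(G))
print("mean pairwise cosine after: ", mean_pairwise_cosine(G_eq))
```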
Algorithmic Sketch (from paper Appendix):
# NEAT: Nash-Equilibrium Adaptive Training (pseudocode, one NG-Adam step per batch)
L = graph_laplacian(model_structure)          # layer-coupling Laplacian; fixed per architecture
for batch in training_data:
    G = compute_gradients(model, batch)       # stacked per-layer gradients
    G_equil = (I - mu * L) @ G                # Nash Gradient Equilibrium projection
    m = beta1 * m + (1 - beta1) * G_equil     # first-moment estimate (Adam momentum)
    v = beta2 * v + (1 - beta2) * G_equil**2  # second-moment estimate
    for i, param in enumerate(params):        # per-layer parameter update
        eta_i = eta / (1 + norm((L @ G)[i]))  # Nash Step Allocation: per-layer step size
        param -= eta_i * m[i] / (sqrt(v[i]) + eps)
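For concreteness, here is one possible NumPy reading of the sketch above: a minimal single-step function that assumes per-layer gradients are flattened to equal-length vectors and stacked into a matrix, interprets ||L G_i|| as the norm of the i-th row of L @ G, and omits Adam's bias correction just as the sketch does. The paper's actual tensor handling and hyperparameters may differ (see the attached PDF).

```python
# Hedged NumPy reading of one NEAT/NG-Adam step (assumes per-layer gradients
# are flattened to equal-length vectors and stacked into G of shape
# (n_layers, dim); the paper's exact formulation may differ).
import numpy as np

def neat_step(params, G, L, m, v, eta=1e-3, mu=0.1,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One NEAT update on stacked per-layer tensors of shape (n_layers, dim)."""
    I = np.eye(L.shape[0])
    G_eq = (I - mu * L) @ G                   # Nash Gradient Equilibrium projection
    m = beta1 * m + (1 - beta1) * G_eq        # first moment (Adam-style)
    v = beta2 * v + (1 - beta2) * G_eq ** 2   # second moment
    conflict = np.linalg.norm(L @ G, axis=1)  # per-layer conflict ||(L G)_i||
    eta_i = eta / (1.0 + conflict)            # Nash Step Allocation
    params = params - eta_i[:, None] * m / (np.sqrt(v) + eps)
    return params, m, v

# Toy usage with made-up shapes and a hard-coded 3-layer chain Laplacian.
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
rng = np.random.default_rng(1)
params = rng.normal(size=(3, 5))
m = np.zeros((3, 5))
v = np.zeros((3, 5))
G = rng.normal(size=(3, 5))                   # stand-in for real gradients
params, m, v = neat_step(params, G, L, m, v)
```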
Implementation Plan:
- tf.keras native optimizer integrating NGE, NG-Adam, and NSA (a possible user-facing API sketch follows after this list)
- Laplacian construction for neural architectures
- Full usage/benchmark notebooks
- Empirical validation pipeline on open datasets (text, vision)
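To seed the API discussion, below is a purely hypothetical user-facing interface: a NEAT optimizer class (the name and constructor arguments are proposals, not existing TF/Keras code) that mirrors Adam's signature plus the coupling strength mu, so it would drop into the usual model.compile / model.fit flow. model, train_ds, and val_ds are assumed to be defined elsewhere.

```python
# Proposed, hypothetical user-facing API (the NEAT class does not exist yet);
# constructor arguments mirror Adam's plus the equilibrium coupling strength mu.
optimizer = NEAT(
    learning_rate=1e-3,   # base step size eta
    mu=0.1,               # Nash equilibrium coupling strength
    beta_1=0.9,           # first-moment decay (as in Adam)
    beta_2=0.999,         # second-moment decay
    epsilon=1e-8,
)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=3)
```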
References:
- Ronanki, G. Nash-Equilibrium Adaptive Training (NEAT). TJAS, 2025 (full PDF attached, see GitHub)
- https://github.com/ItCodinTime/neat-optimizer
Theoretical background, further results, and step-by-step algorithmic descriptions are included in the attached PDF (see repo). Please review and advise on desired API/interface for TF Addons inclusion.
This project is no longer maintained or updated.
Is there another project through which I could get my optimizer into TF? Is there any way you could refer me or give me feedback, @sun1638650145?
There's not much you can do; if you insist on using TensorFlow, you'll have to implement this optimizer yourself.
Hi @sun1638650145, thank you for the response! Could you please give me the steps, link a tutorial, or point me in the right direction to do this?
Specifically, what I am trying to do is get my optimizer into the next release of TensorFlow alongside Adam so others can use it.
I don't recommend that, because you've already implemented it in PyTorch (which is sufficient). It seems the official TensorFlow team isn't maintaining this much anymore, and hardly anyone uses it now.
I have several issues and PRs that have been pending for over half a year without any response from the TensorFlow team. I'm just suggesting that you don't waste your time on it.
Alright then! Thank you so much for your time @sun1638650145