# Optimization-and-Search
Implementation and visualization (with demos) of search and optimization algorithms.
Please add the MathJax Plugin for GitHub to your browser to render the math in this README.
## Table of Contents
- Optimization-and-Search
- Table of Contents
1. Numerical Optimization
    - Gradient Descent
    - Conjugate Descent
    - Newton Method
2. Stochastic Search
    - Simulated Annealing
    - Cross-Entropy Methods
    - Search Gradient
3. Classical Search
    - A* Search Algorithm
    - Minimax Search
4. Dynamic Programming
    - Value Iteration
    - Policy Iteration
5. Model-Free Reinforcement Learning
    - Monte Carlo Policy Evaluation
    - Temporal Difference Policy Evaluation
    - Tabular Q-Learning
6. Deep Q-Learning
7. Monte Carlo Tree Search
8. DPLL
## 1. Numerical Optimization
Here we solve a simple quadratic problem:

$$\arg\min_x \ \frac{1}{2}x^TQx + b^Tx$$

$$Q \succeq 0, \quad x \in \mathbb{R}^n$$

The left animation is for $n = 10$, the right one for $n = 2$.
### Gradient Descent

Gradient descent uses the first-order gradient and a learning rate $\alpha$:

$$x^{k+1} := x^k - \alpha \nabla f(x^k)$$
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/gradient_descent_1.gif)
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/gradient_descent_2.gif)
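A minimal NumPy sketch of this update on the quadratic objective (the function and variable names are illustrative, not the repo's actual code):

```python
import numpy as np

def gradient_descent(Q, b, x0, lr=0.05, iters=500):
    """Minimize f(x) = 0.5 x^T Q x + b^T x by following the negative gradient."""
    x = x0.copy()
    for _ in range(iters):
        grad = Q @ x + b      # gradient of the quadratic: Qx + b
        x = x - lr * grad     # first-order update with learning rate lr
    return x

# Toy usage on a random 2-D positive-definite problem
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2))
Q = A.T @ A + np.eye(2)       # guarantees Q is positive definite
b = rng.normal(size=2)
print(gradient_descent(Q, b, np.zeros(2)))
print(np.linalg.solve(Q, -b))  # analytic minimizer, should be close
```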
### Conjugate Descent

$$x^{k+1} := x^k + \alpha_kd^k$$

Exact line search gives the step size $\alpha_k$:

$$\alpha_k = -\frac{\nabla f(x^k)^Td^k}{(d^k)^TQd^k}$$

The next conjugate direction is

$$d^{k+1} = -\nabla f(x^{k+1}) + \beta_kd^k$$

$$\beta_k = \frac{\nabla f(x^{k+1})^T\nabla f(x^{k+1})}{\nabla f(x^k)^T\nabla f(x^k)} \quad \text{(Fletcher–Reeves)}$$
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/conjugate_descent_1.gif)
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/conjugate_descent_2.gif)
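A sketch of the same loop with Fletcher–Reeves directions, assuming the quadratic setup above (on a quadratic, $n$ steps reach the minimizer):

```python
import numpy as np

def conjugate_descent(Q, b, x0, iters=None):
    """Minimize f(x) = 0.5 x^T Q x + b^T x with Fletcher-Reeves conjugate directions."""
    x = x0.copy()
    g = Q @ x + b                         # gradient at x
    d = -g                                # first direction: steepest descent
    for _ in range(iters or len(b)):
        alpha = -(g @ d) / (d @ Q @ d)    # exact line search step
        x = x + alpha * d
        g_new = Q @ x + b
        beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves coefficient
        d = -g_new + beta * d
        g = g_new
    return x
```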
### Newton Method

Newton's method uses the second-order gradient (the Hessian):

$$x^{k+1} := x^k + d^k$$

$$d^k = -[\nabla^2 f(x^k)]^{-1}\nabla f(x^k)$$
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/newton_descent_1.gif)
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/newton_descent_2.gif)
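A sketch of the Newton update under the same setup; since the Hessian of this quadratic is exactly $Q$, the very first step already solves $Qx = -b$:

```python
import numpy as np

def newton_method(Q, b, x0, iters=10):
    """Minimize f(x) = 0.5 x^T Q x + b^T x with Newton steps."""
    x = x0.copy()
    for _ in range(iters):
        grad = Q @ x + b                # gradient
        d = -np.linalg.solve(Q, grad)   # Newton direction: -(Hessian)^-1 grad
        x = x + d
    return x
```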
## 2. Stochastic Search

Here we use stochastic search to solve instances of the traveling salesman problem (TSP).
### Simulated Annealing
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/simulated_annealing_1.gif)
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/SA.gif)
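A minimal sketch of the annealing loop for TSP, assuming a precomputed distance matrix `dist` (a hypothetical name, not the repo's): worse tours are accepted with probability $e^{-\Delta/T}$, and $T$ is cooled geometrically.

```python
import math
import random

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def simulated_annealing(dist, T0=10.0, cooling=0.999, iters=20000):
    """Anneal over tours: propose a segment reversal, accept worse moves with prob e^(-delta/T)."""
    n = len(dist)
    tour = list(range(n))
    random.shuffle(tour)
    best, best_len = tour[:], tour_length(tour, dist)
    T = T0
    for _ in range(iters):
        i, j = sorted(random.sample(range(n), 2))
        cand = tour[:i] + tour[i:j][::-1] + tour[j:]          # 2-opt style reversal
        delta = tour_length(cand, dist) - tour_length(tour, dist)
        if delta < 0 or random.random() < math.exp(-delta / T):
            tour = cand                                       # accept the proposal
        if tour_length(tour, dist) < best_len:
            best, best_len = tour[:], tour_length(tour, dist)
        T *= cooling                                          # geometric cooling schedule
    return best, best_len
```

`T0` and `cooling` should be tuned to the scale of the distances; too fast a schedule freezes into a local optimum.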
### Cross-Entropy Methods

The cross-entropy method needs fewer steps to converge.
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/cross_entropy_1.gif)
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/cross_entropy.gif)
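A sketch of the generic cross-entropy loop on a continuous toy objective (the TSP version instead maintains a distribution over tours): sample a population, keep the elite, refit the distribution to the elite.

```python
import numpy as np

def cross_entropy_minimize(f, dim, n_samples=100, n_elite=10, iters=50):
    """Generic cross-entropy loop: sample, select the elite, refit the Gaussian."""
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        xs = mu + sigma * np.random.randn(n_samples, dim)        # sample population
        elite = xs[np.argsort([f(x) for x in xs])[:n_elite]]     # keep best samples
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-8 # refit distribution
    return mu

# Toy usage: minimize a shifted quadratic
print(cross_entropy_minimize(lambda x: ((x - 3) ** 2).sum(), dim=2))
```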
### Search Gradient

Here we instead search for the lowest position of a continuous function, since for TSP I could not derive $\nabla_\theta \log (p_\theta (z_i))$. If you have any ideas, please feel free to comment. Thanks!
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/search_gradient.gif)
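A sketch of the search-gradient idea for this continuous case, using a Gaussian $p_\theta$ with $\theta = \mu$ and fixed $\sigma$, where $\nabla_\mu \log p_\mu(z) = (z - \mu)/\sigma^2$:

```python
import numpy as np

def search_gradient(f, dim, lr=0.05, sigma=0.5, n_samples=50, iters=200):
    """Estimate grad_mu E[f(z)] with the log-likelihood trick and descend on it."""
    mu = np.zeros(dim)
    for _ in range(iters):
        z = mu + sigma * np.random.randn(n_samples, dim)  # z_i ~ N(mu, sigma^2 I)
        fz = np.array([f(x) for x in z])
        fz = (fz - fz.mean()) / (fz.std() + 1e-8)         # baseline to reduce variance
        grad_log = (z - mu) / sigma**2                    # grad_mu log p(z_i)
        grad = (fz[:, None] * grad_log).mean(axis=0)      # Monte Carlo gradient estimate
        mu -= lr * grad                                   # descend (we are minimizing)
    return mu
```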
## 3. Classical Search
### A* Search Algorithm

The code is written following the slides of CSE 257 at UCSD.
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/Astar_algorithm.jpg)
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/Astar.gif)
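A compact sketch of A* on a 4-connected grid with a Manhattan-distance heuristic (the grid encoding here is an assumption, not the repo's exact representation):

```python
import heapq

def astar(grid, start, goal):
    """A* on a grid of 0 = free, 1 = obstacle; returns the path or None."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier = [(h(start), 0, start, None)]                  # (f = g + h, g, node, parent)
    parents, g_cost = {}, {start: 0}
    while frontier:
        _, g, node, parent = heapq.heappop(frontier)
        if node in parents:
            continue                          # already expanded at lower cost
        parents[node] = parent
        if node == goal:                      # reconstruct path back to start
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            r, c = nxt
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
                if g + 1 < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = g + 1
                    heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, node))
    return None                               # no path exists
```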
### Minimax Search
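A minimal sketch of depth-limited minimax; the `game` object is a hypothetical interface (not the repo's code) supplying legal moves, state transitions, and a leaf evaluation:

```python
def minimax(state, depth, maximizing, game):
    """Depth-limited minimax; `game` supplies moves, transitions, and evaluation."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)           # static evaluation at the leaf
    values = (minimax(game.play(state, m), depth - 1, not maximizing, game)
              for m in game.moves(state))
    return max(values) if maximizing else min(values)
```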
## 4. Dynamic Programming
### Value Iteration
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/vl_algorithm.jpg)
Note that we obtain the optimal policy before the utility function converges.
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/value_iteration.gif)
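A sketch of the Bellman backup loop under an assumed tabular encoding (`P[a]` is a transition matrix per action, `R` a state-reward vector; these names are illustrative):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """P[a][s][s'] = transition probs, R[s] = reward; returns V and the greedy policy."""
    n_actions, n_states = len(P), len(R)
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup: V(s) = max_a [ R(s) + gamma * sum_s' P(s'|s,a) V(s') ]
        Q = np.array([R + gamma * P[a] @ V for a in range(n_actions)])
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```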
### Policy Iteration
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/policy_iteration_algorithm.jpg)
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/policy_iteration.gif)
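A sketch under the same assumed tabular encoding: policy evaluation solves the linear Bellman system exactly, then improvement acts greedily, until the policy stops changing.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Alternate exact policy evaluation and greedy policy improvement."""
    n_actions, n_states = len(P), len(R)
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R exactly
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)
        # Policy improvement: act greedily with respect to V
        Q = np.array([R + gamma * P[a] @ V for a in range(n_actions)])
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy
```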
## 5. Model-Free Reinforcement Learning
### Monte Carlo Policy Evaluation
- Policy: $\pi (s_i) = a_i$
- Trajectory: $s_0, s_1, s_2, \dots, s_T$
- Discounted return: $R(s_0) + \gamma R(s_1) + \dots + \gamma^TR(s_T)$

Follow the policy to sample many trajectories, and estimate the expectation (see the sketch below):

$$V^{\pi}(s_0) = \mathbb{E}_{\pi}\left[\sum_{i=0}^T\gamma^iR(s_i)\right]$$
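A sketch of every-visit Monte Carlo evaluation, assuming a Gym-style environment where `reset()` returns a state and `step(a)` returns `(s', r, done)` (an assumption about the interface, not the repo's exact one):

```python
import numpy as np
from collections import defaultdict

def mc_policy_evaluation(env, policy, gamma=0.99, n_episodes=1000):
    """Every-visit Monte Carlo: average the sampled discounted returns per state."""
    returns = defaultdict(list)
    for _ in range(n_episodes):
        s, done, episode = env.reset(), False, []
        while not done:                        # roll out one full trajectory
            s_next, r, done = env.step(policy[s])
            episode.append((s, r))
            s = s_next
        G = 0.0
        for s, r in reversed(episode):         # G = r + gamma * G, back to front
            G = r + gamma * G
            returns[s].append(G)
    return {s: np.mean(gs) for s, gs in returns.items()}
```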
### Temporal Difference Policy Evaluation
The difference between the TD and MC methods is that TD updates at every step and does not need to wait for a whole trajectory to be generated.
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/images/TD.png)
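A sketch of TD(0) under the same assumed interface; each transition immediately moves $V(s)$ toward the bootstrapped target $r + \gamma V(s')$:

```python
from collections import defaultdict

def td0_policy_evaluation(env, policy, alpha=0.1, gamma=0.99, n_episodes=1000):
    """TD(0): update V(s) toward r + gamma * V(s') after every single step."""
    V = defaultdict(float)
    for _ in range(n_episodes):
        s, done = env.reset(), False
        while not done:
            s_next, r, done = env.step(policy[s])
            target = r + gamma * V[s_next] * (not done)  # no bootstrap at terminal states
            V[s] += alpha * (target - V[s])              # update immediately, mid-episode
            s = s_next
    return dict(V)
```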
### Tabular Q-Learning

Q-learning is an off-policy method:

$$V(s) = \max_a \mathbb{E}\left[R(s)+\gamma V(s') \mid s, a\right]$$
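A sketch of the tabular update with an $\epsilon$-greedy behavior policy; the max over actions in the target is what makes it off-policy (same assumed environment interface as above):

```python
import random
from collections import defaultdict

def q_learning(env, n_actions, alpha=0.1, gamma=0.99, eps=0.1, n_episodes=5000):
    """Tabular Q-learning: behave epsilon-greedily, learn the greedy target."""
    Q = defaultdict(float)
    for _ in range(n_episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:                      # explore
                a = random.randrange(n_actions)
            else:                                          # exploit current estimates
                a = max(range(n_actions), key=lambda a: Q[s, a])
            s_next, r, done = env.step(a)
            best_next = max(Q[s_next, a2] for a2 in range(n_actions))
            target = r + gamma * best_next * (not done)    # greedy (off-policy) target
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```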
## 6. Deep Q-Learning
In this case I used a deep Q-network (DQN) to train a self-driving car. The environment was created by NeuralNine (GitHub), though he used the NEAT package to train the car.
The demo is below; you can modify the network and train your own car!
Random case (first GIF) vs. the trained DQN (second GIF):
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/DeepQLearning/result/random.gif)
![](https://github.com/alwaysbyx/Optimization-and-Search/raw/main/DeepQLearning/result/DQN.gif)
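As a starting point for modifying the network, a minimal PyTorch sketch of the two DQN ingredients: a small Q-network over the car's state features and one gradient step toward the target network's bootstrapped value (layer sizes, state features, and the batch format are assumptions, not the repo's exact code):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector (e.g. the car's distance sensors) to one Q-value per action."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One DQN step: regress Q(s, a) toward r + gamma * max_a' Q_target(s', a')."""
    s, a, r, s_next, done = batch                       # tensors from a replay buffer
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                               # target network is held fixed
        target = r + gamma * target_net(s_next).max(dim=1).values * (1 - done)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```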