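| # | Algorithm | Description |
|---|-----------|-------------|
| 1 | Gradient Descent | Updates the parameters in the direction of the negative gradient of the cost function; in its batch form, the gradient is computed over the full training set at every step |
| 2 | Stochastic Gradient Descent (SGD) | Updates the parameters using the gradient of a single randomly selected instance |
| 3 | Mini-Batch Gradient Descent | A compromise between Batch Gradient Descent and SGD: each update uses the gradient of a small random mini-batch of m instances |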
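| 4 | Momentum | Accumulates a decaying sum of past gradients (a velocity), so updates gradually build up speed while the gradient keeps pointing in the same direction |
| 5 | Nesterov Accelerated Gradient (NAG) | Like Momentum, but measures the gradient of the cost function not at the current position but slightly ahead, in the direction of the momentum |
| 6 | Adagrad | Adapts the learning rate to each parameter by dividing by the root of its accumulated squared gradients, so parameters associated with frequently occurring features get smaller updates |
| 7 | RMSProp | Fixes Adagrad's diminishing learning rates by accumulating only the gradients from the most recent iterations (an exponentially decaying average of squared gradients) |
| 8 | Adam Optimization | Adaptive Moment Estimation: combines the ideas of Momentum and RMSProp |
| 9 | AdaMax | A variant of Adam that scales updates by an exponentially weighted infinity norm (a decaying maximum) of past gradients instead of their squared average, which can make it more stable |
| 10 | Nadam | Nesterov-accelerated Adaptive Moment Estimation: Adam combined with Nesterov momentum |
| 11 | FTRL | Follow The Regularized Leader: an online learning method; its FTRL-Proximal variant supports both L1 and L2 regularization and suits large, sparse models |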
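| 12 | Newton’s Method | Uses second-order information (the Hessian) to build a quadratic approximation of the loss function around the current point, then steps to the optimum of that approximation |
| 13 | Broyden–Fletcher–Goldfarb–Shanno (BFGS) | A quasi-Newton method: instead of computing the Hessian, it builds an approximation of its inverse from successive gradient differences |
| 14 | Conjugate Gradient | An iterative algorithm for solving linear systems whose matrix is symmetric positive-definite (equivalently, minimizing a convex quadratic); nonlinear variants extend it to general smooth optimization |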
| 15 | Covariance Matrix Adaptation Evolution Strategy (CMA-ES) | An evolutionary algorithm for difficult non-linear, non-convex optimization problems in continuous domains; it samples candidates from a multivariate normal distribution and adapts that distribution's covariance matrix between generations |
| 16 | Particle Swarm Optimization (PSO) | Iteratively improves a population ("swarm") of candidate solutions with respect to a given measure of quality, pulling each particle toward its own best-known position and the swarm's best-known position |
| 17 | Ant Colony Optimization (ACO) | A probabilistic technique for finding good paths through graphs, inspired by ants that lay down pheromone trails while searching for food |
| 18 | Genetic Algorithm (GA) | A heuristic that mimics natural selection, using operators such as selection, crossover, and mutation to generate new genotypes in the hope of finding good solutions to a given problem |
| 19 | Simulated Annealing (SA) | A probabilistic technique for approximating the global optimum of a given function; it sometimes accepts worse solutions, with a probability that shrinks as a "temperature" parameter is lowered |
| 20 | Tabu Search | A metaheuristic that guides a local search procedure beyond local optimality, using a tabu list that temporarily forbids recently visited solutions or moves |
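To make the difference between rows 1 to 3 concrete, here is a minimal Python sketch (not from the source) that fits a hypothetical line y = 3x + 2 by least squares. Batch GD, SGD, and mini-batch GD share one training loop; only the number of instances feeding each gradient estimate changes. The toy data, learning rate, and epoch count are illustrative assumptions.

```python
import random

# Hypothetical toy data: y = 3x + 2 plus small uniform noise, x in [-1, 1].
random.seed(0)
XS = [(i - 25) / 25 for i in range(51)]
YS = [3 * x + 2 + random.uniform(-0.1, 0.1) for x in XS]


def grad(w, b, batch):
    """Gradient of the mean of 0.5 * (w*x + b - y)**2 over the batch."""
    gw = gb = 0.0
    for x, y in batch:
        err = w * x + b - y
        gw += err * x
        gb += err
    return gw / len(batch), gb / len(batch)


def train(batch_size, lr=0.1, epochs=500, seed=1):
    """batch_size == len(data): batch GD; 1: SGD; anything in between: mini-batch GD."""
    rng = random.Random(seed)
    data = list(zip(XS, YS))
    w = b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # random order (irrelevant for full-batch GD)
        for start in range(0, len(data), batch_size):
            gw, gb = grad(w, b, data[start:start + batch_size])
            # Step in the direction of the negative gradient.
            w -= lr * gw
            b -= lr * gb
    return w, b


results = {name: train(size) for name, size in [("batch", len(XS)), ("sgd", 1), ("mini-batch", 8)]}
for name, (w, b) in results.items():
    print(f"{name:10s} w={w:.3f} b={b:.3f}")
```

All three variants end near w = 3, b = 2. SGD and mini-batch GD keep jittering slightly around the optimum because each step sees only a noisy gradient estimate.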
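Momentum and Nesterov Accelerated Gradient (rows 4 and 5) differ only in where the gradient is evaluated. The following is a minimal sketch assuming the common "velocity" formulation (v = beta * v - lr * grad, then theta = theta + v) and a hypothetical elongated quadratic bowl as the cost function. The learning rate and beta = 0.9 are illustrative choices, not values from the source.

```python
def grad_f(p):
    # Hypothetical test function: an elongated bowl f(x, y) = 0.5 * (x**2 + 25 * y**2).
    x, y = p
    return (x, 25.0 * y)


def momentum_descent(p, lr=0.02, beta=0.9, steps=300, nesterov=False):
    vx = vy = 0.0  # velocity: decaying sum of past scaled gradient steps
    x, y = p
    for _ in range(steps):
        if nesterov:
            # NAG: evaluate the gradient slightly ahead, where the momentum is carrying us.
            gx, gy = grad_f((x + beta * vx, y + beta * vy))
        else:
            gx, gy = grad_f((x, y))
        vx = beta * vx - lr * gx
        vy = beta * vy - lr * gy
        x, y = x + vx, y + vy
    return x, y


start = (10.0, 1.0)
print("momentum:", momentum_descent(start))
print("nesterov:", momentum_descent(start, nesterov=True))
```

Both runs approach the minimum at (0, 0). The look-ahead gradient in the Nesterov branch damps the oscillation slightly faster on this bowl.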
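Adam, AdaMax, and Nadam (rows 8 to 10) all use the same bias-corrected first-moment (momentum) estimate. They differ in how each step is scaled or looked ahead. The sketch follows the update rules as usually stated: AdaMax as proposed in the original Adam paper, and Nadam in a commonly used simplified form. The test function and hyperparameters are illustrative assumptions.

```python
import math


def grad_f(p):
    # Hypothetical elongated quadratic bowl f(x, y) = 0.5 * (x**2 + 25 * y**2).
    return [p[0], 25.0 * p[1]]


def optimize(p, variant="adam", lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=2000):
    p = list(p)
    m = [0.0] * len(p)  # first moment: decaying average of gradients (momentum)
    v = [0.0] * len(p)  # second moment (Adam/Nadam) or decaying max of |g| (AdaMax)
    for t in range(1, steps + 1):
        g = grad_f(p)
        for i in range(len(p)):
            m[i] = b1 * m[i] + (1 - b1) * g[i]
            m_hat = m[i] / (1 - b1 ** t)  # bias correction for the zero initialization
            if variant == "adamax":
                # Infinity norm (decaying maximum) instead of the squared average.
                v[i] = max(b2 * v[i], abs(g[i]))
                p[i] -= lr * m_hat / (v[i] + eps)
                continue
            v[i] = b2 * v[i] + (1 - b2) * g[i] ** 2
            v_hat = v[i] / (1 - b2 ** t)
            if variant == "nadam":
                # Nesterov-style look-ahead: mix corrected momentum with the current gradient.
                step = b1 * m_hat + (1 - b1) * g[i] / (1 - b1 ** t)
            else:
                step = m_hat
            p[i] -= lr * step / (math.sqrt(v_hat) + eps)
    return p


for variant in ("adam", "adamax", "nadam"):
    print(variant, optimize([10.0, 1.0], variant))
```

The three variants differ only in the few lines inside the branches. The momentum estimate and its bias correction are shared.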
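For Newton's method (row 12), each step minimizes the local quadratic model f(theta + d) ~ f(theta) + g^T d + 0.5 * d^T H d. Setting its gradient to zero gives the step d = -H^{-1} g. This minimal sketch uses a hypothetical two-variable convex function whose minimum is at (ln 2, ln 2). To keep the example dependency-free, the 2x2 Hessian is inverted by hand.

```python
import math


def grad(x, y):
    # Hypothetical convex test function f(x, y) = exp(x) - 2x + (y - x)**2,
    # minimized at x = y = ln 2.
    return math.exp(x) - 2 - 2 * (y - x), 2 * (y - x)


def hessian(x, y):
    return (math.exp(x) + 2, -2.0), (-2.0, 2.0)


def newton(x, y, iters=10):
    for _ in range(iters):
        gx, gy = grad(x, y)
        (a, b), (c, d) = hessian(x, y)
        det = a * d - b * c
        # Newton step = -H^{-1} g: the minimizer of the local quadratic model.
        sx = -(d * gx - b * gy) / det
        sy = -(-c * gx + a * gy) / det
        x, y = x + sx, y + sy
    return x, y


x, y = newton(0.0, 0.0)
print(x, y, math.log(2))  # both coordinates approach ln 2
```

Near the minimum, the error roughly squares at every iteration (quadratic convergence). Ten iterations are far more than this function needs.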
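BFGS (row 13) replaces the exact Hessian with an inverse-Hessian estimate H. After every iteration, H is updated from the step s and the gradient change y. The sketch below applies the standard BFGS inverse update to a hypothetical convex quadratic 0.5 * x^T A x - b^T x. To stay short, it uses an exact line search that only works for quadratics; practical implementations use a line search satisfying the Wolfe conditions instead.

```python
import math


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bfgs_quadratic(A, b, x, iters=20, tol=1e-10):
    """Minimize 0.5 * x^T A x - b^T x with BFGS (A symmetric positive-definite)."""
    n = len(x)
    H = [[float(i == j) for j in range(n)] for i in range(n)]  # inverse-Hessian estimate, starts at I
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # gradient A x - b
    for _ in range(iters):
        if math.sqrt(dot(g, g)) < tol:
            break
        d = [-hg for hg in matvec(H, g)]  # quasi-Newton direction -H g
        alpha = -dot(g, d) / dot(d, matvec(A, d))  # exact line search (quadratics only)
        s = [alpha * di for di in d]
        x = [xi + si for xi, si in zip(x, s)]
        g_new = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        y = [gn - go for gn, go in zip(g_new, g)]
        rho = 1.0 / dot(y, s)
        # BFGS inverse update H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T, expanded.
        Hy = matvec(H, y)
        yHy = dot(y, Hy)
        H = [[H[i][j] - rho * (s[i] * Hy[j] + Hy[i] * s[j]) + (rho * rho * yHy + rho) * s[i] * s[j]
              for j in range(n)] for i in range(n)]
        g = g_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]  # hypothetical SPD matrix
b = [1.0, 2.0]
print(bfgs_quadratic(A, b, [2.0, 1.0]))  # close to [1/11, 7/11]
```

With exact line searches on an n-dimensional quadratic, BFGS reaches the minimizer in at most n iterations, and H ends up equal to A^{-1}.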
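Conjugate Gradient (row 14) solves A x = b for a symmetric positive-definite A. This is the same as minimizing the quadratic 0.5 * x^T A x - b^T x. Below is a textbook version in pure Python; the 3x3 matrix and right-hand side are hypothetical examples.

```python
import math


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x (equals b because x starts at 0)
    p = list(r)  # first search direction: steepest descent
    rs = dot(r, r)
    for _ in range(max_iter or 2 * n):
        if math.sqrt(rs) < tol:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)  # exact step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # Next direction: new residual, made A-conjugate to the previous direction.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # hypothetical SPD matrix
b = [1.0, 2.0, 3.0]
x = conjugate_gradient(A, b)
print(x, matvec(A, x))  # A x should reproduce b
```

In exact arithmetic, CG terminates in at most n iterations. The `2 * n` cap leaves some slack for floating-point rounding.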
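Simulated annealing (row 19) escapes local minima by occasionally accepting worse moves. This minimal sketch assumes Gaussian neighbour proposals, the Metropolis acceptance rule exp(-delta / T), and a geometric cooling schedule. The 1-D test function, starting point, and schedule parameters are all hypothetical.

```python
import math
import random


def f(x):
    # Hypothetical 1-D test function: a local minimum near x = 3.84 (f ~ 8.3)
    # and the global minimum near x = -1.31 (f ~ -7.9).
    return x * x + 10 * math.sin(x)


def simulated_annealing(x, t0=10.0, cooling=0.999, steps=5000, seed=0):
    rng = random.Random(seed)
    fx = f(x)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        cand = x + rng.gauss(0.0, 1.0)  # random neighbour
        fc = f(cand)
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if fc < fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_f


print(simulated_annealing(5.0))
```

Starting at x = 5, which is inside the basin of the local minimum, the high early temperature lets the search climb over the barrier and reach the global basin. As T falls, uphill moves become rare and the search settles.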
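For Particle Swarm Optimization (row 16), here is a minimal global-best PSO sketch on a hypothetical sphere objective. The inertia weight `w` and acceleration coefficients `c1`, `c2` are common textbook-style choices, not values from the source.

```python
import random


def sphere(p):
    # Hypothetical objective: sum of squares, minimum 0 at the origin.
    return sum(v * v for v in p)


def pso(f, dim=2, n=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]  # each particle's best-known position
    pbest_f = [f(p) for p in pos]
    i_best = min(range(n), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[i_best][:], pbest_f[i_best]  # swarm's best-known position
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi < gbest_f:
                    gbest, gbest_f = pos[i][:], fi
    return gbest, gbest_f


print(pso(sphere))
```

PSO needs only function values, not gradients, which is why it belongs with the other derivative-free methods at the end of the table.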
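A Genetic Algorithm (row 18) can be sketched on the hypothetical toy problem "OneMax", which maximizes the number of 1-bits in a bit string. The sketch uses tournament selection, one-point crossover, bit-flip mutation, and elitism. These operator choices and parameters are assumptions for illustration.

```python
import random


def fitness(bits):
    # Hypothetical "OneMax" objective: count of 1-bits (maximum = len(bits)).
    return sum(bits)


def genetic_algorithm(n_bits=20, pop_size=30, generations=100, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        # Selection: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with probability p_mut.
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


best = genetic_algorithm()
print(fitness(best), best)
```

Elitism guarantees that the best fitness never decreases from one generation to the next. Mutation keeps reintroducing bits that crossover alone could not recover.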