This course introduces the fundamentals of optimization for machine learning (ML) and artificial intelligence (AI). Rather than providing an extensive overview of existing optimization algorithms, it focuses on the workhorse of modern ML/AI: stochastic gradient descent (SGD). The course studies extensions of SGD, such as variance reduction and adaptive methods, and derives convergence guarantees in both convex and non-convex settings.
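To make the central object concrete, here is a minimal SGD sketch (illustrative only, not course material): it fits the slope of a one-dimensional least-squares model by sampling one data point per step and following that sample's gradient. The data, step size, and step count are all hypothetical choices.

```python
import random

random.seed(0)
# Toy dataset for y = w * x with true slope w* = 3 (hypothetical example).
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

w = 0.0      # initial iterate
lr = 0.1     # constant step size
for step in range(500):
    x, y = random.choice(data)     # draw one sample: the "stochastic" part
    grad = 2 * (w * x - y) * x     # gradient of the single-sample loss (w*x - y)^2
    w -= lr * grad                 # SGD update

print(round(w, 3))                 # the iterate approaches the true slope 3.0
```

Variance-reduced and adaptive methods studied in the course modify exactly this update: the former replace `grad` with a lower-variance estimate, the latter scale `lr` per coordinate from past gradients.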
The curriculum also explores modern techniques such as minimax optimization, particularly in the context of constrained optimization and duality; zeroth-order methods, which optimize using only function evaluations rather than gradient information; and online and distributed optimization. The course emphasizes both theoretical foundations and practical applications, equipping students with the tools needed to design and analyze efficient optimization algorithms for contemporary AI systems.
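As a taste of the zeroth-order setting, the sketch below (an illustrative example, not the course's algorithm) estimates a derivative from two function evaluations along a random direction and uses it in a gradient-style update. The objective, smoothing radius, and step size are hypothetical.

```python
import random

random.seed(0)
f = lambda w: (w - 2.0) ** 2   # toy objective with minimizer w* = 2

w = 0.0
mu = 1e-3    # smoothing radius for the finite difference
lr = 0.1     # step size
for _ in range(200):
    u = random.choice([-1.0, 1.0])   # random direction (in 1-D: a random sign)
    # Two-point estimate of the directional derivative, using values of f only.
    g = (f(w + mu * u) - f(w - mu * u)) / (2 * mu) * u
    w -= lr * g                      # gradient-style step with the estimate

print(round(w, 2))                   # the iterate approaches the minimizer 2.0
```

The same two-point idea scales to high dimensions with random Gaussian or sphere directions, which is the regime where the convergence analyses covered in the course become interesting.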