Information Geometry and Machine Learning Application
Student No.: 50
Time: Tue/Thu 19:00-21:00
Instructor: Jun Zhang [University of Michigan Ann Arbor]
Place: Conference Room 3, Floor 2, Jin Chun Yuan West Building (近春园西楼)
Starting Date: 2012-6-26
Ending Date: 2012-8-21
Notice:
Part A (Information Geometry):
Part B (Machine Learning Application):
Course description:
Information geometry is the differential geometric study of the manifold of probability density functions (or probability distributions on discrete support). From a geometric perspective, a parametric family of probability density functions on a sample space is modeled as a differentiable manifold, where points on the manifold represent the density functions themselves and coordinates represent the indexing parameters. Information geometry is seen as an emerging tool for providing a unified perspective on many branches of information science, including coding, statistics, machine learning, inference and decision, etc.
This mini-course will provide an introduction to the fundamental concepts of information geometry as well as a sample application to machine learning. The course will be divided into two parts: June 26-July 12 (5 classes) and August 7-21 (5 classes), with opportunities for students to develop individual research projects between these sessions. Part A will introduce the foundations of information geometry, including topics such as: Kullback-Leibler divergence and Bregman divergence, Fisher-Rao metric, conjugate (dual) connections, alpha-connections, statistical manifold, curvature, dually-flat manifold, exponential family, natural parameter/expectation parameter, affine immersion, equiaffine geometry, centro-affine immersion, alpha-Hessian manifold, symplectic, Kähler, and Einstein-Weyl structures of information systems, etc. Part B will start with the regularized learning framework, introducing reproducing kernel Hilbert space, semi-inner product, reproducing kernel Banach space, representer theorem, feature map, kernel trick, support vector machine, l1-regularization and sparsity, etc. The application of information geometry to kernel methods will be discussed at the end of this mini-course.
Students at the advanced undergraduate and graduate levels are welcome. The instructor looks forward to working with motivated mathematics students in this exciting new area of applied mathematics.
Prerequisite:
A first course in differential geometry is expected for Part A. Real analysis or functional analysis is expected for Part B.
Reference for the course:
(A.1) S. Amari and H. Nagaoka (2000). Methods of Information Geometry. AMS monograph vol 191. Oxford University Press.
(A.2) U. Simon, A. Schwenk-Schellschmidt, and H. Viesel. (1991). Introduction to the Affine Differential Geometry of Hypersurfaces. Science University of Tokyo Press.
(A.3) J. Zhang (2004). Divergence function, duality, and convex analysis. Neural Computation, vol 16, 159-195.
(B.1) S. Amari and S. Wu (1999). Improving support vector machine classifiers by modifying kernel functions. Neural Networks, vol 12 (no. 6), 783-789.
(B.2) F. Cucker and S. Smale (2001). On the mathematical foundations of learning. Bulletin of the American Mathematical Society, vol 39 (no. 1), 1-49.
(B.3) T. Poggio and S. Smale (2003). The mathematics of learning: Dealing with data. Notices of the American Mathematical Society, vol 50 (no. 5), 534-544.