Amir-massoud Farahmand

(SoloGen)

 

Research Goal


Developing adaptive intelligent agents has been my main research goal for the past few years. I study reinforcement learning methods that adapt to the regularities of the problem in order to reduce the sample complexity of learning in large-scale problems. Before that, I studied hierarchical behavior-based architectures and evolutionary approaches for agent design. See my publications for more information.


Applications of my research range from robotics and control engineering to operations research, finance, health sciences, and computer games.

Research Interests


  1. Sequential Decision-Making Problems (Reinforcement Learning and Planning): regularization techniques (e.g., regularized fitted Value Iteration, LSTD, and Bellman Residual Minimization), model selection and empirical evaluation, error propagation in API/AVI, RKHS formulation, studying regularities of RL/Planning problems

  2. Machine Learning (supervised and unsupervised learning): nonparametric statistical methods, statistical learning theory, regularization techniques, concentration of measure inequalities, manifold learning (dimension estimation), non-i.i.d. processes

  3. Robotics: uncalibrated visual servoing, behavior-based architecture for robot control, multi-agent robotics

  4. Evolutionary Computation: cooperative co-evolution, interaction of evolution and learning

  1. Anti Memoirs (ضدخاطرات) is here!

  2. My ML-related tumblr is here!

  3. My Twitter account (not ML-related most of the time).

News


Many reinforcement learning practitioners have observed that an agent's performance often comes very close to optimal even though the estimated (action-)value function is still far from the optimal one. In my paper Action-Gap Phenomenon in Reinforcement Learning, which has been accepted to the NIPS 2011 conference, I explain and formalize this phenomenon by introducing the concept of action-gap regularity. I show that if the problem has a favorable action-gap regularity, the performance loss can converge much faster than the error in estimating the optimal action-value function.
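
The intuition can be seen in a toy example. The sketch below is only an illustration, not the formal setting or analysis of the paper: the value tables are made-up numbers, the helper names (greedy_policy, action_gaps, max_abs_error) are hypothetical, and the action gap of a state is assumed here to mean the difference between its best and second-best optimal action values.

```python
# Toy illustration with made-up numbers; not taken from the paper.

def greedy_policy(q):
    """Return the index of the highest-valued action in each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]

def action_gaps(q):
    """Return, per state, the best action value minus the second-best one."""
    gaps = []
    for row in q:
        best, second = sorted(row, reverse=True)[:2]
        gaps.append(best - second)
    return gaps

def max_abs_error(q_a, q_b):
    """Return the largest absolute entrywise difference between two tables."""
    return max(abs(x - y) for ra, rb in zip(q_a, q_b) for x, y in zip(ra, rb))

# Hypothetical optimal action values: 3 states, 2 actions.
q_star = [[1.0, 0.0], [0.2, 0.9], [0.5, 0.4]]
# Hypothetical estimate: off by up to about 0.3, yet it ranks actions the same way.
q_hat = [[0.7, 0.3], [0.5, 0.6], [0.46, 0.44]]

print("estimation error:", max_abs_error(q_hat, q_star))
print("action gaps of q_star:", action_gaps(q_star))
print("same greedy policy:", greedy_policy(q_hat) == greedy_policy(q_star))
```

In every state of this example the estimation error is smaller than half of the action gap, so the greedy action computed from the estimate matches the optimal one, and the policy incurs no performance loss even though the value estimate itself is visibly inaccurate.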