In model-based reinforcement learning, an optimal controller is derived from an optimal value (cost-to-go) function by solving the Bellman equation, which is often intractable due to its nonlinearity. The linearly solvable Markov decision process (LMDP) is a computational framework that solves the Bellman equation efficiently: under some constraints on the action-dependent cost, an exponential transformation of the value function makes the Bellman equation linear, so an optimal policy can be found efficiently. The LMDP framework has been applied in domains such as character control for animation, optimal assignment of communication resources in cellular telephone systems, and real-robot control. Its major drawback, however, is that an environmental model must be given in advance. Because LMDP is a form of model-based reinforcement learning, its performance is sensitive to the accuracy of the environmental model; model learning has been integrated with LMDP in both discrete and continuous problems, but the performance of the obtained controllers is critically affected by the accuracy of the learned model. One possible way to overcome this problem is to adopt concepts from robust control theory, which considers the worst-case adversary and derives an optimal controller as a game-theoretic solution. Along these lines, the linearly solvable Markov game (LMG) has recently been proposed as an extension of LMDP, in which the optimal value function is obtained as a solution of the Hamilton–Jacobi–Isaacs (HJI) equation. Since LMG linearizes the nonlinear HJI equation under assumptions similar to those of LMDP, an optimal policy can again be computed efficiently.

This paper investigates the robustness of LMDP- and LMG-based controllers against modeling errors in both discrete and continuous state-action problems. When there was a discrepancy between the model used for building the control policy and the dynamics of the tested environment, the LMG-based control policy maintained good performance, while that of the LMDP-based control policy deteriorated drastically. These experimental results support the usefulness of the LMG framework when acquiring an accurate model of the environment is difficult.
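To illustrate the linearization that makes LMDP tractable, the following is a minimal sketch on a hypothetical four-state first-exit problem (all numbers are made up for illustration). With the desirability function z(x) = exp(-v(x)), the Bellman equation reduces to the linear relation z = exp(-q) * (P z) under the passive dynamics P, with a boundary condition at the terminal state; v and the optimal controlled dynamics are then recovered in closed form.

```python
import numpy as np

# Toy first-exit LMDP on a 4-state chain; state 3 is an absorbing goal.
# Illustrative sketch only: costs q and passive dynamics P are assumptions.
n = 4
q = np.array([1.0, 1.0, 1.0, 0.0])   # state costs; zero at the goal
P = np.array([                        # passive (uncontrolled) dynamics
    [0.50, 0.50, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.00, 1.00],
])

# Exponential transformation z(x) = exp(-v(x)) linearizes the Bellman
# equation: z = exp(-q) * (P @ z), with z fixed at the terminal state.
z = np.ones(n)
for _ in range(1000):
    z_new = np.exp(-q) * (P @ z)
    z_new[3] = np.exp(-q[3])          # boundary condition at the goal
    if np.max(np.abs(z_new - z)) < 1e-12:
        z = z_new
        break
    z = z_new

v = -np.log(z)                        # recover the value function
# Optimal controlled dynamics: u*(x'|x) proportional to p(x'|x) z(x')
u = P * z[None, :]
u /= u.sum(axis=1, keepdims=True)

print(np.round(v, 3))                 # cost-to-go shrinks toward the goal
```

The iteration converges because each interior update is contracted by the factor exp(-q) < 1; the resulting value function decreases monotonically toward the absorbing goal state, and the optimal policy simply reweights the passive dynamics by the desirability of successor states.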