Learning Reward Functions

CS 285 final project
Random Expert Distillation for reinforcement learning

Source Code

Since reward functions are difficult to obtain in real-world problems, learning reward functions from expert demonstrations has gained considerable interest in the reinforcement learning community. Adversarial optimization has been employed to learn reward functions, but it can suffer from instability and is hard to implement. In this report, we consider a recently proposed method, random expert distillation (RED), for learning reward functions for reinforcement learning. RED estimates whether a given state-action tuple lies in the support of the expert data using the prediction error of a network trained to mimic a fixed random mapping on the expert data; the prediction error on new state-action tuples is then translated into a reward. RED has been successfully used to train policies on various low-dimensional tasks and was shown to outperform adversarial methods on some of them. In this report, we extend the RED technique to high-dimensional input states through two approaches: one incorporates convolutional neural networks into the RED networks, the critic, and the policy, while the other uses an autoencoder to compress the raw inputs before applying RED. Our experiments show that the latter approach succeeds in handling high-dimensional pixel inputs in the Atari game MsPacman.
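To make the mechanism concrete, the sketch below shows one way the RED reward described above could be implemented in PyTorch for low-dimensional state-action inputs: a predictor network is trained to mimic a frozen, randomly initialized target network on expert data, and the resulting prediction error is mapped to a reward with an exponential, following the original RED formulation. The network sizes, learning rate, epoch count, and the scale parameter sigma here are illustrative assumptions, not the exact setup used in this project.

```python
# Minimal RED reward sketch (illustrative; hyperparameters are assumptions).
import torch
import torch.nn as nn


class FeatureNet(nn.Module):
    """Small MLP mapping a concatenated (state, action) vector to features."""

    def __init__(self, input_dim: int, feature_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, feature_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def train_red(expert_sa: torch.Tensor, epochs: int = 100, sigma: float = 1.0):
    """Fit a predictor to a frozen random target on expert (s, a) pairs,
    then return a reward function: low prediction error means the pair is
    likely in the support of the expert data, hence a high reward."""
    input_dim = expert_sa.shape[1]
    target = FeatureNet(input_dim)       # random mapping, never trained
    predictor = FeatureNet(input_dim)    # trained to mimic the target
    for p in target.parameters():
        p.requires_grad_(False)

    opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
    for _ in range(epochs):
        loss = ((predictor(expert_sa) - target(expert_sa)) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    def reward_fn(sa: torch.Tensor) -> torch.Tensor:
        # r(s, a) = exp(-sigma * ||predictor(s, a) - target(s, a)||^2)
        with torch.no_grad():
            err = ((predictor(sa) - target(sa)) ** 2).sum(dim=-1)
        return torch.exp(-sigma * err)

    return reward_fn


# Example usage with hypothetical shapes: 1000 expert pairs where a 4-dim
# state and 2-dim action are concatenated into a 6-dim input vector.
expert_sa = torch.randn(1000, 6)
reward_fn = train_red(expert_sa)
rewards = reward_fn(torch.randn(32, 6))  # rewards for new pairs, in (0, 1]
```

For the pixel-input experiments described above, the same reward construction would apply after the state is replaced by either convolutional features or the autoencoder's latent code; that preprocessing step is not shown in this sketch.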