TELL ME WHY! – EXPLANATIONS SUPPORT LEARNING OF RELATIONAL AND CAUSAL STRUCTURE

(Lampinen, A., et al.)

This paper investigates the effectiveness of natural language explanations of an agent's actions in a reinforcement learning setting. To demonstrate this, the authors consider odd-one-out tasks, which are cognitively challenging and exercise both reasoning and abstraction. They introduce 2D and 3D environments in which the agent must select the odd object from a collection of objects. In the 2D environment, actions correspond to directional movements; in the 3D environment, they additionally include looking around and grasping objects.

In the odd-one-out task, the agent is presented with multiple objects varying in shape, colour, size, and texture. The goal is to pick the object that is unique along some dimension of variation. For example, if the set consists of a green triangle, a green square, a green pentagon, and a red circle, the agent should pick the red circle as the odd one out. The paper proposes two types of explanations: (i) property explanations, which are generated from the agent's choice alone (environment rewards are not considered), and (ii) reward explanations, which are generated from the agent's choice together with the environment reward. It is unclear to me how object properties can be considered an explanation rather than an object description, as pointed out by one of the reviewers.
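
To make the task setup concrete, here is a minimal sketch of how such an episode and its property explanation might be generated. The attribute vocabularies, function name, and explanation wording are my own illustrative choices, not the paper's implementation.

```python
import random

# Illustrative attribute vocabularies; the paper's exact values may differ.
DIMENSIONS = {
    "colour":  ["red", "green", "blue"],
    "shape":   ["triangle", "square", "pentagon", "circle"],
    "size":    ["small", "large"],
    "texture": ["striped", "solid"],
}

def make_odd_one_out_episode(n_objects=4):
    """Build a set of objects in which exactly one is odd along a chosen dimension.

    For brevity this sketch does not enforce that the other dimensions contain
    no accidental second odd-one-out, which the real task would control for.
    """
    odd_dim = random.choice(list(DIMENSIONS))
    common_value, odd_value = random.sample(DIMENSIONS[odd_dim], 2)

    objects = []
    for i in range(n_objects):
        obj = {dim: random.choice(values) for dim, values in DIMENSIONS.items()}
        # Every object shares `common_value` on odd_dim, except the last one.
        obj[odd_dim] = odd_value if i == n_objects - 1 else common_value
        objects.append(obj)
    random.shuffle(objects)

    odd_index = next(i for i, o in enumerate(objects) if o[odd_dim] == odd_value)
    # A property-style explanation mentions the distinguishing attribute;
    # a reward-style explanation would additionally reference correctness/reward.
    property_explanation = f"it is the only {odd_value} object; the others are {common_value}"
    return objects, odd_index, property_explanation
```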

The proposed framework encodes the input image into a latent vector and uses it to predict a policy, a value estimate, an image reconstruction, and a natural language explanation. The policy is the agent's decision about the odd object, the value estimate reduces variance in the policy-gradient update, the image reconstruction encourages the latent representation to retain image content, and the language head predicts the reason for the agent's choice. Importantly, the image reconstruction and language explanation heads are trained with full supervision.
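
As a rough illustration of this multi-head setup, the sketch below shows a single encoder feeding four output heads. The layer sizes, module names, and the use of simple linear heads are assumptions for brevity; the paper uses a richer visual encoder and a recurrent language decoder.

```python
import torch
import torch.nn as nn

class ExplainingAgent(nn.Module):
    """Illustrative agent with one encoder and four output heads.

    Policy and value are trained with RL; reconstruction and explanation
    are supervised auxiliary heads, as described in the paper.
    """

    def __init__(self, image_dim=3 * 64 * 64, latent_dim=256,
                 n_actions=8, vocab_size=1000):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(image_dim, latent_dim),
            nn.ReLU(),
        )
        self.policy_head = nn.Linear(latent_dim, n_actions)          # action logits
        self.value_head = nn.Linear(latent_dim, 1)                   # baseline for policy gradient
        self.reconstruction_head = nn.Linear(latent_dim, image_dim)  # pixel reconstruction
        self.explanation_head = nn.Linear(latent_dim, vocab_size)    # simplified explanation logits

    def forward(self, image):
        z = self.encoder(image)
        return {
            "policy_logits": self.policy_head(z),
            "value": self.value_head(z).squeeze(-1),
            "reconstruction": self.reconstruction_head(z),
            "explanation_logits": self.explanation_head(z),
        }

# The supervised heads would contribute weighted auxiliary terms on top of the
# RL loss (the weights below are assumptions), e.g.:
#   loss = rl_loss + w_recon * mse_loss(out["reconstruction"], image.flatten(1)) \
#                  + w_expl * cross_entropy(out["explanation_logits"], target_tokens)
```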

The authors claim that incorporating explanations into the training signal (i) improves the agent's ability to perform abstract reasoning, (ii) improves generalizability, (iii) disentangles confounding factors, and (iv) encourages the agent to capture the causal structure of the data. These are bold claims; the authors validate them through multiple experiments in a constrained environment (whether this is enough to establish their general validity is debatable). Each experiment targets one of the claims, and the empirical results support the usefulness of explanations for training the agent.

The authors outline multiple experiments to validate their claims about natural language explanations. Some of these experiments are listed here:

  • Explanations eliminate easy biases: in an odd-one-out task, if the answer is ambiguous, or if every object in the set is unique along some dimension (colour, shape, size, or texture), the agent prefers to choose based on the most salient feature, colour. The authors claim that explanations eliminate this kind of behaviour.

  • Reward explanations (given after the environment reward) and property explanations (based only on the agent's choices) provide complementary benefits: “but both types together result in substantially faster learning”, and “reward explanations are necessary for any learning. The likely reason for this is clear when considering the episode structure— the relevance of a transformation to the final reward is much more directly conveyed by the reward explanations than the property ones”.

  • Behaviour- and context-relevant explanations are most effective: the authors constructed control conditions with situation-relevant but behaviour-irrelevant explanations, sampled randomly from the property and reward explanations valid in the current episode, and context-irrelevant explanations, sampled randomly from all possible explanations for any agent action. They claim that explanations specific to both the agent's behaviour and the current context are the most effective (see the sketch after this list).

  • Explanations help the agent implicitly learn a causal influence structure: the authors constructed an experiment in which the agent can intervene on objects before deciding the odd one out; they claim the trained agent understands the causal influence among features. It is unclear from the paper how these causal influences are determined and measured.
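
As a sketch of how the two control conditions above could be implemented, the helper below selects the explanation string used as the supervision target for the language head. The condition names, argument names, and the assumption that explanations are plain strings are mine, not the paper's.

```python
import random

def select_explanation_target(true_explanation, episode_explanations,
                              all_explanations, condition):
    """Pick the explanation used as the language head's training target."""
    if condition == "relevant":
        # Matches both the current episode and the agent's actual behaviour.
        return true_explanation
    if condition == "situation_relevant_behaviour_irrelevant":
        # Valid for the current episode's objects, but not tied to what the agent did.
        return random.choice(episode_explanations)
    if condition == "context_irrelevant":
        # Sampled from explanations for any possible episode and behaviour.
        return random.choice(all_explanations)
    raise ValueError(f"unknown condition: {condition!r}")
```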

Opinion

The authors did a great job of outlining experiments, but I am not convinced that the comparison between traditionally trained agents and agents trained with explanations is fair: the explanations can be considered background knowledge, and the supervised training directly influences the agent's policy behaviour. Overall, this work makes very interesting claims that may or may not hold in general (I feel the experiments are too limited and constrained to settle them), which leaves room for future work to establish their generalizability. Since natural language can faithfully capture human values, requiring agents to predict human values in any given state could be an interesting perspective on inner alignment, and this work showcases the usefulness and effectiveness of a language head. I feel it would be more relevant to consider this framework a step toward inner alignment rather than explainable RL.
