A New MIT Study Shows Reinforcement Learning Minimizes Catastrophic Forgetting Compared to Supervised Fine-Tuning | Insights by Willow Ventures

Understanding Catastrophic Forgetting in Foundation Models

Foundation models are transforming how tasks across many domains are approached. However, a significant challenge known as catastrophic forgetting limits their ability to retain previously acquired skills when they are fine-tuned for new tasks.

What is Catastrophic Forgetting?

Catastrophic forgetting refers to the phenomenon whereby an AI model loses previously learned knowledge as it absorbs new information. This limitation is a major hurdle for building long-lived AI agents capable of continual improvement.

Online Reinforcement Learning vs. Supervised Fine-Tuning

Recent MIT research highlights a key distinction between online reinforcement learning (RL) and supervised fine-tuning (SFT). Both methods can achieve high performance on new tasks, yet SFT tends to overwrite prior capabilities while RL retains them. The difference lies in how far each technique shifts the model’s output distribution away from the base policy.

Measuring Forgetting in AI Models

To quantify forgetting, the researchers propose an empirical forgetting law based on KL divergence:

\[
\text{Forgetting} \propto \text{KL}(\pi_0 \,\|\, \pi)
\]

Here, \(\pi_0\) represents the base model, while \(\pi\) is the fine-tuned one. The forward KL divergence, evaluated on the new task, can predict the degree of forgetting without requiring data from earlier tasks.
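
Below is a minimal sketch of how such a measurement could be taken with Hugging Face Transformers: it estimates KL(π₀ ‖ π) token by token on a handful of new-task prompts. The model identifiers, the fine-tuned checkpoint path, and the prompts are illustrative placeholders rather than the paper’s exact setup, and averaging over fixed prompts is a simplification of measuring the divergence on the full new-task distribution.

```python
# Minimal sketch (not the paper's code): estimate the forward KL divergence
# KL(pi_0 || pi) between a base and a fine-tuned causal LM on new-task prompts.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "Qwen/Qwen2.5-3B-Instruct"          # base policy pi_0
TUNED_ID = "path/to/fine-tuned-checkpoint"    # fine-tuned policy pi (hypothetical path)

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID).eval()
tuned = AutoModelForCausalLM.from_pretrained(TUNED_ID).eval()

@torch.no_grad()
def forward_kl(prompts):
    """Average per-token KL(pi_0 || pi) over the given new-task prompts."""
    total, count = 0.0, 0
    for text in prompts:
        ids = tokenizer(text, return_tensors="pt").input_ids
        logp_base = F.log_softmax(base(ids).logits, dim=-1)    # log pi_0
        logp_tuned = F.log_softmax(tuned(ids).logits, dim=-1)  # log pi
        # KL(pi_0 || pi) = sum_x pi_0(x) * (log pi_0(x) - log pi(x)) at each position
        kl = (logp_base.exp() * (logp_base - logp_tuned)).sum(dim=-1)
        total += kl.sum().item()
        count += kl.numel()
    return total / count

new_task_prompts = ["Solve: 17 * 24 = ?"]  # stand-in for the new-task distribution
print("KL(pi_0 || pi) on new-task prompts:", forward_kl(new_task_prompts))
```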

Insights from Experiments on Large Language Models

In experiments with the Qwen2.5-3B-Instruct model, fine-tuning was conducted across three areas:

  • Math reasoning (Open-Reasoner-Zero)
  • Science Q&A (SciKnowEval subset)
  • Tool use (ToolAlpaca)

Results demonstrated that RL not only improved accuracy on new tasks but also minimized declines in prior-task performance. SFT, by contrast, showed a consistent trade-off, sacrificing previous capabilities for new-task proficiency.

Insights from Robotics Tasks

In robotics, experiments using OpenVLA-7B on pick-and-place tasks revealed similar findings: RL adaptation preserved essential manipulation skills across tasks, while SFT sacrificed them for short-term gains.

Lessons from the ParityMNIST Study

The research team explored a simplified problem, ParityMNIST, to isolate the mechanisms of forgetting. While both RL and SFT reached high new-task accuracy, SFT caused sharper declines on auxiliary benchmarks, validating the predictive role of KL divergence.

The Importance of On-Policy Updates

What makes RL so effective? On-policy RL samples from the model’s own outputs and adjusts them incrementally based on rewards, which keeps the learned distribution close to the base model. In contrast, SFT optimizes against fixed external labels, which can pull the model far from its original distribution.
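
To make the contrast concrete, here is a toy sketch of the two update rules on a small categorical policy: the SFT step pushes probability mass toward a fixed label wherever that label lies, while the REINFORCE-style on-policy step only reweights actions the current policy actually samples. Everything here (policy size, reward function, learning rate) is illustrative and not the paper’s training code.

```python
# Toy sketch contrasting an SFT update with an on-policy (REINFORCE-style) RL update.
import torch
import torch.nn.functional as F

logits = torch.zeros(5, requires_grad=True)   # toy policy over 5 actions
opt = torch.optim.SGD([logits], lr=0.1)

def sft_step(target_action: int):
    """Push probability mass toward a fixed external label, wherever it lies."""
    loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([target_action]))
    opt.zero_grad()
    loss.backward()
    opt.step()

def rl_step(reward_fn):
    """Sample from the *current* policy and reweight by reward, so updates stay
    anchored to what the model already does (the on-policy property)."""
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    loss = -reward_fn(action.item()) * dist.log_prob(action)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Example: the new task rewards action 3.
sft_step(target_action=3)                      # one supervised step toward a fixed label
for _ in range(100):
    rl_step(lambda a: 1.0 if a == 3 else 0.0)  # on-policy steps driven by reward
print(torch.softmax(logits, dim=-1))
```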

Addressing Alternative Explanations

The researchers scrutinized several alternative explanations, such as weight-space changes and hidden-representation drift, but none matched the predictive strength of forward KL divergence, further underscoring its central role.

Broader Implications for AI Development

The findings carry meaningful implications for future AI system designs:

  • Evaluation Practices: Post-training evaluations should consider KL-conservatism alongside task accuracy.
  • Hybrid Models: Combining the efficiency of SFT with explicit KL minimization may strike a balance between new-task performance and forgetting (a minimal sketch of this idea follows the list).
  • Continual Learning: Applying the RL’s Razor principle can help design agents that adaptively learn new skills without compromising previous knowledge.
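
As a rough illustration of the hybrid idea above, the sketch below adds an explicit KL penalty toward the frozen base model to a standard SFT next-token loss. The function signature, batch layout, and `kl_coeff` value are assumptions for illustration, not the paper’s recipe.

```python
# Sketch of a hybrid objective: SFT cross-entropy plus an explicit KL penalty
# that keeps the fine-tuned policy close to the frozen base policy.
import torch
import torch.nn.functional as F

def sft_with_kl_penalty(tuned, base, batch, kl_coeff=0.1):
    """Return SFT next-token loss + kl_coeff * KL(pi_0 || pi) on the batch."""
    logits = tuned(batch["input_ids"]).logits
    sft_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        batch["input_ids"][:, 1:].reshape(-1),
    )
    with torch.no_grad():                       # base policy stays frozen
        base_logits = base(batch["input_ids"]).logits
    logp_base = F.log_softmax(base_logits, dim=-1)
    logp_tuned = F.log_softmax(logits, dim=-1)
    kl = (logp_base.exp() * (logp_base - logp_tuned)).sum(dim=-1).mean()
    return sft_loss + kl_coeff * kl
```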

Conclusion

This MIT research reframes catastrophic forgetting as a distributional issue centered on forward KL divergence. The inherent structure of on-policy reinforcement learning lets it retain previous knowledge better, paving the way for AI models that can learn continually without losing their foundational skills.


Key Takeaways:

  • Reinforcement Learning vs. Supervised Fine-Tuning: RL is more effective at preserving previous knowledge than SFT.
  • Predictability of Forgetting: The extent of forgetting correlates with KL divergence measured on new tasks.
  • Importance of On-Policy Updates: RL’s learning mechanism ensures proximity to the base model, reducing forgetting.
  • Real-World Validation: Experimentation confirms RL’s robustness across various domains, including LLMs and robotics.
  • Future Directions: Post-trained models should be assessed for their distributional shift from the base model in addition to performance metrics.

Related Keywords: catastrophic forgetting, reinforcement learning, supervised fine-tuning, KL divergence, continual learning, foundation models, AI training.

