
PokeeResearch-7B: An Open 7B Deep-Research Agent Trained with Reinforcement Learning from AI Feedback (RLAIF) and a Robust Reasoning Scaffold | Insights by Willow Ventures

Introducing PokeeResearch-7B: A Breakthrough in AI Research Agents

Pokee AI has taken a significant step in artificial intelligence by open-sourcing PokeeResearch-7B, a powerful 7-billion-parameter deep-research agent. Designed to execute comprehensive research loops, it can break down queries, run searches, verify responses, and synthesize threads of information into a cohesive answer. What […]
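
The research loop described here (decompose, search, verify, synthesize) can be illustrated with a rough sketch. This is not PokeeResearch's actual scaffold or API; the `llm` and `web_search` helpers below are hypothetical stand-ins used only to show the control flow.

```python
# Minimal sketch of a deep-research loop, assuming a generic
# LLM + search-tool setup. None of these helpers are PokeeResearch's
# real API; they are placeholders showing the control flow.

from typing import List

def llm(prompt: str) -> str:
    # Placeholder for a call to the policy model.
    return f"<model output for: {prompt[:40]}...>"

def web_search(query: str) -> List[str]:
    # Placeholder for a search-tool call returning snippets.
    return [f"<snippet about {query}>"]

def research(question: str, max_rounds: int = 3) -> str:
    # 1. Break the question into sub-questions.
    sub_questions = llm(f"Break this into sub-questions: {question}").split("\n")
    evidence: List[str] = []
    for sub_q in sub_questions:
        for _ in range(max_rounds):
            # 2. Search and draft a partial answer.
            snippets = web_search(sub_q)
            draft = llm(f"Answer '{sub_q}' using: {snippets}")
            # 3. Self-verify the draft against the evidence.
            verdict = llm(f"Does this answer hold up against the evidence? {draft}")
            if "yes" in verdict.lower():
                break
        evidence.append(draft)
    # 4. Synthesize the verified threads into one cohesive answer.
    return llm(f"Combine into a final answer to '{question}': {evidence}")

if __name__ == "__main__":
    print(research("What training signal does PokeeResearch-7B use?"))
```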

Weak-for-Strong (W4S): A Novel Reinforcement Learning Algorithm that Trains a Weak Meta Agent to Design Agentic Workflows with Stronger LLMs | Insights by Willow Ventures

Introduction to Weak-for-Strong Harnessing (W4S) in Reinforcement Learning

Researchers from Stanford, EPFL, and UNC have introduced the Weak-for-Strong Harnessing (W4S) framework. This reinforcement learning (RL) approach trains a lightweight meta-agent to design and optimize code workflows that harness more powerful executor models. What is Weak-for-Strong Harnessing (W4S)? […]
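
A minimal sketch of the weak-for-strong idea, under the assumption of a generic setup: a small meta-agent proposes a workflow, a stronger executor model runs it, and the resulting task score is fed back as the RL reward for the meta-agent. All helper names here are illustrative, not the paper's implementation.

```python
# Rough sketch of a weak-for-strong loop. The helpers
# (weak_meta_agent, strong_executor, score, rl_update) are
# hypothetical placeholders, not W4S's actual code.

def weak_meta_agent(task_description: str) -> str:
    # Placeholder: a small model emits workflow code / instructions.
    return f"# workflow for: {task_description}\nsteps = ['retrieve', 'reason', 'answer']"

def strong_executor(workflow: str, task_input: str) -> str:
    # Placeholder: a stronger LLM follows the generated workflow.
    return f"<answer to '{task_input}' using workflow>"

def score(prediction: str, reference: str) -> float:
    # Placeholder task metric (e.g., exact match or test pass rate).
    return 1.0 if reference in prediction else 0.0

def rl_update(agent_params: dict, reward: float) -> dict:
    # Placeholder policy update applied to the meta-agent only.
    agent_params["avg_reward"] = 0.9 * agent_params.get("avg_reward", 0.0) + 0.1 * reward
    return agent_params

def w4s_iteration(task: dict, agent_params: dict) -> dict:
    workflow = weak_meta_agent(task["description"])
    prediction = strong_executor(workflow, task["input"])
    reward = score(prediction, task["reference"])
    return rl_update(agent_params, reward)

params = w4s_iteration(
    {"description": "answer trivia", "input": "Capital of France?", "reference": "Paris"},
    {},
)
```

The key design choice implied by the title is that only the weak meta-agent is trained, while the stronger LLM serves purely as the executor of the workflows it designs.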

RA3: Mid-Training with Temporal Action Abstractions for Faster Reinforcement Learning (RL) Post-Training in Code LLMs | Insights by Willow Ventures

Accelerating Reinforcement Learning: Unveiling RA3 and Mid-Training Insights

Recent research from Apple introduces RA3 (Reasoning as Action Abstractions), a mid-training procedure that learns temporal action abstractions to make subsequent reinforcement learning (RL) post-training faster, with a particular focus on code generation tasks. What Does the Research Present? This study presents a […]

Stanford Researchers Released AgentFlow: In-the-Flow Reinforcement Learning (RL) for Modular, Tool-Using AI Agents | Insights by Willow Ventures

Introducing AgentFlow: A Revolutionary Framework for AI Agents

AgentFlow is a framework for trainable, tool-using AI agents structured around four key modules: Planner, Executor, Verifier, and Generator. Its policy optimization method, Flow-GRPO, improves agent performance on multi-turn, tool-integrated reasoning. What is AgentFlow? AgentFlow formalizes tool-using agents into […]
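
The four-module structure can be sketched as a simple loop. The module bodies below are placeholders rather than AgentFlow's code, and the Flow-GRPO update itself is only noted in a comment.

```python
# Minimal sketch of a Planner / Executor / Verifier / Generator loop.
# These implementations are hypothetical stand-ins, not AgentFlow's
# actual modules or its Flow-GRPO training code.

from typing import List

def planner(question: str, memory: List[str]) -> str:
    # Decide the next tool call given the question and past observations.
    return f"search: {question}" if not memory else "finish"

def executor(action: str) -> str:
    # Run the chosen tool (placeholder for a real search/code tool).
    return f"<observation for {action}>"

def verifier(question: str, memory: List[str]) -> bool:
    # Check whether the gathered evidence is sufficient to answer.
    return len(memory) >= 1

def generator(question: str, memory: List[str]) -> str:
    # Produce the final answer from the collected observations.
    return f"<answer to '{question}' based on {len(memory)} observations>"

def agent_loop(question: str, max_turns: int = 4) -> str:
    memory: List[str] = []
    for _ in range(max_turns):
        action = planner(question, memory)          # Planner picks an action
        if action == "finish" and verifier(question, memory):
            break
        memory.append(executor(action))             # Executor runs the tool
    return generator(question, memory)              # Generator writes the answer

# Flow-GRPO (not shown) would update the trainable policy "in the flow"
# using rewards computed over such multi-turn rollouts.
print(agent_loop("Which modules does AgentFlow define?"))
```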

A New MIT Study Shows Reinforcement Learning Minimizes Catastrophic Forgetting Compared to Supervised Fine-Tuning | Insights by Willow Ventures

Understanding Catastrophic Forgetting in Foundation Models

Foundation models are transforming how tasks across many domains are approached, but a significant challenge known as catastrophic forgetting limits their ability to retain previously acquired skills when they are fine-tuned for new tasks. What is Catastrophic Forgetting? Catastrophic forgetting refers to the phenomenon whereby […]
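
As a hypothetical illustration of how forgetting is typically measured, the snippet below scores a model on a previously learned task before and after post-training on a new task and reports the drop. The numbers and helpers are stand-ins; only the qualitative direction (less forgetting under RL than under SFT) follows the study's headline claim.

```python
# Hypothetical illustration of a forgetting metric: accuracy on an old
# task before vs. after post-training on a new task. All values below
# are made-up placeholders, not results from the MIT study.

def accuracy_on_old_task(model_state: str) -> float:
    # Placeholder evaluation; real code would run an eval suite.
    return {"before": 0.80, "after_sft": 0.55, "after_rl": 0.76}[model_state]

forgetting_sft = accuracy_on_old_task("before") - accuracy_on_old_task("after_sft")
forgetting_rl = accuracy_on_old_task("before") - accuracy_on_old_task("after_rl")
print(f"forgetting after SFT: {forgetting_sft:.2f}")  # larger drop
print(f"forgetting after RL:  {forgetting_rl:.2f}")   # smaller drop
```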