Fall 2024 - CSC 2626: Imitation Learning for Robotics

This page contains an outline of the topics, content, and assignments for the semester. Note that this schedule will be updated as the semester progresses, with all changes documented here.

Week Date Topic
1 Mon, Sep 9 Imitation learning vs supervised learning
Mitigating compounding errors via dataset aggregation
Mitigating compounding errors by choosing loss functions
Imitation via multi-modal generative models
Training instabilities of behavioral cloning
Teleoperation interfaces for manipulation (optional)
Querying experts only when necessary (optional)
Imitation for visual navigation and autonomous driving (optional)
2 Mon, Sep 16 Intro to optimal control
Intro to model-based reinforcement learning
Intro to model-free reinforcement learning
Monotonic improvement of the value function (optional)
Learning dynamics well only where it matters for the value function (optional)
Parallelizing MPC on the GPU (optional)
3 Mon, Sep 23 Offline / batch reinforcement learning
Transitioning from offline to online RL
4 Mon, Sep 30 Imitation learning combined with RL and planning
Making cost-to-go queries to experts
Expert iteration
Imitation can improve search and exploration strategies
Learning from experts that have privileged information
Dynamic movement primitives
5 Mon, Oct 7 Inverse reinforcement learning
Inferring rewards from preferences
Task specification and human-robot dialog
Value alignment
6 Mon, Oct 14 Thanksgiving Monday
No in-person lecture or office hours on Monday
TA office hours are on
7 Mon, Oct 21 Imitation as program induction
Modular decomposition of demonstrations into skills
(Hierarchical) imitation of multi-goal tasks
Inferring grammars and planning domains
8 Mon, Oct 28 Fall Reading Week
No lecture
No office hours
9 Mon, Nov 4 Adversarial imitation learning
10 Mon, Nov 11 Shared autonomy
Imitation with a human in the loop
Teleoperation
11 Mon, Nov 18 Imitation learning from videos
Causal confusion in imitation learning
12 Mon, Nov 25 Representation learning for imitation
Generalization and safety guarantees for imitation
13 Mon, Dec 2 Project presentations
14 Mon, Dec 9 Final project submission

Week 1

Imitation learning vs supervised learning
An invitation to imitation
ALVINN: An autonomous land vehicle in a neural network

Mitigating compounding errors via dataset aggregation
DAgger: A reduction of imitation learning and structured prediction to no-regret online learning
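For readers who want a concrete picture before the paper, here is a minimal sketch of the DAgger loop in Python. The env, expert, and fit_policy interfaces are hypothetical placeholders, not course-provided code:

    import numpy as np

    def dagger(env, expert, fit_policy, n_iters=10, horizon=100):
        """Sketch of DAgger: roll out the current policy, have the expert
        relabel every visited state, aggregate, and refit the policy."""
        states, actions = [], []          # aggregated dataset D
        policy = expert                   # iteration 0 rolls out the expert
        for _ in range(n_iters):
            s = env.reset()
            for _ in range(horizon):
                states.append(s)
                actions.append(expert(s))      # expert labels the visited state
                s, done = env.step(policy(s))  # but the learner picks the action
                if done:
                    break
            policy = fit_policy(np.array(states), np.array(actions))
        return policy

The key difference from plain behavioral cloning is that the state distribution comes from the learner's own rollouts, which is what controls compounding errors.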

Mitigating compounding errors by choosing loss functions
Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning

Imitation via multi-modal generative models
Diffusion policy

Training instabilities of behavioral cloning
Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression

Teleoperation interfaces for manipulation (optional)
UMI: Universal Manipulation Interface
ALOHA: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
Real-Time Bimanual Dexterous Teleoperation for Imitation Learning
Teleoperation with Immersive Active Visual Feedback
ACE: A Cross-platform Visual-Exoskeleton for Low-Cost Dexterous Teleoperation
Querying experts only when necessary (optional)
Maximum mean discrepancy imitation learning
DropoutDAgger: A Bayesian approach to safe imitation learning
SHIV: Reducing supervisor burden in DAgger using support vectors
Query-efficient imitation learning for end-to-end autonomous driving
Consistent estimators for learning to defer to an expert
Selective sampling and imitation learning via online regression
Imitation for visual navigation and autonomous driving (optional)
Visual path following on a manifold in unstructured three-dimensional terrain
End-to-end learning for self-driving cars
A machine learning approach to visual perception of forest trails for mobile robots
Learning monocular reactive UAV control in cluttered natural environments
Behavioral cloning with energy-based models (optional)
Implicit behavioral cloning
Revisiting energy based models as policies: ranking noise contrastive estimation and interpolating energy models

Week 2

Intro to Optimal Control
Linear Quadratic Regulator and some examples (a minimal solver is sketched after this reading list)
Iterative Linear Quadratic Regulator
Model Predictive Control
Ben Recht: An outsider’s tour of RL (watch his ICML’18 tutorial, too)
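As a concrete companion to the LQR entries above, here is a minimal finite-horizon discrete-time LQR solver via the backward Riccati recursion, written in plain NumPy (a textbook sketch, not course-provided code):

    import numpy as np

    def lqr_gains(A, B, Q, R, Qf, T):
        """Finite-horizon LQR for x_{t+1} = A x_t + B u_t, minimizing
        sum_t (x'Qx + u'Ru) + x_T' Qf x_T. Returns gains K_t for u_t = -K_t x_t."""
        P, gains = Qf, []
        for _ in range(T):
            K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
            P = Q + A.T @ P @ (A - B @ K)
            gains.append(K)
        return gains[::-1]  # ordered t = 0, ..., T-1

    # Example: a double integrator discretized with dt = 0.1
    dt = 0.1
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.0], [dt]])
    K0 = lqr_gains(A, B, np.eye(2), np.eye(1), 10 * np.eye(2), T=50)[0]

iLQR repeats this recursion around successive linearizations of nonlinear dynamics, and MPC re-solves a truncated version of the problem at every timestep.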

Intro to model-based RL (optional)
PILCO: Probabilistic inference for learning control
Deep reinforcement learning in a handful of trials using probabilistic dynamics models
Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids
End-to-end differentiable physics for learning and control
Synthesizing neural network controllers with probabilistic model based reinforcement learning
A survey on policy search algorithms for learning robot controllers in a handful of trials
Reinforcement learning in robotics: a survey
DeepMPC: Learning deep latent features for model predictive control
Learning latent dynamics for planning from pixels
Monotonic improvement of the value function (optional)
Algorithmic framework for model-based deep reinforcement learning with theoretical guarantees
When to Trust Your Model: Model-Based Policy Optimization
Learning dynamics well only where it matters for the value function (optional)
Value Gradient Weighted Model-Based Reinforcement Learning
Parallelizing MPC on the GPU (optional)
MPCGPU: Real-Time Nonlinear Model Predictive Control through Preconditioned Conjugate Gradient on the GPU
STORM: An Integrated Framework for Fast Joint-Space Model-Predictive Control for Reactive Manipulation

Week 3

Offline / Batch Reinforcement Learning
Conservative Q-Learning for offline reinforcement learning (its objective is sketched after this reading list)
IQ-Learn: inverse soft-Q learning for imitation
Scaling data-driven robotics with reward sketching and batch reinforcement learning
Off-policy deep reinforcement learning without exploration
D4RL: Datasets for deep data-driven reinforcement learning
What matters in learning from offline human demonstrations for robot manipulation
NeurIPS 2020 tutorial on offline RL
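The objective promised above, in schematic form (my paraphrase of the CQL paper's regularized TD objective; alpha is a penalty weight and mu is the action-sampling distribution used for the penalty):

    \min_Q \;\; \alpha \Big( \mathbb{E}_{s \sim \mathcal{D},\, a \sim \mu(\cdot \mid s)}\big[ Q(s,a) \big]
                 - \mathbb{E}_{(s,a) \sim \mathcal{D}}\big[ Q(s,a) \big] \Big)
        \; + \; \tfrac{1}{2}\, \mathbb{E}_{(s,a,s') \sim \mathcal{D}}
                 \Big[ \big( Q(s,a) - \hat{\mathcal{B}}^{\pi} \hat{Q}(s,a) \big)^{2} \Big]

The first term pushes Q-values down on actions the policy would take and up on actions actually present in the dataset, which is what makes the learned values conservative.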

Transitioning from offline to online RL
Cal-QL: calibrated offline RL pre-training for efficient online fine-tuning

Optional reading
Offline reinforcement learning: tutorial, review, and perspectives on open problems
Should I run offline reinforcement learning or behavioral cloning?
Why should I trust you, Bellman? The Bellman error is a poor replacement for value error
A minimalist approach to offline reinforcement learning
Benchmarking batch deep reinforcement learning algorithms
Stabilizing off-policy Q-Learning via bootstrapping error reduction
An optimistic perspective on offline reinforcement learning
COG: Connecting new skills to past experience with offline reinforcement learning
IRIS: Implicit reinforcement without interaction at scale for learning control from offline robot manipulation data
(Batch) reinforcement learning for robot soccer
Instabilities of offline RL with pre-trained neural representation
Targeted environment design from offline data

Week 4

Imitation learning combined with RL and planning
Learning neural network policies with guided policy search under unknown dynamics
Planning with diffusion for flexible behavior synthesis

Expert iteration
Thinking fast and slow with deep learning and tree search
Dual policy iteration
Learning to search via retrospective imitation

Learning from experts that have privileged information
PLATO: Policy learning using adaptive trajectory optimization

Dynamic movement primitives
Dynamic Movement Primitives in robotics: a tutorial survey
Using probabilistic movement primitives in robotics
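For reference while reading, a standard DMP formulation (following Ijspeert et al., as covered in the tutorial survey), with goal g, start y_0, basis functions psi_i, and a phase variable x driven by the canonical system:

    \tau \dot{z} = \alpha_z \big( \beta_z (g - y) - z \big) + f(x), \qquad
    \tau \dot{y} = z, \qquad
    \tau \dot{x} = -\alpha_x x, \qquad
    f(x) = \frac{\sum_i \psi_i(x)\, w_i}{\sum_i \psi_i(x)} \, x \, (g - y_0)

Imitation then reduces to fitting the weights w_i of the forcing term f from one or more demonstrations, typically by locally weighted regression.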

Making cost-to-go queries to experts (optional)
AggreVaTe: Reinforcement and imitation learning via interactive no-regret learning
Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
Truncated Horizon Policy Search: Combining RL & Imitation Learning
Imitation can improve search and exploration strategies (optional)
Learning to gather information via imitation
Overcoming exploration in reinforcement learning with demonstrations
Data-driven planning via imitation learning

Week 5

Inverse reinforcement learning
Maximum entropy inverse reinforcement learning
Guided Cost Learning: Deep inverse optimal control via policy optimization
Bayesian inverse reinforcement learning
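As a one-line anchor for the readings above: maximum entropy IRL (in its simplest, deterministic-dynamics form) models demonstrations as exponentially more likely the higher their cumulative reward, and fits the reward parameters by maximum likelihood:

    p_\theta(\tau) = \frac{\exp\big( R_\theta(\tau) \big)}{Z(\theta)}, \qquad
    R_\theta(\tau) = \sum_t r_\theta(s_t, a_t), \qquad
    \max_\theta \sum_{\tau_i \in \mathcal{D}} \log p_\theta(\tau_i)

The partition function Z(theta) sums over all trajectories, and much of the algorithmic work in these papers goes into estimating it or its gradient.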

Inferring rewards from preferences
Active preference-based learning of reward functions
Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations

Inferring constraints from demonstrations
Inverse KKT: Learning cost functions of manipulation tasks from demonstrations

Task specification and human-robot dialog
Robots that ask for help: uncertainty alignment for LLM planners

Value alignment
Inverse reward design

Optional reading
Nonlinear inverse reinforcement learning with Gaussian processes
Maximum margin planning
Compatible reward inverse reinforcement learning
Learning the preferences of ignorant, inconsistent agents
Imputing a convex objective function
Better-than-demonstrator imitation learning via automatically-ranked demonstrations
Applications of Inverse RL (optional)
Socially compliant mobile robot navigation via inverse reinforcement learning
Model-based probabilistic pursuit via inverse reinforcement learning
First-person activity forecasting with online inverse reinforcement learning
Learning strategies in table tennis using inverse reinforcement learning
Planning-based prediction for pedestrians
Activity forecasting
Large-scale cost function learning for path planning using deep inverse reinforcement learning

Week 6

Thanksgiving week. No lecture on Monday.

Week 7

Imitation as program induction
Neural programmer-interpreters
Neural Task Programming: Learning to generalize across hierarchical tasks

Modular decomposition of demonstrations into skills
TACO: Learning task decomposition via temporal alignment for control

(Hierarchical) imitation of multi-goal tasks
Learning to generalize across long-horizon tasks from human demonstrations

Inferring goals, grammars, and planning domains
The motion grammar: analysis of a linguistic method for robot control
Action understanding as inverse planning

Optional reading
Incremental learning of subtasks from unsegmented demonstration
Inducing probabilistic context-free grammars for the sequencing of movement primitives
Neural Task Graphs: Generalizing to unseen tasks from a single video demonstration
Neural program synthesis from diverse demonstration videos
Automata guided reinforcement learning with demonstrations
A syntactic approach to robot imitation learning using probabilistic activity grammars
Robot learning from demonstration by constructing skill trees
Learning to sequence movement primitives from demonstrations
Imitation-projected programmatic reinforcement learning
Reinforcement and imitation learning for diverse visuomotor skills
Inferring task goals and constraints using Bayesian nonparametric inverse reinforcement learning
You only demonstrate once: category-level manipulation from single visual demonstration
Bottom-up skill discovery from unsegmented demonstrations for long-horizon robot manipulation

Week 8

Fall reading week. No lecture.

Week 9

Adversarial imitation learning
GAIL: Generative adversarial imitation learning
Learning robust rewards with adversarial inverse reinforcement learning
InfoGAIL: interpretable imitation learning from visual demonstrations
What matters for adversarial imitation learning?
A divergence minimization perspective on imitation learning methods
Multi-agent generative adversarial imitation learning
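For orientation, the saddle-point objective at the heart of GAIL (up to sign and entropy-weight conventions, which vary across papers): the policy pi is trained to fool a discriminator D that tries to tell its state-action pairs apart from the expert's, with H(pi) the policy's causal entropy:

    \min_{\pi} \max_{D} \;\;
    \mathbb{E}_{\pi}\big[ \log D(s,a) \big]
    + \mathbb{E}_{\pi_E}\big[ \log \big( 1 - D(s,a) \big) \big]
    - \lambda H(\pi)

Several of the papers in this list can be read as changing the divergence this minimax game implicitly minimizes.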

Optional reading
Model-free imitation learning with policy optimization
Imitation learning via off-policy distribution matching
Domain adaptive imitation learning

Week 10

Teleoperation
RelaxedIK: Real-time synthesis of accurate and feasible robot arm motion
Error-aware imitation learning from teleoperation data for mobile manipulation
Controlling assistive robots with learned latent actions

Shared autonomy
Shared autonomy via deep reinforcement learning
Shared autonomy via hindsight optimization

Imitation with a human in the loop
Learning models for shared control of human-machine systems with unknown dynamics
Human-in-the-loop imitation learning using remote teleoperation

Optional reading
Designing robot learners that ask good questions
Blending human and robot inputs for sliding scale autonomy
Inferring and assisting with constraints in shared autonomy
Collaborative control for a robotic wheelchair: evaluation of performance, attention, and workload
Director: A user interface designed for robot operation with shared autonomy
Learning multi-arm manipulation through collaborative teleoperation
Interactive autonomous driving through adaptation from participation

Week 11

Imitation learning from videos
K-VIL: Keypoints-based visual imitation learning
Track2Act: Predicting point tracks from internet videos enables generalizable robot manipulation
VideoDex: Learning dexterity from internet videos

Motion Retargeting
Robotic Telekinesis: Learning a robotic hand imitator by watching humans on YouTube

Causal confusion in imitation learning
Causal confusion in imitation learning

Optional reading
Towards generalist robot learning from internet video: a survey
AVID: Learning multi-stage tasks via pixel-level translation of human videos
Dreamitate: Real-world visuomotor policy learning via video generation
Understanding Human Hands in Contact at Internet Scale
Zero-shot robot manipulation from passive human videos
SFV: Reinforcement Learning of Physical Skills from Videos
DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors from Video
Diffusion Reward: Learning Rewards via Conditional Video Diffusion
Task Success is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors
Video Prediction Models as Rewards for Reinforcement Learning
Giving Robots a Hand: Learning generalizable manipulation with eye-in-hand human video demonstrations
Estimating Q(s,s’) with Deep Deterministic Dynamics Gradients
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Human-to-robot imitation in the wild
Vision-Language Models as Success Detectors
Learning Generalizable Robotic Reward Functions from “In-The-Wild” Human Videos
Semantic Visual Navigation by Watching YouTube Videos
Robotic Offline RL from Internet Videos via Value-Function Pre-Training
HumanPlus: Humanoid Shadowing and Imitation from Humans
ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos
Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation
OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
Unifying 3D Representation and Control of Diverse Robots with a Single Camera

Week 12

Representation learning for imitation
Generalization guarantees for imitation learning
Provable representation learning for imitation with contrastive Fourier features
TRAIL: near-optimal imitation learning with suboptimal data
Representation matters: offline pretraining for sequential decision making
Self-supervised correspondence in visuomotor policy learning
The surprising effectiveness of representation learning for visual imitation

Generalization and safety guarantees for imitation
Provable guarantees for generative behavior cloning: bridging low-level stability and high-level behavior
Imitation learning with stability and safety guarantees

Week 13

Project presentations

Week 14

Final project submission