Fall 2024 - CSC 2626: Imitation Learning for Robotics
This page contains an outline of the topics, content, and assignments for the semester. Note that this schedule will be updated as the semester progresses, with all changes documented here.
Week | Date | Topic | Prepare | Slides | Notebooks | Assignments | Project |
---|---|---|---|---|---|---|---|
1 | Mon, Sep 9 | Imitation learning vs supervised learning | 📖 | 🖥️ | 📓 | ✍️ | |
| | | Mitigating compounding errors via dataset aggregation | 📖 | 🖥️ | 📓 | | |
| | | Mitigating compounding errors by choosing loss functions | 📖 | 🖥️ | 📓 | | |
| | | Imitation via multi-modal generative models | 📖 | 🖥️ | 📓 | | |
| | | Training instabilities of behavioral cloning | 📖 | 🖥️ | 📓 | | |
| | | Teleoperation interfaces for manipulation (optional) | 📖 | 🖥️ | 📓 | | |
| | | Querying experts only when necessary (optional) | 📖 | 🖥️ | 📓 | | |
| | | Imitation for visual navigation and autonomous driving (optional) | 📖 | 🖥️ | 📓 | | |
2 | Mon, Sep 16 | Intro to optimal control | 📖 | 🖥️ | 📓 | | |
| | | Intro to model-based reinforcement learning | 📖 | 🖥️ | 📓 | | |
| | | Intro to model-free reinforcement learning | 📖 | 🖥️ | 📓 | | |
| | | Monotonic improvement of the value function (optional) | 📖 | 🖥️ | 📓 | | |
| | | Learning dynamics well only where it matters for the value function (optional) | 📖 | 🖥️ | 📓 | 📄 | |
| | | Parallelizing MPC on the GPU (optional) | 📖 | 🖥️ | 📓 | | |
3 | Mon, Sep 23 | Offline / batch reinforcement learning | 📖 | 🖥️ | 📓 | ✍️ | |
| | | Transitioning from offline to online RL | 📖 | 🖥️ | 📓 | | |
4 | Mon, Sep 30 | Imitation learning combined with RL and planning | 📖 | 🖥️ | 📓 | | |
| | | Making cost-to-go queries to experts | 📖 | 🖥️ | 📓 | | |
| | | Expert iteration | 📖 | 🖥️ | 📓 | | |
| | | Imitation can improve search and exploration strategies | 📖 | 🖥️ | 📓 | | |
| | | Learning from experts that have privileged information | 📖 | 🖥️ | 📓 | 📄 | ✍️ |
| | | Dynamic movement primitives | 📖 | 🖥️ | 📓 | 📄 | ✍️ |
5 | Mon, Oct 7 | Inverse reinforcement learning | 📖 | 🖥️ | 📓 | | |
| | | Inferring rewards from preferences | 📖 | 🖥️ | 📓 | | |
| | | Task specification and human-robot dialog | 📖 | 🖥️ | 📓 | | |
| | | Value alignment | 📖 | 🖥️ | 📓 | | |
6 | Mon, Oct 14 | Thanksgiving Monday | 📖 | | | | |
| | | No in-person lecture or office hours on Monday | | | | | |
| | | TA office hours are on | | | | | |
7 | Mon, Oct 21 | Imitation as program induction | 📖 | 🖥️ | 📓 | | |
| | | Modular decomposition of demonstrations into skills | 📖 | 🖥️ | 📓 | | |
| | | (Hierarchical) imitation of multi-goal tasks | 📖 | 🖥️ | 📓 | | |
| | | Inferring grammars and planning domains | 📖 | 🖥️ | 📓 | | |
8 | Mon, Oct 28 | Fall Reading Week | | | | | |
| | | No lecture | | | | | |
| | | No office hours | | | | | |
9 | Mon, Nov 4 | Adversarial imitation learning | 📖 | 🖥️ | 📓 | | |
10 | Mon, Nov 11 | Shared autonomy | 📖 | 🖥️ | 📓 | | |
| | | Imitation with a human in the loop | 📖 | 🖥️ | 📓 | | |
| | | Teleoperation | 📖 | 🖥️ | 📓 | | |
11 | Mon, Nov 18 | Imitation learning from videos | 📖 | 🖥️ | 📓 | | |
| | | Causal confusion in imitation learning | 📖 | 🖥️ | 📓 | | |
12 | Mon, Nov 25 | Representation learning for imitation | 📖 | 🖥️ | 📓 | | |
| | | Generalization and safety guarantees for imitation | 📖 | 🖥️ | 📓 | | |
13 | Mon, Dec 2 | Project presentations | | | | | |
14 | Mon, Dec 9 | Final project submission | | | | | 📄 |
Week 1
Imitation learning vs supervised learning
An invitation to imitation
ALVINN: An autonomous land vehicle in a neural network
Mitigating compounding errors via dataset aggregation
DAgger: A reduction of imitation learning and structured prediction to no-regret online learning
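For a concrete sense of the loop this paper analyzes, here is a minimal, illustrative DAgger sketch in Python. The names `env`, `expert_policy`, and `train` are hypothetical placeholders (a gym-style environment, an expert query function, and a supervised learner), not code from the course or the paper:

```python
import numpy as np

def dagger(env, expert_policy, train, n_iters=10, horizon=100):
    """Minimal DAgger loop (Ross et al., 2011), sketched for a generic env.

    Placeholders: `env` follows a gym-style reset()/step() interface,
    `expert_policy(obs)` returns the expert's action, and
    `train(states, actions)` fits and returns a supervised policy.
    """
    states, actions = [], []
    policy = expert_policy  # iteration 0: roll out the expert itself
    for _ in range(n_iters):
        obs = env.reset()
        for _ in range(horizon):
            # Visit states under the CURRENT learner, but label them with
            # the expert's action -- the key difference from plain
            # behavioral cloning.
            states.append(obs)
            actions.append(expert_policy(obs))
            obs, _, done, _ = env.step(policy(obs))
            if done:
                break
        # Retrain on the aggregated dataset from all iterations so far.
        policy = train(np.array(states), np.array(actions))
    return policy
```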
Mitigating compounding errors by choosing loss functions
Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning
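For intuition on why the horizon shows up in these analyses, it helps to keep in mind the classical reduction bound of Ross & Bagnell (2010), which this line of work builds on:

```latex
% If the learned policy \hat{\pi} disagrees with the expert \pi^* with
% probability at most \epsilon under the expert's own state distribution,
% and per-step costs are bounded in [0, 1], then over a horizon of T steps
J(\hat{\pi}) \le J(\pi^*) + T^2 \epsilon
% i.e., the cost gap of behavioral cloning can grow quadratically in the
% horizon, while interactive approaches such as DAgger recover an
% O(T \epsilon) gap by training on the learner's own state distribution.
```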
Imitation via multi-modal generative models
Diffusion policy
Training instabilities of behavioral cloning
Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression
Teleoperation interfaces for manipulation [optional]
UMI: Universal Manipulation Interface
ALOHA: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
Real-Time Bimanual Dexterous Teleoperation for Imitation Learning
Teleoperation with Immersive Active Visual Feedback
ACE: A Cross-platform Visual-Exoskeleton for Low-Cost Dexterous Teleoperation
Querying experts only when necessary [optional]
Maximum mean discrepancy imitation learning
DropoutDAgger: A Bayesian approach to safe imitation learning
SHIV: Reducing supervisor burden in DAgger using support vectors
Query-efficient imitation learning for end-to-end autonomous driving
Consistent estimators for learning to defer to an expert
Selective sampling and imitation learning via online regression
Imitation for visual navigation and autonomous driving [optional]
Visual path following on a manifold in unstructured three-dimensional terrain
End-to-end learning for self-driving cars
A machine learning approach to visual perception of forest trails for mobile robots
Learning monocular reactive UAV control in cluttered natural environments
Behavioral cloning with energy-based models [optional]
Implicit behavioral cloning
Revisiting energy based models as policies: ranking noise contrastive estimation and interpolating energy models
Week 2
Intro to Optimal Control
Linear Quadratic Regulator and some examples
Iterative Linear Quadratic Regulator
Model Predictive Control
Ben Recht: An outsider's tour of RL (watch his ICML'18 tutorial, too)
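As a concrete reference point for the LQR material, here is a minimal finite-horizon discrete-time Riccati recursion in NumPy, applied to a toy double integrator. The dynamics and cost weights are illustrative choices, not values from the readings:

```python
import numpy as np

def lqr(A, B, Q, R, horizon):
    """Finite-horizon discrete-time LQR for x' = A x + B u with cost
    sum_t (x' Q x + u' R u). Returns the time-varying gains K_t such
    that u_t = -K_t x_t, via the standard backward Riccati recursion."""
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # gains[t] is the gain to apply at step t

# Toy double integrator: state = (position, velocity), input = force.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
K = lqr(A, B, Q=np.eye(2), R=np.array([[0.1]]), horizon=50)
x = np.array([1.0, 0.0])
for t in range(50):
    x = A @ x + B @ (-K[t] @ x)
print(x)  # state driven toward the origin
```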
Intro to model-based RL [optional]
PILCO: Probabilistic inference for learning control
Deep reinforcement learning in a handful of trials using probabilistic dynamics models
Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids
End-to-end differentiable physics for learning and control
Synthesizing neural network controllers with probabilistic model based reinforcement learning
A survey on policy search algorithms for learning robot controllers in a handful of trials
Reinforcement learning in robotics: a survey
DeepMPC: Learning deep latent features for model predictive control
Learning latent dynamics for planning from pixels
Monotonic improvement of the value function [optional]
Algorithmic framework for model-based deep reinforcement learning with theoretical guarantees
When to Trust Your Model: Model-Based Policy Optimization
Learning dynamics where it matters for the value function [optional]
Value Gradient Weighted Model-Based Reinforcement Learning
Week 3
Offline / Batch Reinforcement Learning
Conservative Q-Learning for offline reinforcement learning
IQ-Learn: inverse soft-Q learning for imitation
Scaling data-driven robotics with reward sketching and batch reinforcement learning
Off-policy deep reinforcement learning without exploration
D4RL: Datasets for deep data-driven reinforcement learning
What matters in learning from offline human demonstrations for robot manipulation
NeurIPS 2020 tutorial on offline RL
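To make the "conservative" part of CQL concrete, here is a toy NumPy illustration of its regularizer for a discrete-action critic. The arrays are synthetic, and this sketch omits the Bellman-error term and the alpha trade-off coefficient that the full algorithm uses:

```python
import numpy as np

def cql_penalty(q_values, data_actions):
    """Conservative regularizer from CQL, in toy discrete-action form.

    q_values: (batch, n_actions) critic outputs Q(s, a) for all actions.
    data_actions: (batch,) indices of the actions taken in the offline data.

    logsumexp over actions is a soft maximum, so minimizing this term
    pushes Q down on all (especially out-of-distribution) actions while
    pushing Q up on the dataset's actions. The full algorithm adds this
    term, scaled by a coefficient alpha, to the usual Bellman-error loss."""
    m = q_values.max(axis=1)  # numerically stable logsumexp
    soft_max_q = m + np.log(np.exp(q_values - m[:, None]).sum(axis=1))
    data_q = q_values[np.arange(len(data_actions)), data_actions]
    return (soft_max_q - data_q).mean()

# Synthetic batch: 3 states, 4 actions each.
rng = np.random.default_rng(0)
print(cql_penalty(rng.normal(size=(3, 4)), np.array([0, 2, 1])))
```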
Transitioning from offline to online RL
Cal-QL: calibrated offline RL pre-training for efficient online fine-tuning
Optional reading
Offline reinforcement learning: tutorial, review, and perspectives on open problems
Should I run offline reinforcement learning or behavioral cloning?
Why should I trust you, Bellman? The Bellman error is a poor replacement for value error
A minimalist approach to offline reinforcement learning
Benchmarking batch deep reinforcement learning algorithms
Stabilizing off-policy Q-Learning via bootstrapping error reduction
An optimistic perspective on offline reinforcement learning
COG: Connecting new skills to past experience with offline reinforcement learning
IRIS: Implicit reinforcement without interaction at scale for learning control from offline robot manipulation data
(Batch) reinforcement learning for robot soccer
Instabilities of offline RL with pre-trained neural representation
Targeted environment design from offline data
Week 4
Imitation learning combined with RL and planning
Learning neural network policies with guided policy search under unknown dynamics
Planning with diffusion for flexible behavior synthesis
Expert iteration
Thinking fast and slow with deep learning and tree search
Dual policy iteration
Learning to search via retrospective imitation
Learning from experts that have privileged information
PLATO: Policy learning using adaptive trajectory optimization
Dynamic movement primitives
Dynamic Movement Primitives in robotics: a tutorial survey
Using probabilistic movement primitives in robotics
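A minimal 1-D rollout of the standard (Ijspeert-style) discrete DMP may help when following the tutorial survey above. All gains, basis-function parameters, and weights here are illustrative choices, not values from the readings:

```python
import numpy as np

def dmp_rollout(y0, goal, weights, centers, widths, tau=1.0, dt=0.01,
                alpha_z=25.0, beta_z=6.25, alpha_x=1.0):
    """Minimal 1-D discrete DMP (Ijspeert-style formulation):
      tau * dy = z
      tau * dz = alpha_z * (beta_z * (goal - y) - z) + f(x)
      tau * dx = -alpha_x * x
    with forcing term f(x) = x * (goal - y0) * (psi @ w) / sum(psi),
    where psi_i are Gaussian basis functions of the phase x. Gains use
    the common critically damped choice beta_z = alpha_z / 4."""
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(tau / dt)):
        psi = np.exp(-widths * (x - centers) ** 2)
        f = x * (goal - y0) * (psi @ weights) / (psi.sum() + 1e-10)
        z += dt / tau * (alpha_z * (beta_z * (goal - y) - z) + f)
        y += dt / tau * z
        x += dt / tau * (-alpha_x * x)
        traj.append(y)
    return np.array(traj)

# With zero weights the DMP is a pure spring-damper pulled to the goal;
# weights fit from a demonstration shape the path in between.
centers = np.linspace(0.0, 1.0, 10)
traj = dmp_rollout(y0=0.0, goal=1.0, weights=np.zeros(10),
                   centers=centers, widths=np.full(10, 50.0))
print(traj[-1])  # approximately 1.0
```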
Making cost-to-go queries to experts [optional]
AggreVaTe: Reinforcement and imitation learning via interactive no-regret learning
Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
Truncated Horizon Policy Search: Combining RL & Imitation Learning
Imitation can improve search and exploration strategies [optional]
Learning to gather information via imitation
Overcoming exploration in reinforcement learning with demonstrations
Data-driven planning via imitation learning
Week 5
Inverse reinforcement learning
Maximum entropy inverse reinforcement learning
Guided Cost Learning: Deep inverse optimal control via policy optimization
Bayesian inverse reinforcement learning
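The maximum-entropy IRL gradient has a clean feature-matching form that a few lines make explicit. In this sketch, `soft_expected_features` is a hypothetical placeholder for the soft-value-iteration computation described in the paper:

```python
import numpy as np

def maxent_irl_step(theta, demo_features, soft_expected_features, lr=0.1):
    """One gradient-ascent step of maximum-entropy IRL (Ziebart et al., 2008).

    Under the MaxEnt model P(trajectory) proportional to exp(theta . f),
    the log-likelihood gradient is a feature-matching residual:

        grad = E_demos[f] - E_theta[f]

    `demo_features` is the empirical mean feature vector of the expert
    demonstrations; `soft_expected_features(theta)` stands in for the
    expected features under the soft-optimal policy for the reward
    r(s) = theta . f(s), typically computed with soft value iteration."""
    grad = demo_features - soft_expected_features(theta)
    return theta + lr * grad
```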
Inferring rewards from preferences
Active preference-based learning of reward functions
Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations
Inferring constraints from demonstrations
Inverse KKT: Learning cost functions of manipulation tasks from demonstrations
Task specification and human-robot dialog
Robots that ask for help: uncertainty alignment for LLM planners
Value alignment
Inverse reward design
Optional reading
Nonlinear inverse reinforcement learning with Gaussian processes
Maximum margin planning
Compatible reward inverse reinforcement learning
Learning the preferences of ignorant, inconsistent agents
Imputing a convex objective function
Better-than-demonstrator imitation learning via automatically-ranked demonstrations
Applications of Inverse RL [optional]
Socially compliant mobile robot navigation via inverse reinforcement learning
Model-based probabilistic pursuit via inverse reinforcement learning
First-person activity forecasting with online inverse reinforcement learning
Learning strategies in table tennis using inverse reinforcement learning
Planning-based prediction for pedestrians
Activity forecasting
Large-scale cost function learning for path planning using deep inverse reinforcement learning
Week 6
Thanksgiving week. No lecture on Monday.
Week 7
Imitation as program induction
Neural programmer-interpreters
Neural Task Programming: Learning to generalize across hierarchical tasks
Modular decomposition of demonstrations into skills
TACO: Learning task decomposition via temporal alignment for control
(Hierarchical) imitation of multi-goal tasks
Learning to generalize across long-horizon tasks from human demonstrations
Inferring goals, grammars, and planning domains
The motion grammar: analysis of a linguistic method for robot control
Action understanding as inverse planning
Optional reading
Incremental learning of subtasks from unsegmented demonstration
Inducing probabilistic context-free grammars for the sequencing of movement primitives
Neural Task Graphs: Generalizing to unseen tasks from a single video demonstration
Neural program synthesis from diverse demonstration videos
Automata guided reinforcement learning with demonstrations
A syntactic approach to robot imitation learning using probabilistic activity grammars
Robot learning from demonstration by constructing skill trees
Learning to sequence movement primitives from demonstrations
Imitation-projected programmatic reinforcement learning
Reinforcement and imitation learning for diverse visuomotor skills
Inferring task goals and constraints using Bayesian nonparametric inverse reinforcement learning
You only demonstrate once: category-level manipulation from single visual demonstration
Bottom-up skill discovery from unsegmented demonstrations for long-horizon robot manipulation
Week 8
Fall reading week. No lecture.
Week 9
Adversarial imitation learning
GAIL: Generative adversarial imitation learning
Learning robust rewards with adversarial inverse reinforcement learning
InfoGAIL: interpretable imitation learning from visual demonstrations
What matters for adversarial imitation learning?
A divergence minimization perspective on imitation learning methods
Multi-agent generative adversarial imitation learning
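The two sides of the GAIL objective can be sketched in a few lines. Sign conventions for the discriminator vary across papers, so treat this as one consistent illustrative choice rather than the canonical formulation:

```python
import numpy as np

def discriminator_loss(d_policy, d_expert):
    """GAIL discriminator objective (Ho & Ermon, 2016) as a loss to
    minimize. d_policy / d_expert are discriminator outputs D(s, a) in
    (0, 1) on policy and expert state-action pairs; D is trained toward
    1 on policy samples and 0 on expert samples."""
    return -(np.log(d_policy).mean() + np.log(1.0 - d_expert).mean())

def imitation_reward(d_policy):
    """Surrogate reward the policy maximizes: -log D(s, a), which is
    large when the discriminator mistakes a policy sample for the
    expert's. The policy is then updated with any on-policy RL method
    (TRPO in the original paper)."""
    return -np.log(d_policy)
```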
Week 10
Teleoperation
RelaxedIK: Real-time synthesis of accurate and feasible robot arm motion
Error-aware imitation learning from teleoperation data for mobile manipulation
Controlling assistive robots with learned latent actions
Shared autonomy
Shared autonomy via deep reinforcement learning
Shared autonomy via hindsight optimization
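A common baseline formulation in this literature is linear blending of the human's command with the assistant's prediction. The sketch below is illustrative only and is not the specific method of either paper above:

```python
import numpy as np

def blend(u_human, u_robot, confidence):
    """Linear arbitration for shared autonomy: the executed command
    interpolates between the human's input and the robot's predicted
    command, weighted by the robot's confidence in its inferred goal."""
    alpha = np.clip(confidence, 0.0, 1.0)
    return (1.0 - alpha) * np.asarray(u_human) + alpha * np.asarray(u_robot)

# E.g., a 2-D velocity command with the robot 70% confident in its goal:
print(blend([1.0, 0.0], [0.7, 0.7], confidence=0.7))
```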
Imitation with a human in the loop
Learning models for shared control of human-machine systems with unknown dynamics
Human-in-the-loop imitation learning using remote teleoperation
Optional reading
Designing robot learners that ask good questions
Blending human and robot inputs for sliding scale autonomy
Inferring and assisting with constraints in shared autonomy
Collaborative control for a robotic wheelchair: evaluation of performance, attention, and workload
Director: A user interface designed for robot operation with shared autonomy
Learning multi-arm manipulation through collaborative teleoperation
Interactive autonomous driving through adaptation from participation
Week 11
Imitation learning from videos
K-VIL: Keypoints-based visual imitation learning
Track2Act: Predicting point tracks from internet videos enables generalizable robot manipulation
VideoDex: Learning dexterity from internet videos
Motion Retargeting
Robotic Telekinesis: Learning a robotic hand imitator by watching humans on YouTube
Causal confusion in imitation learning
Causal confusion in imitation learning
Optional reading
Towards generalist robot learning from internet video: a survey
AVID: Learning multi-stage tasks via pixel-level translation of human videos
Dreamitate: Real-world visuomotor policy learning via video generation
Understanding Human Hands in Contact at Internet Scale
Zero-shot robot manipulation from passive human videos
SFV: Reinforcement Learning of Physical Skills from Videos
DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors from Video
Diffusion Reward: Learning Rewards via Conditional Video Diffusion
Task Success is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors
Video Prediction Models as Rewards for Reinforcement Learning
Giving Robots a Hand: Learning generalizable manipulation with eye-in-hand human video demonstrations
Estimating Q(s,s') with Deep Deterministic Dynamics Gradients
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Human-to-robot imitation in the wild
Vision-Language Models as Success Detectors
Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos
Semantic Visual Navigation by Watching YouTube Videos
Robotic Offline RL from Internet Videos via Value-Function Pre-Training
HumanPlus: Humanoid Shadowing and Imitation from Humans
ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos
Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation
OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
Unifying 3D Representation and Control of Diverse Robots with a Single Camera
Week 12
Representation learning for imitation
Generalization guarantees for imitation learning
Provable representation learning for imitation with contrastive Fourier features
TRAIL: near-optimal imitation learning with suboptimal data
Representation matters: offline pretraining for sequential decision making
Self-supervised correspondence in visuomotor policy learning
The surprising effectiveness of representation learning for visual imitation
Generalization and safety guarantees for imitation
Provable guarantees for generative behavior cloning: bridging low-level stability and high-level behavior
Imitation learning with stability and safety guarantees
Week 13
Project presentations
Week 14
Final project submission