Jun 01, 2026

Contact-Aware Pick and Place with a 6-DoF Arm

Introduction

I am developing a real-world pick-and-place application for a 6-DoF robotic arm, targeting the UR10e and UR5e from Universal Robots. The immediate application is laboratory automation: sample sorting, reagent handling, and repetitive bench-top tasks that are high-value targets for automation but require fine manipulation and reliable contact handling. The long-term goal is a generalized manipulation platform that transfers across research and industrial settings.

Approach

The core architecture is a three-layer hierarchy. A collision-free motion planner (RRT-Connect) handles gross arm movement. A phase-aware sequential task framework decomposes the pick-and-place into discrete stages, each with its own completion conditions and sub-policy. Residual reinforcement learning is reserved for the contact-rich phases (grasp and placement), where classical planning breaks down. This design deliberately minimizes the scope of what RL needs to learn, which reduces sample complexity and makes sim-to-real transfer more tractable.
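
To make the phase-aware layer concrete, here is a minimal sketch of a sequential task that advances through phases when their completion conditions hold. It is not the project's actual code: the class names (Phase, SequentialTask), the state keys, the command labels, and the thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical state container: a plain dict of named observations.
State = Dict[str, float]


@dataclass
class Phase:
    """One stage of the task: a name, a sub-policy, and a completion test."""
    name: str
    policy: Callable[[State], str]    # returns a command label (illustrative only)
    is_done: Callable[[State], bool]  # per-phase completion condition


class SequentialTask:
    """Steps through phases in order, advancing when a phase reports completion."""

    def __init__(self, phases: List[Phase]):
        self.phases = phases
        self.index = 0

    def reset(self) -> None:
        # Episode-safe reset: always start again from the first phase.
        self.index = 0

    @property
    def finished(self) -> bool:
        return self.index >= len(self.phases)

    def step(self, state: State) -> str:
        # Skip past every phase whose completion condition already holds.
        while not self.finished and self.phases[self.index].is_done(state):
            self.index += 1
        if self.finished:
            return "idle"
        return self.phases[self.index].policy(state)


# Illustrative phases; state keys and thresholds are made-up values.
phases = [
    Phase("approach", lambda s: "follow_planned_path",
          lambda s: s["dist_to_pregrasp"] < 0.01),
    Phase("grasp", lambda s: "close_gripper",
          lambda s: s["grip_force"] > 5.0),
    Phase("place", lambda s: "lower_and_release",
          lambda s: s["dist_to_goal"] < 0.005),
]
task = SequentialTask(phases)
```

In a layout like this, the planner would back the "approach" policy and a learned residual would only be attached to the "grasp" and "place" policies, which keeps the policy/task boundary explicit.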

Technical Stack

MuJoCo, dm_control, RRT-Connect path planning, multi-seed inverse kinematics, sequential task decomposition, PD control with feedforward gravity compensation, reinforcement learning.
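
As an illustration of the control law named above, the following is a minimal sketch of joint-space PD control with a feedforward gravity term, tau = Kp (q_des - q) + Kd (qd_des - qd) + g(q). The function names, the gains, and the single-link gravity model are assumptions made for this example; in MuJoCo the corresponding bias term would more likely be read from the simulator (for instance qfrc_bias, which also contains Coriolis and centrifugal terms) than computed by hand.

```python
import math
from typing import Callable, List, Sequence


def pd_gravity_torque(
    q: Sequence[float],
    qd: Sequence[float],
    q_des: Sequence[float],
    qd_des: Sequence[float],
    kp: Sequence[float],
    kd: Sequence[float],
    gravity: Callable[[Sequence[float]], Sequence[float]],
) -> List[float]:
    """Per-joint PD feedback plus a feedforward gravity torque g(q)."""
    g = gravity(q)
    return [
        kp[i] * (q_des[i] - q[i]) + kd[i] * (qd_des[i] - qd[i]) + g[i]
        for i in range(len(q))
    ]


def single_link_gravity(q: Sequence[float]) -> List[float]:
    """Hypothetical gravity model: a point mass on a single link.

    The joint angle is measured from the horizontal, so the holding
    torque is m * g * l * cos(q).
    """
    mass, length, g0 = 1.0, 0.5, 9.81  # illustrative values
    return [mass * g0 * length * math.cos(q[0])]


# With zero tracking error the command reduces to the gravity term alone.
tau_hold = pd_gravity_torque([0.0], [0.0], [0.0], [0.0], [10.0], [1.0],
                             single_link_gravity)
```

The point of the feedforward term is that the PD gains no longer have to fight gravity, so steady-state error stays small even with modest stiffness.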

Status

This project is in active development. Completed components:

  1. Multi-seed inverse kinematics: Levenberg-Marquardt (LM) solver with multiple seed configurations, singularity detection, and quaternion double-cover handling
  2. Sequential task framework: phase-based task decomposition with per-phase completion conditions, clean policy/task boundary, and episode-safe resets
  3. Sub-policy architecture: per-phase sub-policies with timestep-independent interpolated position control and feedforward gravity compensation
  4. Physics-accurate simulation: custom UR10e + Robotiq 2F-85 3D models built with PyMJCF, including TCP site instrumentation and dynamic visualization markers
  5. Visualization utilities: mocap-based runtime markers, configurable camera rendering, and video export pipeline
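
The quaternion double-cover handling mentioned in item 1 can be sketched as follows. Because q and -q encode the same rotation, an orientation error computed naively can point the long way round; flipping the sign of the error quaternion when its scalar part is negative keeps the error on the short path. This is a minimal sketch under the assumption of unit quaternions in (w, x, y, z) order, which is MuJoCo's convention; the function names are hypothetical and not taken from the project.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit length


def quat_conj(q: Quat) -> Quat:
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def orientation_error(q_target: Quat, q_current: Quat) -> Tuple[float, float, float]:
    """Rotation-vector error from current to target along the shorter arc."""
    w, x, y, z = quat_mul(q_target, quat_conj(q_current))
    if w < 0.0:
        # Double cover: -q is the same rotation, so pick the short-path twin.
        w, x, y, z = -w, -x, -y, -z
    norm_v = math.sqrt(x * x + y * y + z * z)
    if norm_v < 1e-12:
        return (0.0, 0.0, 0.0)
    angle = 2.0 * math.atan2(norm_v, w)
    return (angle * x / norm_v, angle * y / norm_v, angle * z / norm_v)


identity: Quat = (1.0, 0.0, 0.0, 0.0)
half = math.sqrt(0.5)
rot_z_90: Quat = (half, 0.0, 0.0, half)          # 90 degrees about z
rot_z_90_neg: Quat = (-half, 0.0, 0.0, -half)    # same rotation, opposite sign

err_a = orientation_error(rot_z_90, identity)
err_b = orientation_error(rot_z_90_neg, identity)  # matches err_a
```

Without the sign flip, the second call would report a 270 degree rotation in the opposite direction, which is the kind of error that can send an iterative IK solver the long way round.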

To Do:

  1. Proximity-aware penalty: reward shaping to discourage near-collisions during approach and transfer phases
  2. Obstacle avoidance with real-time update: dynamic replanning as scene geometry changes
  3. Feedback linearization: full gravity and Coriolis compensation for improved low-speed trajectory tracking (see the TMT method in the Compass-Gait Walker post for the formulation)
  4. Residual RL for contact phases: learned corrective policy over grasp descent and placement, trained on top of the classical planner
  5. Sim-to-real transfer: domain randomization, actuator modelling, and latency compensation for deployment on physical hardware
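
One possible shape for the proximity-aware penalty in item 1 is a term that is zero outside a safety margin and grows quadratically as the minimum distance to an obstacle shrinks. The sketch below is only an assumption about how such shaping could look: the function name, the margin, and the weight are illustrative, and the minimum distance is taken as an input rather than queried from the simulator.

```python
def proximity_penalty(distance: float,
                      safe_margin: float = 0.05,
                      weight: float = 1.0) -> float:
    """Reward-shaping term for near-collisions (hypothetical form).

    Returns 0 when the minimum obstacle distance is at or beyond
    safe_margin, and falls smoothly to -weight as the distance reaches 0.
    Negative distances (penetration) are clamped to 0.
    """
    if distance >= safe_margin:
        return 0.0
    d = max(distance, 0.0)
    closeness = (safe_margin - d) / safe_margin  # 0 at the margin, 1 at contact
    return -weight * closeness ** 2


# Example: sum the task reward and the shaping term during approach.
task_reward = 1.0  # illustrative value
shaped_reward = task_reward + proximity_penalty(0.025)
```

A quadratic ramp keeps the penalty and its gradient continuous at the margin, so the shaping term does not introduce a sudden jump in reward when the arm first enters the margin.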