Recording Tactile Demonstrations

Tactile data is the missing modality in most robot learning datasets. This page explains why it matters, how to record synchronized tactile + arm + camera streams, the extended dataset format, and how to train policies that use tactile inputs.

Why Tactile Data Improves Robot Learning

Vision tells a policy where the gripper is. Proprioception tells it how far the fingers are closed. Neither tells it whether the grasp is stable. A policy trained on vision + joint data alone must learn to infer grasp quality indirectly — from object motion, arm force limits, or trial-and-error during rollout. Adding tactile sensing provides direct contact state supervision: the policy receives a ground-truth signal distinguishing a secure grasp from a slip-prone one at every timestep of every demonstration. This is especially impactful for deformable, transparent, or variably-sized objects where visual grasp quality estimation is unreliable.

Hardware Setup for Synchronized Recording

A complete multi-modal recording rig requires three hardware layers, all synchronized to a common clock:

  1. Robot arm — provides joint positions, velocities, and end-effector pose at 100–500 Hz via USB or Ethernet. Use the arm SDK's timestamp API, not system time, to get hardware-stamped joint state.
  2. Paxini Gen3 sensor(s) — plugged into a powered USB hub mounted at the robot wrist. Each frame is timestamped by the host PC at USB interrupt time (nanosecond resolution, <0.5 ms jitter).
  3. Camera — one wrist-mounted camera (optional: one overhead camera). Use a USB or GigE camera with hardware trigger sync, or a software-triggered camera with known latency. Record at 30–60 fps.

All three sources write timestamps using the same monotonic host clock. The platform SDK's MultiSourceRecorder aligns frames at post-processing time using timestamp interpolation.
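The alignment step can be sketched as nearest-neighbor matching on the shared monotonic clock. This is an illustrative numpy sketch of the idea, not the SDK's actual implementation; the `align_to_reference` helper and its signature are hypothetical, and the SDK's interpolation is more sophisticated than the nearest-neighbor pick shown here.

```python
import numpy as np

def align_to_reference(ref_ts, stream_ts, stream_frames):
    """For each reference timestamp, pick the stream frame whose
    host-clock timestamp is nearest (a simplification of the SDK's
    interpolation). Timestamps are sorted nanosecond integers."""
    idx = np.searchsorted(stream_ts, ref_ts)
    idx = np.clip(idx, 1, len(stream_ts) - 1)
    left, right = stream_ts[idx - 1], stream_ts[idx]
    # Step back when the left neighbor is closer than the right one.
    idx -= (ref_ts - left) < (right - ref_ts)
    return [stream_frames[i] for i in idx]
```

With arm joint state as the reference stream, tactile and camera frames can each be aligned to it, yielding one row per timestep in the saved episode.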

```python
# Complete synchronized recording session
import paxini
from paxini.sync import MultiSourceRecorder

recorder = MultiSourceRecorder(
    arm=arm_interface,
    sensor=paxini.Sensor(),
    camera=camera_interface,
    output_dir="./demo_recordings/",
    episode_prefix="grasp_place",
)

recorder.start_episode()
# ... perform the manipulation demo ...
recorder.end_episode()  # saves episode_000.hdf5
```

Dataset Format — Extended LeRobot Schema

The Paxini Gen3 data collection pipeline extends the standard LeRobot HDF5 dataset format with additional tactile channels. Existing LeRobot tools (data loading, visualization, policy training) remain fully compatible — the new keys are simply ignored by pipelines that do not use them.

| HDF5 Key | Shape | Source |
|---|---|---|
| observation.state | (T, 7) | Arm joint positions + gripper width |
| observation.images.wrist | (T, H, W, 3) | Wrist camera (uint8 RGB) |
| action | (T, 7) | Target joint positions + gripper command |
| observation.tactile.pressure_map | (T, 8, 8) | Paxini Gen3 pressure array (kPa, float32) |
| observation.tactile.total_force_n | (T,) | Total normal force per frame (Newtons) |
| observation.tactile.in_contact | (T,) | Boolean contact flag per frame |
| observation.tactile.contact_centroid | (T, 2) | Contact centroid (row, col) per frame |
| meta/timestamps_ns | (T,) | Nanosecond timestamps for all channels |

The observation.tactile.* keys are the new additions. All other keys follow the standard LeRobot schema.
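To make the schema's derived channels concrete, total_force_n, in_contact, and contact_centroid can all be recomputed from pressure_map. A minimal numpy sketch; the per-taxel area and contact threshold below are assumptions for illustration, not datasheet values.

```python
import numpy as np

TAXEL_AREA_M2 = 1e-5         # assumed per-taxel area; use the datasheet value
CONTACT_THRESHOLD_KPA = 5.0  # assumed noise floor

def derive_channels(pressure_map):
    """pressure_map: (T, 8, 8) float array in kPa."""
    # Total normal force: pressure (kPa -> Pa) times taxel area, summed per frame.
    total_force_n = (pressure_map * 1e3 * TAXEL_AREA_M2).sum(axis=(1, 2))
    in_contact = pressure_map.max(axis=(1, 2)) > CONTACT_THRESHOLD_KPA
    # Pressure-weighted (row, col) centroid; NaN for frames with no contact.
    rows, cols = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
    total = pressure_map.sum(axis=(1, 2))
    total = np.where(total > 0, total, np.nan)
    contact_centroid = np.stack(
        [(pressure_map * rows).sum(axis=(1, 2)) / total,
         (pressure_map * cols).sum(axis=(1, 2)) / total],
        axis=-1,
    )
    return total_force_n, in_contact, contact_centroid
```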

Quality Checklist for Tactile Data

  1. Run baseline calibration before each session — Call sensor.calibrate() with the gripper open and unloaded. This zeros out finger self-contact and cable stress. Recalibrate if the arm is repositioned significantly.
  2. Verify contact events align with video — Review at least five episodes in the data visualizer before collecting your full dataset. The in_contact rising edge should coincide with the visible moment of fingertip-object contact in the camera feed. A lag >20 ms indicates a timestamp alignment issue.
  3. Cover the full force range in your demonstrations — Aim to record grasps at light, medium, and firm grip levels. If all your demos use maximum gripper force, the policy will not learn to modulate contact pressure. Vary object weight and compliance across episodes.
  4. Flag and exclude slip events from training data — Episodes where the object slips mid-grasp but the demo continues to a successful outcome contain a conflicting supervision signal. Use the SDK's paxini.annotate.flag_slip_events(episode) to automatically mark these for review.
  5. Check sensor saturation — If pressure_map.max() hits 600 kPa in any episode, the sensor is saturating. Reduce gripper force or use the palm variant (lower peak pressure per taxel) for heavier grasps.
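The saturation check above can be automated before training. A minimal sketch assuming the episode's pressure maps are already loaded as a numpy array (the helper name is hypothetical); the 600 kPa limit is the figure quoted in the checklist, so confirm it for your sensor variant.

```python
import numpy as np

SATURATION_KPA = 600.0  # limit quoted in the checklist; confirm for your variant

def saturated_frames(pressure_map, limit=SATURATION_KPA):
    """Return indices of frames in a (T, 8, 8) pressure map where any
    taxel reaches the saturation limit; re-record such episodes."""
    per_frame_max = pressure_map.reshape(len(pressure_map), -1).max(axis=1)
    return np.flatnonzero(per_frame_max >= limit)
```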

Policy Training with Tactile Inputs

To add tactile as an observation modality in ACT or Diffusion Policy, extend the observation config to include the pressure map or the aggregated scalar (total_force_n). The pressure map provides full spatial information but adds 64 floats per frame per sensor; the scalar is easier to integrate and sufficient for binary grasp quality tasks.

```yaml
# ACT config snippet — add tactile to observation space
observation_keys:
  - observation.state                  # joint positions
  - observation.images.wrist           # camera
  - observation.tactile.total_force_n  # scalar
  - observation.tactile.pressure_map   # optional: full map

# Normalize tactile observations
tactile_normalization:
  total_force_n: {mean: 2.5, std: 1.8}
  pressure_map: {mean: 12.0, std: 45.0}  # kPa statistics from your dataset
```
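The mean/std values in the config snippet are placeholders that must be recomputed from your own recordings. A minimal sketch of computing them, assuming each episode's tactile arrays are already loaded into memory; the `tactile_norm_stats` helper is hypothetical, not part of the SDK.

```python
import numpy as np

def tactile_norm_stats(episodes):
    """episodes: list of (pressure_map, total_force_n) tuples, shaped
    (T, 8, 8) and (T,). Returns dataset-wide statistics for the
    tactile_normalization config block."""
    forces = np.concatenate([f for _, f in episodes])
    pressures = np.concatenate([p.ravel() for p, _ in episodes])
    return {
        "total_force_n": {"mean": float(forces.mean()), "std": float(forces.std())},
        "pressure_map": {"mean": float(pressures.mean()), "std": float(pressures.std())},
    }
```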

For the full training walkthrough — including how to compute normalization statistics from your recorded dataset and how to evaluate against a vision-only baseline — see Unit 5 of the learning path.

For broader context on dexterous hand data collection strategies, see the Dexterous Hands guide.