Latency Requirements by Task
Not all teleoperation tasks have the same latency tolerance. The three tiers that matter in practice:
- Precision contact tasks (<30ms required): Peg insertion, surface following, connector mating. At these tolerances, only local-area network teleoperation is viable. Even 50ms introduces enough phase lag in the operator's feedback loop to make precise contact judgment unreliable.
- Medium-precision tasks (100–300ms acceptable): Object pick-and-place with >5mm tolerance, manipulation of large objects, locomotion control. Wide-area internet with compensation strategies works here. This covers the majority of manipulation data collection tasks.
- High-latency tasks (>500ms — not viable for direct control): At this latency, direct teleoperation breaks down for any task requiring reactive control. Supervisory control (send high-level commands, robot executes autonomously) is the only viable mode above 500ms.
Latency Sources and Optimization
| Pipeline Stage | Typical Latency | Optimization |
|---|---|---|
| Network propagation (US coast-to-coast) | 70–90ms | Operator geographic routing — match operators to nearby robots |
| Video encoding (H.264 software) | 50–100ms | Switch to WebRTC VP8 hardware encode: 15–30ms |
| Video decoding (browser) | 10–30ms | Enable hardware acceleration in browser |
| Control command serialization/deserialization | 2–5ms | Binary protocol (MessagePack/protobuf) vs JSON |
| Robot controller loop overhead | 5–20ms | High-priority RT thread for command processing |
| Camera capture + USB frame delivery | 10–33ms | USB3 camera at 60fps: 16ms max frame age |
Latency Compensation Strategies
- Smith Predictor: A classical control compensation technique for systems with known constant delay. The Smith predictor runs an internal model of the robot dynamics in parallel, shifted by the round-trip delay, so the operator is effectively controlling the model output rather than the delayed real output. Works well when delay is stable and the plant model is accurate. Falls apart with variable internet jitter (±30ms swings break the constant-delay assumption).
- Visual Lead: Operators are trained to aim for where the robot will be rather than where the video shows it currently. A visual overlay on the operator interface shows the predicted robot position based on the last command and the known latency. Simple, effective, and doesn't require model accuracy. Preferred method for most real deployments.
- Task Selection by Latency Tier: The most robust strategy. Measure round-trip latency at session start and automatically route operators to appropriate tasks. Operators with <50ms RTT get contact-critical tasks; operators with 100–200ms RTT get standard pick-place; operators above 250ms are assigned to navigation or supervisory tasks.
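The tier-routing logic above reduces to a threshold function. A minimal sketch, using the RTT cutoffs stated in the strategy (function and queue names are illustrative, not the SVRC API):

```python
def route_operator(rtt_ms: float) -> str:
    """Map a session-start RTT measurement to a task queue, following
    the latency tiers above. Thresholds in milliseconds."""
    if rtt_ms < 50:
        return "contact_critical"       # peg insertion, connector mating
    if rtt_ms <= 250:
        return "standard_pick_place"    # >5mm tolerance manipulation
    return "navigation_or_supervisory"  # high-latency fallback
```

In production this check would re-run periodically, since a session that starts at 40ms RTT can degrade mid-shift.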
Video Streaming Technology Comparison
| Technology | Glass-to-Glass Latency | Bandwidth | Recommendation |
|---|---|---|---|
| WebRTC VP8 (hardware encode) | 30–50ms | 2–8 Mbps | Best for teleoperation — use this |
| WebRTC H.264 | 40–80ms | 1.5–5 Mbps | Good fallback if VP8 not available |
| H.264 over RTSP | 100–150ms | 1–3 Mbps | Acceptable for supervisory control |
| MJPEG over HTTP | 100–300ms | 10–50 Mbps | Avoid — high bandwidth, high latency |
| JPEG frames over WebSocket | 200–500ms | 5–20 Mbps | Do not use for teleoperation |
The Latency Pipeline: Where Every Millisecond Goes
Total end-to-end latency in a teleoperation system is the sum of multiple pipeline stages, each contributing independently. Understanding this pipeline is essential for optimization because different stages require different solutions.
A typical US domestic teleoperation session (operator in SF, robot in NYC) has a total pipeline of: camera capture (16ms at 60fps) + video encoding (25ms WebRTC VP8) + network propagation (75ms coast-to-coast) + video decoding (15ms) + operator perception and reaction (150-300ms) + command serialization (3ms) + network return trip (75ms) + command deserialization (3ms) + robot controller processing (10ms) = 372-522ms total loop latency. Of this, the operator's reaction time dominates. The engineering-controllable portion (everything except human reaction) is approximately 222ms, with network propagation accounting for 68% of the controllable latency.
This decomposition reveals an important insight: optimizing video encoding from 100ms to 25ms (a 75ms improvement) has roughly the same effect on loop latency as moving the operator about 7,500 km closer to the robot, since RTT decreases by only ~10ms per 1,000 km of separation. Software optimization of the video pipeline is almost always more cost-effective than geographic co-location.
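The budget above can be reproduced in a few lines (stage values copied from the example; the dictionary keys are illustrative):

```python
# Engineering-controllable pipeline stages, in milliseconds (from the
# SF-to-NYC example above)
pipeline_ms = {
    "camera_capture": 16, "video_encode": 25, "network_out": 75,
    "video_decode": 15, "command_serialize": 3, "network_return": 75,
    "command_deserialize": 3, "controller_processing": 10,
}
controllable = sum(pipeline_ms.values())          # 222 ms
reaction = (150, 300)                             # operator perception + reaction
total = (controllable + reaction[0],
         controllable + reaction[1])              # (372, 522) ms loop latency
network_share = (pipeline_ms["network_out"]
                 + pipeline_ms["network_return"]) / controllable  # ~0.68
```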
Advanced Delay Compensation: Wave Variables and Predictor Methods
Wave variable formulation: Wave variables (Niemeyer and Slotine, 1991) transform the velocity-force control signals into wave variables that propagate through the communication channel. The key property: wave variables guarantee passivity (energy is not created by the communication channel), which ensures stability regardless of the delay magnitude. This makes wave variables the theoretically correct approach for bilateral force-feedback teleoperation over variable-delay networks.
In practice, wave variable implementations trade transparency for stability -- the operator feels a "mushy" response rather than crisp force feedback, because the wave transformation smooths the force signal. For data collection tasks where force feedback quality is secondary to task completion, this tradeoff is acceptable. For surgical teleoperation where force transparency is critical, wave variables are combined with local force models that enhance the perceived transparency without violating passivity.
Predictor-based compensation: Beyond the Smith predictor (which assumes constant delay), time-varying delay predictors use a local dynamic model of the robot to predict its state forward by the current estimated delay. The Kalman filter-based predictor is the most common: it maintains a state estimate of the remote robot and propagates it forward using the robot's dynamics model and the most recent control commands. When delayed observations arrive from the real robot, the Kalman filter corrects the prediction. This approach handles variable jitter well but requires an accurate dynamic model of the remote robot.
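The forward-prediction idea can be sketched without a full Kalman filter: a constant-velocity model propagated each control tick, with a simple blend-in correction when delayed observations arrive (all names and the correction gain are illustrative):

```python
import numpy as np

class DelayCompensatingPredictor:
    """Estimate the remote robot's joint state and predict it forward
    by the current network delay. Constant-velocity model; a delayed
    observation is re-propagated to 'now' before correction."""

    def __init__(self, n_joints, correction_gain=0.3):
        self.q = np.zeros(n_joints)    # joint position estimate
        self.dq = np.zeros(n_joints)   # joint velocity estimate
        self.gain = correction_gain

    def apply_command(self, dq_cmd):
        """Assume commanded joint velocities are achieved (free space)."""
        self.dq = np.asarray(dq_cmd, dtype=float)

    def step(self, dt):
        """Propagate the local estimate one control tick."""
        self.q = self.q + self.dq * dt

    def correct(self, q_observed, obs_age_s):
        """Blend in a delayed observation, shifted forward by its age."""
        q_obs_now = np.asarray(q_observed, dtype=float) + self.dq * obs_age_s
        self.q = self.q + self.gain * (q_obs_now - self.q)

    def predict(self, delay_s):
        """State `delay_s` seconds ahead of the current estimate."""
        return self.q + self.dq * delay_s
```

A production predictor would replace the constant-velocity propagation with the robot's dynamics model and maintain a proper covariance, as the text describes.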
Model-mediated teleoperation: The operator interacts with a local virtual replica of the remote environment. The replica is updated asynchronously from the real robot's sensor data. The operator controls the virtual replica with zero delay, and the real robot tracks the virtual replica's state with whatever delay the network imposes. This decouples the operator's experience from the network quality entirely. The cost: the virtual replica can diverge from reality if the environment changes in ways the replica does not model (an object moves, a person enters the workspace). Used successfully for space teleoperation and increasingly for terrestrial remote data collection.
Protocol Comparison: TCP vs. UDP vs. QUIC vs. WebRTC
| Protocol | Control Commands | Video | NAT Traversal | Recommendation |
|---|---|---|---|---|
| TCP (WebSocket) | Reliable delivery; head-of-line blocking adds 10-50ms on packet loss | Usable but suboptimal -- retransmits stale frames | Works everywhere | Use for non-time-critical data (logs, config) |
| UDP | Lowest latency; no retransmit; application handles loss | Requires custom codec integration | Often blocked by firewalls | Best for control commands if NAT is not an issue |
| QUIC | Independent streams -- no HOL blocking; built-in encryption | Promising; no mature video-over-QUIC stack yet | Built on UDP; ICE-like traversal needed | Future-best; not mature enough for production in 2026 |
| WebRTC | DataChannel (SCTP/UDP): reliable or unreliable modes | Native video codec support; adaptive bitrate; congestion control | ICE + STUN/TURN built-in | Production choice for 2026 -- handles video + control + NAT |
SVRC's teleop platform uses WebRTC for both video (media channels) and control commands (unreliable DataChannel). Unreliable DataChannel mode drops stale control commands rather than retransmitting them, which is the correct behavior for real-time control: a 100ms-old position command is worse than no command at all. For safety-critical signals (e-stop), a parallel reliable TCP WebSocket ensures guaranteed delivery.
Real-World Latency Measurements
SVRC has collected latency data from over 500 remote teleoperation sessions across a range of network conditions. Key findings from production data:
| Route | Median RTT | P95 RTT | Packet Loss | Data Quality Impact |
|---|---|---|---|---|
| Same city (fiber) | 8ms | 15ms | <0.01% | Indistinguishable from local |
| SF to NYC (fiber) | 72ms | 95ms | 0.02% | 5-8% lower demo quality (slower, more hesitant) |
| US to Europe (fiber) | 130ms | 180ms | 0.05% | 15-20% lower demo quality; contact tasks degraded |
| US to Asia (fiber) | 175ms | 250ms | 0.1% | 25-35% lower quality; pick-place only |
| Home WiFi (shared) | 35ms | 250ms | 0.5-2% | Jitter is the killer -- intermittent 200ms+ spikes cause jerky demos |
The data quality impact column reflects the measured difference in demonstration smoothness (mean jerk) and task completion time compared to local teleoperation. The critical finding: latency below 100ms RTT produces demonstrations that are statistically indistinguishable from local control for standard pick-place tasks. Above 100ms, quality degrades approximately linearly with latency. Above 200ms, only slow, non-contact tasks produce usable demonstration data.
Packet Loss Handling
Internet packet loss is typically 0.01-0.5% on good connections but can spike to 2-5% during congestion. For control commands sent at 50 Hz, 1% loss means one dropped command every 2 seconds. The impact depends on how the receiving controller handles missing commands:
- Hold last command (zero-order hold): The robot continues executing the last received command until a new one arrives. Safe for velocity commands (the robot continues at the last commanded velocity). Dangerous for position commands (the robot stops, which may leave it in an unstable contact state).
- Extrapolate (first-order hold): The robot extrapolates the command trajectory based on the last two received commands. Better for smooth motion continuity but can overshoot if the operator was decelerating. Use with a velocity magnitude limit to prevent runaway.
- Interpolation buffer: Buffer 2-3 future commands (adds 40-60ms latency) and interpolate between them. Smooths out single-packet losses entirely. Recommended for data collection where consistent trajectory quality matters more than absolute minimum latency.
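The first-order hold with a runaway guard can be sketched for a single command dimension (names and the clamp behavior are illustrative, per the velocity-limit caveat above):

```python
def extrapolate_command(prev_cmd, last_cmd, dt_cmd, dt_since_last, max_step):
    """First-order hold: continue along the line through the last two
    received commands, clamping the extrapolated step so a lost packet
    during operator deceleration cannot cause runaway."""
    velocity = (last_cmd - prev_cmd) / dt_cmd      # implied command rate
    step = velocity * dt_since_last
    step = max(-max_step, min(max_step, step))     # runaway guard
    return last_cmd + step
```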
SVRC uses a 2-command interpolation buffer with adaptive depth: the buffer grows during periods of high loss (detected by monitoring inter-packet intervals) and shrinks during stable periods. This provides consistent motion quality at the cost of slightly variable latency (50-100ms additional buffer in worst case).
Related Reading
- Teleoperation Fatigue and Ergonomics Study
- Robot Training Data: Collection Methods and Best Practices
- Robot Trajectory Annotation: Challenges and Quality Standards
- Robot Deployment Checklist
- SVRC Teleop Platform
- SVRC Data Collection Services
Jitter Management: The Underestimated Problem
Network latency gets the attention, but jitter -- the variation in latency over time -- is often more damaging to teleoperation quality than the absolute delay. A consistent 100ms delay is manageable with visual lead techniques. A connection that oscillates between 30ms and 200ms produces jerky, unpredictable robot behavior that operators cannot adapt to.
Measuring jitter. SVRC measures jitter as the interquartile range (IQR) of round-trip times over a 30-second window. A healthy connection has IQR below 10ms. Marginal connections show IQR of 10-30ms. Connections with IQR above 30ms produce demonstrably lower-quality demonstrations and should trigger a session pause until network conditions improve.
Jitter buffering. The standard mitigation is a jitter buffer on the robot side that holds received commands for a configurable duration before executing them. A 50ms jitter buffer eliminates most jitter artifacts at the cost of 50ms additional latency. The buffer depth should be adaptive: set to the 95th percentile RTT minus the median RTT, updated every 10 seconds. During stable periods (low jitter), the buffer shrinks to minimize latency. During unstable periods (high jitter), the buffer grows to maintain smooth control.
```python
# adaptive_jitter_buffer.py -- Adaptive jitter buffer for teleop commands
import collections
import time

import numpy as np


class AdaptiveJitterBuffer:
    """Buffer incoming teleop commands to smooth out network jitter."""

    def __init__(self, min_depth_ms=10, max_depth_ms=100):
        self.min_depth = min_depth_ms / 1000.0
        self.max_depth = max_depth_ms / 1000.0
        self.rtt_history = collections.deque(maxlen=100)  # last 100 RTT samples
        self.buffer = collections.deque()  # (release_time, command) pairs
        self.current_depth = self.min_depth

    def update_rtt(self, rtt_seconds):
        """Feed in a new RTT measurement."""
        self.rtt_history.append(rtt_seconds)
        if len(self.rtt_history) >= 20:
            p50 = np.percentile(list(self.rtt_history), 50)
            p95 = np.percentile(list(self.rtt_history), 95)
            self.current_depth = float(np.clip(p95 - p50,
                                               self.min_depth, self.max_depth))

    def enqueue(self, command):
        """Add a command with its scheduled release time."""
        release_time = time.monotonic() + self.current_depth
        self.buffer.append((release_time, command))

    def dequeue(self):
        """Return the next command if its release time has passed."""
        now = time.monotonic()
        if self.buffer and self.buffer[0][0] <= now:
            return self.buffer.popleft()[1]
        return None  # no command ready; use hold-last or extrapolation
```
Operator Interface Design for Latency
The operator's UI can significantly mitigate the perceived impact of latency through predictive visualization and status feedback.
- Predicted position overlay: Display a semi-transparent ghost of the robot at the position it will reach based on the last sent command and the current estimated delay. This lets the operator see the commanded state rather than the delayed actual state. Implementation: maintain a local kinematic model that processes commands immediately and renders the prediction alongside the delayed video feed.
- Latency indicator: Display the current RTT prominently in the operator interface. Use green/yellow/red color coding: green below 100ms, yellow 100-200ms, red above 200ms. Operators should adjust their speed and caution level based on the current latency tier. SVRC's platform uses a latency gauge in the top-right corner of the teleop view.
- Force feedback scaling: When using haptic feedback devices, scale the force feedback gain inversely with latency. At low latency (< 50ms), full force feedback provides useful contact information. At high latency (> 150ms), high-gain force feedback causes oscillation because the phase lag between operator input and force response exceeds the stability margin. Reduce gain to 30-50% at 150ms+ latency.
- Speed limiting: Automatically reduce the maximum commanded velocity as latency increases. At 50ms RTT, allow full-speed operation (1.0 m/s end-effector). At 150ms RTT, limit to 0.3 m/s. At 300ms RTT, limit to 0.1 m/s. These limits prevent the operator from commanding motions that would overshoot due to the delayed visual feedback.
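The speed-limit schedule above interpolates naturally between the stated breakpoints. A sketch using the values quoted in the text:

```python
import numpy as np

def max_velocity_for_rtt(rtt_ms):
    """Piecewise-linear interpolation of the end-effector speed limits
    above: 1.0 m/s at 50ms RTT, 0.3 m/s at 150ms, 0.1 m/s at 300ms.
    np.interp clamps outside the breakpoints."""
    return float(np.interp(rtt_ms, [50, 150, 300], [1.0, 0.3, 0.1]))
```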
Data Quality Assessment for Remote Sessions
Demonstration quality from remote teleoperation sessions should be assessed separately from local sessions. SVRC uses three metrics to determine whether a remote demonstration meets training data quality standards.
- Trajectory smoothness (mean jerk): Compute the third derivative of the end-effector position trajectory. Remote demonstrations with latency above 100ms typically show 40-80% higher mean jerk than local demonstrations of the same task. Demonstrations with jerk above 2x the local baseline should be flagged for review -- they may still be successful but produce noisier training signal.
- Task completion time ratio: Remote demonstrations take longer than local demonstrations. A completion time ratio above 2.0x (remote takes more than twice as long as local) correlates with degraded demonstration quality (excessive hesitation, corrective movements). These episodes should be reviewed and potentially excluded or weighted down during training.
- Command smoothness: Analyze the operator's raw command stream for discontinuities -- sudden velocity reversals, long pauses, and rapid oscillations. These indicate the operator is fighting latency. More than 3 reversals per episode on a simple pick-place task is a quality flag.
In SVRC's experience, remote demonstrations collected at sub-100ms RTT are indistinguishable from local demonstrations for training purposes. Between 100-200ms, demonstrations are usable but should be mixed 2:1 with local demonstrations in the training set. Above 200ms, demonstrations should only be used for non-contact locomotion and navigation tasks.
WebRTC Configuration for Robot Teleoperation
WebRTC is the standard protocol for real-time video streaming in robot teleoperation because it handles NAT traversal, adaptive bitrate, and jitter buffering natively. However, the default WebRTC configuration is optimized for video conferencing, not robot control. Key configuration changes for teleoperation:
```javascript
// WebRTC configuration optimized for robot teleoperation
const rtcConfig = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    { urls: 'turn:turn.roboticscenter.ai:3478',
      username: 'teleop', credential: 'token' }
  ],
  iceCandidatePoolSize: 10,  // pre-gather candidates for faster connection
};

// Video encoder settings for low-latency robot camera streams
const videoConstraints = {
  width: { ideal: 640, max: 1280 },
  height: { ideal: 480, max: 720 },
  frameRate: { ideal: 30, max: 30 },
};

// SDP modifications for low latency (apply to offer/answer).
// Note: codec preference (VP8 over H.264) is set separately via
// transceiver codec ordering; these edits cap frame rate/size and
// bitrate to prevent congestion-induced latency spikes.
function optimizeSdp(sdp) {
  // Cap frame rate and frame size on the first video payload type
  sdp = sdp.replace(/a=fmtp:(\d+) /, 'a=fmtp:$1 max-fr=30;max-fs=3600;');
  // Add a session bandwidth cap to the video section (Chrome honors b=AS)
  sdp = sdp.replace(/a=mid:video\r\n/,
                    'a=mid:video\r\nb=AS:2000\r\n');  // 2 Mbps cap
  return sdp;
}
```
Key tuning decisions: (1) Use VP8 over H.264 when hardware encoding is not available -- VP8 has lower software encoding latency. (2) Cap bitrate at 2 Mbps per camera to prevent buffer bloat at the network layer. (3) Use 640x480 resolution rather than 720p or 1080p -- the latency-to-quality tradeoff strongly favors lower resolution for teleoperation. (4) Set frame rate to 30fps; dropping to 15fps saves bandwidth but degrades operator performance on fast tasks by ~20%.
Predictive Display: Compensating for Visual Delay
Predictive display is the most effective technique for maintaining operator performance at 100-300ms latency. The system renders a predicted robot state overlaid on the delayed camera image, showing the operator where the robot is now (predicted) rather than where it was when the image was captured.
Implementation requires: (1) a kinematic model of the robot that can predict joint positions T_latency seconds into the future, (2) a 3D rendering overlay that shows the predicted robot pose on top of the camera feed, and (3) a latency estimator that keeps the prediction horizon matched to the actual round-trip delay.
For simple motions (free-space reaching), forward kinematics prediction using the last commanded trajectory is sufficient and adds negligible compute. For contact tasks, prediction must account for expected contact forces and potential trajectory deviations, which requires a dynamics model or a learned predictor. In SVRC's evaluations, predictive display maintains operator performance at 150ms RTT to within 90% of the zero-latency baseline for pick-and-place tasks, and within 70% for insertion tasks.
Multi-Camera Stream Management
Production teleoperation setups typically use 2-4 cameras (overhead, wrist, side views). Managing multiple video streams over a latency-constrained connection requires careful prioritization.
- Primary stream (wrist camera): Highest priority. Full resolution (640x480), 30fps, lowest compression. This is the operator's primary spatial reference for fine manipulation.
- Secondary stream (overhead): Medium priority. 320x240, 15fps during free-space motion; 640x480, 30fps during grasp phase. The overhead view provides workspace context but is less critical frame-by-frame.
- Tertiary streams (side views): Lowest priority. 320x240, 10fps normally. Activated to full resolution only when the operator explicitly selects the view. These streams are bandwidth reserves that can be reclaimed when the primary stream needs more bandwidth.
Total bandwidth for this configuration: 3-6 Mbps sustained, 8 Mbps peak. If available bandwidth drops below 4 Mbps, progressively reduce secondary and tertiary streams before degrading the primary. SVRC's platform implements this priority scheme automatically with real-time bandwidth monitoring.
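The degrade-lowest-priority-first behavior can be sketched as a greedy allocator. The per-stream profiles below are illustrative numbers loosely following the configuration described above, not the SVRC implementation:

```python
# (name, full_mbps, reduced_mbps) in priority order, primary first
STREAMS = [("wrist", 2.0, 2.0),
           ("overhead", 1.5, 0.4),
           ("side_left", 0.8, 0.2),
           ("side_right", 0.8, 0.2)]

def allocate_bandwidth(available_mbps):
    """Give full quality from highest to lowest priority; streams that
    no longer fit fall back to their reduced profile."""
    alloc, remaining = {}, available_mbps
    for name, full, reduced in STREAMS:
        rate = full if remaining >= full else min(reduced, max(remaining, 0.0))
        alloc[name] = rate
        remaining -= rate
    return alloc
```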
Control Command Protocols: UDP vs. TCP vs. WebSocket
The choice of transport protocol for control commands has a significant impact on teleoperation latency and reliability. Each protocol has distinct tradeoffs for robot control.
| Protocol | Latency Overhead | Reliability | NAT Traversal | Best Use Case |
|---|---|---|---|---|
| Raw UDP | Minimal (< 1ms) | Unreliable (no retransmit) | None (need VPN or port forward) | LAN teleoperation with direct network access |
| WebRTC DataChannel | Low (2-5ms) | Configurable (ordered/unordered, reliable/unreliable) | Built-in (STUN/TURN) | Internet teleoperation (SVRC recommendation) |
| WebSocket (TCP) | Medium (5-20ms) | Reliable (TCP retransmit) | Easy (HTTP upgrade) | Monitoring, non-time-critical control |
| gRPC (HTTP/2) | Medium (5-15ms) | Reliable | Requires proxy | Structured control APIs, microservice architecture |
Protocol selection guide: For LAN-only deployments (robot and operator in the same building), raw UDP provides the lowest latency with minimal complexity. For internet-based teleoperation across NATs and firewalls, WebRTC DataChannels are the clear choice -- they handle NAT traversal automatically and provide configurable reliability without TCP's head-of-line blocking problem. WebSocket over TCP is appropriate only for monitoring dashboards or non-time-critical commands (start/stop recording, change task parameters) where reliability matters more than latency. gRPC is useful for structured control APIs in microservice architectures but adds serialization overhead that makes it unsuitable for high-frequency control loops.
SVRC uses WebRTC DataChannels configured in "unreliable, unordered" mode for control commands. This gives UDP-like latency with built-in NAT traversal. Control commands are sent at 50 Hz with sequence numbers; the receiver discards out-of-order commands rather than reordering them (a stale command is worse than a skipped one). For state feedback (joint positions, F/T readings), a separate "reliable, ordered" DataChannel ensures no state updates are lost, at the cost of slightly higher latency on retransmission.
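The discard-rather-than-reorder behavior on the receiving side reduces to tracking the highest sequence number executed so far. A minimal sketch (class and method names are illustrative):

```python
class CommandReceiver:
    """Accept only commands newer than the last executed one; stale
    (out-of-order) packets are dropped, never reordered -- a 100ms-old
    position command is worse than no command at all."""

    def __init__(self):
        self.last_seq = -1

    def receive(self, seq, command):
        if seq <= self.last_seq:
            return None          # stale or duplicate: discard
        self.last_seq = seq
        return command
```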
Latency Compensation for Data Collection Quality
Remote teleoperation demonstrations collected at different latency levels require different handling in the training pipeline. Blindly mixing high-latency and low-latency demonstrations can degrade policy quality because the latency artifacts (hesitations, corrections, oscillations) are interpreted by the policy as intentional behavior.
- Tag demonstrations with collection latency. Record the mean and P95 RTT for each episode. This metadata enables latency-aware data filtering and weighting during training.
- Weight demonstrations by latency quality. During training, assign weights inversely proportional to collection latency: weight = 1.0 for sub-50ms, 0.8 for 50-100ms, 0.5 for 100-200ms, 0.3 for 200ms+. This reduces the influence of latency-degraded episodes without discarding them entirely.
- Filter by trajectory smoothness. Compute the mean jerk (third derivative of position) for each episode. Episodes with jerk above 2 standard deviations from the task mean are likely latency-degraded. Exclude these from the training set or assign low weight.
- Temporal smoothing as post-processing. Apply a Savitzky-Golay filter (window=11, order=3) to the action trajectories of high-latency episodes before using them for training. This removes the high-frequency oscillations introduced by latency-compensating operator behavior while preserving the gross trajectory shape.
- Latency-stratified training. For large datasets collected over variable network conditions, train separate policies on low-latency (<50ms) and high-latency (50-200ms) subsets, then compare performance. If the low-latency subset policy significantly outperforms the combined-data policy, the high-latency data is degrading training and should be excluded or heavily down-weighted. In SVRC experience, datasets where more than 30% of episodes have RTT above 100ms benefit from latency-based filtering.
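The weighting schedule and smoothing step above can be sketched as follows. The weights are the values stated in the text; the smoothing uses SciPy's `savgol_filter` with the stated window and order:

```python
import numpy as np
from scipy.signal import savgol_filter

def latency_weight(mean_rtt_ms):
    """Training weight by collection latency, per the schedule above."""
    if mean_rtt_ms < 50:
        return 1.0
    if mean_rtt_ms < 100:
        return 0.8
    if mean_rtt_ms < 200:
        return 0.5
    return 0.3

def smooth_actions(actions):
    """Savitzky-Golay smoothing (window=11, order=3) along the time
    axis of an (N, action_dim) trajectory; N must be >= 11."""
    return savgol_filter(actions, window_length=11, polyorder=3, axis=0)
```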
Teleoperation Session Management for Production Data Collection
Managing multiple remote operators across time zones requires session management infrastructure beyond basic WebRTC connectivity.
- Pre-session network qualification. Before each collection session, run a 30-second network quality test that measures RTT, jitter, packet loss, and available bandwidth. If the network does not meet minimum thresholds (RTT < 200ms, jitter IQR < 30ms, bandwidth > 5 Mbps), postpone the session. Collecting data on a degraded network wastes operator time and produces low-quality demonstrations that may need to be discarded. The 30-second investment prevents hours of wasted collection effort.
- Operator scheduling. Assign operators to robot stations based on latency tier. Operators with sub-50ms RTT (same region as robot) get priority for precision tasks (L3/L4). Operators with 100-200ms RTT are assigned to simple pick-place tasks (L1/L2).
- Automatic session quality monitoring. Track RTT, jitter, packet loss, frame rate, and operator throughput in real-time. Alert the session supervisor when quality metrics drop below thresholds (RTT > 200ms for > 30 seconds, jitter IQR > 50ms, frame rate < 20fps).
- Session recording integrity. Record the complete session state (camera feeds, joint states, F/T data, operator commands, network metrics) in a synchronized HDF5 file. Include network quality metrics as a separate observation channel so that downstream training pipelines can use latency information for data weighting.
- Operator fatigue management. Enforce 45-minute maximum continuous teleoperation sessions with mandatory 15-minute breaks. Beyond 45 minutes, demo quality degrades 15-25% due to fatigue regardless of network quality. See our fatigue and ergonomics study for details.
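The pre-session qualification thresholds from the first bullet can be encoded as a simple gate (a sketch; thresholds are the ones stated above):

```python
def network_qualifies(rtt_ms, jitter_iqr_ms, bandwidth_mbps):
    """Pass/fail gate for the 30-second pre-session network test:
    RTT < 200ms, jitter IQR < 30ms, bandwidth > 5 Mbps."""
    return rtt_ms < 200 and jitter_iqr_ms < 30 and bandwidth_mbps > 5
```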
Adaptive Bitrate Streaming for Robot Video Feeds
Unlike entertainment video streaming where buffering is acceptable, robot teleoperation video must maintain consistent low latency even when bandwidth fluctuates. SVRC's adaptive bitrate system adjusts video quality in real-time to maintain a latency ceiling.
| Bandwidth Available | Resolution | Frame Rate | Bitrate | Operator Impact |
|---|---|---|---|---|
| > 20 Mbps | 1280x720 | 30 fps | 4-6 Mbps | Full quality; no perceptible degradation |
| 10-20 Mbps | 960x540 | 30 fps | 2-3 Mbps | Slightly reduced clarity; no throughput impact |
| 5-10 Mbps | 640x480 | 30 fps | 1-2 Mbps | Noticeable compression; 5-10% throughput reduction |
| 2-5 Mbps | 640x480 | 15 fps | 0.5-1 Mbps | Choppy motion; 15-25% throughput reduction; L1/L2 tasks only |
| < 2 Mbps | 480x360 | 10 fps | 0.3-0.5 Mbps | Significant degradation; pause session if sustained > 60 seconds |
The key principle: always sacrifice resolution before frame rate. Operators can perform well at 640x480 resolution but struggle below 15 fps because the visual feedback lag makes precise positioning difficult. The video encoder on the robot side (VP8 hardware encode on Intel/NVIDIA, or VP9 software encode) must respond to bandwidth estimates within 200ms to prevent frame queuing, which adds latency worse than any resolution reduction.
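Selecting a rung from the ladder above is a threshold walk from the top. A sketch with the values copied from the table:

```python
# (min_bandwidth_mbps, resolution, fps) rungs, highest quality first
LADDER = [(20, (1280, 720), 30),
          (10, (960, 540), 30),
          (5,  (640, 480), 30),
          (2,  (640, 480), 15),   # resolution held, frame rate sacrificed last
          (0,  (480, 360), 10)]

def select_profile(available_mbps):
    """Pick the highest rung whose bandwidth floor is exceeded."""
    for min_bw, resolution, fps in LADDER:
        if available_mbps > min_bw:
            return resolution, fps
    return LADDER[-1][1], LADDER[-1][2]  # floor profile
```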
Measuring and Reporting Teleoperation Data Quality
Every teleoperation data collection session should produce a quality report alongside the demonstration data. These metrics allow downstream consumers (ML engineers training policies) to make informed decisions about data filtering and weighting.
- Network quality metrics per episode: Mean RTT, P95 RTT, jitter IQR, packet loss percentage, mean bandwidth utilization. Store these in the HDF5 episode metadata.
- Operator performance metrics: Task completion time, trajectory smoothness (mean jerk), number of corrections/hesitations (detected from velocity zero-crossings), idle time percentage. Operators consistently below the 25th percentile of throughput should receive additional training or reassignment.
- Hardware status at collection time: Camera frame rate stability (any dropped frames), joint encoder read rate, F/T sensor noise level. Hardware degradation is often gradual and only detected through systematic monitoring.
- Aggregate session report: Total episodes collected, success rate, mean quality score, network quality tier distribution, operator ID and hours worked. This report should be generated automatically at session end and stored with the dataset.
SVRC's data platform generates these reports automatically for every collection session and makes them available through the dataset management interface. The reports feed directly into the data weighting pipeline for training, ensuring that high-quality demonstrations have proportionally more influence on the learned policy.
Predictive Display: Reducing Perceived Latency
Predictive display techniques render a predicted future state of the robot based on the operator's commands, overlaid on the delayed camera feed. This reduces the perceived latency by the prediction horizon (typically 50-150ms), making teleoperation feel more responsive even when the actual network latency is high.
The simplest effective predictor is a kinematic forward model: given the current commanded joint velocities, predict where the robot will be 100ms in the future and render a ghost overlay of the predicted end-effector position on the video feed. This approach requires no learned model -- just the robot's forward kinematics (available from the URDF) and the assumption that commanded velocities will be achieved. For the OpenArm 101, this assumption holds within 3mm position error for predictions up to 150ms ahead at typical teleoperation speeds.
More advanced predictors use a learned dynamics model to predict the full visual scene evolution, but these add compute latency that partially offsets the perceptual benefit. The kinematic ghost overlay is the pragmatic choice for production data collection: it is fast (< 1ms compute), reliable, and provides the operator with the critical information (predicted end-effector position) needed to plan their next move.
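The ghost prediction reduces to a one-line forward integration. A sketch in Cartesian space for brevity (joint-space prediction via forward kinematics works the same way; assumes commanded velocity is achieved, as discussed above):

```python
import numpy as np

def ghost_position(ee_position, ee_velocity_cmd, latency_s):
    """Predicted end-effector position `latency_s` seconds ahead,
    assuming the commanded velocity is achieved (free-space motion).
    Render this as the semi-transparent overlay on the video feed."""
    return np.asarray(ee_position, dtype=float) \
        + np.asarray(ee_velocity_cmd, dtype=float) * latency_s
```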
Connection Requirements
- Symmetric fiber (home or office): RTT typically <30ms domestic, <80ms intercontinental. Most reliable for sustained teleoperation sessions. Recommend as minimum for operators doing 4+ hour sessions.
- 5G mmWave: 10–20ms RTT, 100Mbps+ bandwidth. Excellent when available. Coverage is still limited to dense urban areas; not reliable for mobile operator setups.
- 4G LTE: 30–80ms RTT typically, variable. Viable for medium-precision tasks. Jitter is the main problem — implement adaptive jitter buffer on the control command receive side.
- Starlink: 25-60ms RTT, 50-200 Mbps download. RTT varies with satellite position and experiences periodic latency spikes (200-500ms) during satellite handoff (every 15-30 minutes). Viable for L1/L2 tasks with jitter buffering but not recommended for precision tasks. The periodic spikes require the jitter buffer to handle 5-10x normal variation, which adds latency to compensate. SVRC has tested Starlink for remote data collection from rural sites and found it acceptable for simple pick-place tasks with appropriately configured jitter buffers.
- Home WiFi (shared): 20–100ms variable, with potential 200–500ms spikes during household traffic. Require operators to connect via Ethernet for production data collection sessions.
The SVRC teleop platform measures RTT at session start, routes operators to appropriate task queues automatically, and uses WebRTC VP8 with hardware encode for all video streams.
Emergency Stop and Safety Over Network
Remote teleoperation introduces a unique safety challenge: the operator is physically separated from the robot and cannot press a physical e-stop button. The network-based safety architecture must provide equivalent protection.
- Heartbeat monitoring. The operator client sends a heartbeat signal at 10 Hz. If the robot controller does not receive a heartbeat for 500ms (5 missed beats), it triggers an automatic velocity ramp-down to zero over 200ms, followed by engagement of joint brakes. This ensures the robot stops safely even if the network connection drops entirely.
- Dual-path e-stop. Implement a software e-stop button in the operator UI that sends a stop command via both the primary WebRTC DataChannel and a redundant WebSocket connection. The robot controller triggers emergency stop if either path delivers the command. This dual-path approach survives single-channel failures.
- Local safety monitor. A separate process on the robot controller monitors joint velocities, forces, and workspace boundaries independently of the network connection. If the robot approaches a workspace boundary (defined by a configurable safety volume) or exceeds force thresholds, the local monitor triggers an e-stop regardless of operator commands. This provides defense against both network failures and malicious or erroneous operator commands.
- Session-start safety check. Before enabling teleoperation control, verify: cameras are streaming, joint encoders are reporting, e-stop circuit is responsive (trigger and release test), and workspace is clear (no obstructions detected by safety cameras). This checklist runs automatically at session start and blocks teleoperation until all checks pass.
These safety measures add 2-5ms of latency (heartbeat check, boundary monitoring) but are non-negotiable for production teleoperation. SVRC's platform implements all four layers and logs every safety event for post-session review.
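The heartbeat layer on the robot side can be sketched as a watchdog polled from the control loop (timing constants from the text; a sketch, not the SVRC implementation, and the timeout callback stands in for the velocity ramp-down and brake engagement described above):

```python
import time

class HeartbeatWatchdog:
    """Trigger a safety stop if no operator heartbeat arrives within
    `timeout_s` (500ms = 5 missed beats at 10 Hz, as above)."""

    def __init__(self, timeout_s=0.5, on_timeout=None):
        self.timeout = timeout_s
        self.on_timeout = on_timeout or (lambda: None)
        self.last_beat = time.monotonic()
        self.tripped = False

    def beat(self):
        """Call on every received heartbeat packet."""
        self.last_beat = time.monotonic()
        self.tripped = False

    def check(self):
        """Call from the robot control loop every tick; fires the
        timeout action exactly once per dropout."""
        if not self.tripped and time.monotonic() - self.last_beat > self.timeout:
            self.tripped = True
            self.on_timeout()  # ramp velocity to zero, engage brakes
        return self.tripped
```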