Glossary

Short, precise definitions.

Every term you’ll meet in posts, papers, and simulations on this site. Where an equation clarifies things, there’s an equation. Where one word would be enough, there’s one word.


Classical control

Transfer function#

The Laplace-domain ratio of a system's output to its input, $G(s) = Y(s)/U(s)$, assuming zero initial conditions. The poles (roots of the denominator) determine stability; the zeros shape the transient response. Nearly every classical-control tool — Bode plots, root locus, Nyquist criteria — operates on $G(s)$.

Phase margin#

Measured at the frequency where $|L(j\omega)| = 1$ (the gain crossover $\omega_{gc}$), the phase margin is $\mathrm{PM} = 180° + \angle L(j\omega_{gc})$. A rule of thumb: aim for 45°–60°. Below 30° the system rings; negative means it's already unstable.
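The definition translates directly into a numerical check. A minimal sketch, assuming an illustrative open loop $L(s) = 1/(s(s+1))$ chosen for this example (not from the text):

```python
import numpy as np

# Phase margin by frequency sweep for an assumed open loop L(s) = 1 / (s (s + 1)).
def L(s):
    return 1.0 / (s * (s + 1.0))

w = np.logspace(-2, 2, 200_000)          # frequency grid, rad/s
mag = np.abs(L(1j * w))

# Gain crossover: frequency where |L(jw)| passes through 1
idx = np.argmin(np.abs(mag - 1.0))
w_gc = w[idx]
phase_deg = np.degrees(np.angle(L(1j * w_gc)))
pm = 180.0 + phase_deg                   # phase margin in degrees

print(f"gain crossover ~ {w_gc:.3f} rad/s, phase margin ~ {pm:.1f} deg")
```

For this plant the analytic answer is $\omega_{gc} \approx 0.79$ rad/s and a phase margin of about $52°$, comfortably inside the 45°–60° rule of thumb.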

Gain margin#

Measured at the frequency where the open-loop phase is $-180°$, it's the reciprocal of $|L(j\omega)|$ there, usually quoted in dB. Paired with phase margin, it gives a quick read on how close a controller is to the edge.

Bode plot#

The standard classical-control chart. Reads phase margin, gain margin, bandwidth, and roll-off rate at a glance. Pairs naturally with PID tuning and loop shaping.

Nyquist criterion#

For a closed-loop system to be stable, the Nyquist plot of the open loop $L(s)$ must encircle the point $-1$ counterclockwise a number of times equal to the number of right-half-plane open-loop poles. Stronger than Bode margins because it works for unstable open-loop systems.

PID controller#

The workhorse controller: $u(t) = K_p\,e(t) + K_i \int_0^t e(\tau)\,d\tau + K_d\,\dot e(t)$. Proportional for response, integral to kill steady-state error, derivative for damping. Still runs more of the world's control loops than anything else.
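In discrete time the three terms become a few lines of code. A minimal sketch, assuming a made-up first-order plant $\dot x = -x + u$ and illustrative gains:

```python
# Discrete PID on an assumed plant xdot = -x + u; gains are illustrative only.
dt = 0.01
Kp, Ki, Kd = 4.0, 2.0, 0.1
x, integral, prev_err = 0.0, 0.0, 0.0
setpoint = 1.0

for _ in range(2000):                    # 20 s of simulation
    err = setpoint - x
    integral += err * dt                 # integral term: removes steady-state error
    deriv = (err - prev_err) / dt        # derivative term: adds damping
    u = Kp * err + Ki * integral + Kd * deriv
    prev_err = err
    x += dt * (-x + u)                   # Euler step of the plant

print(f"final output ~ {x:.3f}")         # settles near the setpoint
```

The integral term is what forces the final value to the setpoint exactly; with $K_i = 0$ this loop would settle with a constant offset.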

State-space model#

A system described by its internal state, $\dot x = Ax + Bu$, $y = Cx + Du$, rather than by input/output relations. Makes multi-input multi-output systems tractable and is the foundation for LQR, Kalman filtering, and MPC.

Optimal & predictive control

LQR (Linear-Quadratic Regulator)#

Minimizes $J = \int_0^\infty \left(x^\top Q x + u^\top R u\right) dt$ for $\dot x = Ax + Bu$. The solution is a linear state feedback $u = -Kx$, with $K = R^{-1}B^\top P$ computed from the algebraic Riccati equation. Guaranteed gain margin of $[\tfrac{1}{2}, \infty)$ and at least $60°$ of phase margin — a free lunch as far as classical-control robustness goes.
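The Riccati equation can be solved by simple fixed-point iteration in the discrete-time case. A numpy-only sketch for an assumed double-integrator plant (production code would use `scipy.linalg.solve_discrete_are`):

```python
import numpy as np

# Discrete-time LQR via Riccati iteration on an assumed double integrator.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[1.0]])

P = Q.copy()
for _ in range(500):                     # iterate the Riccati recursion to a fixed point
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

eigs = np.linalg.eigvals(A - B @ K)      # closed loop x_{k+1} = (A - B K) x_k
print("feedback gain K =", K)
print("closed-loop |eigenvalues| =", np.abs(eigs))
```

All closed-loop eigenvalue magnitudes come out below 1, confirming the feedback stabilizes the plant.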

Receding horizon control#

The core idea behind model predictive control. At each timestep you optimize a finite-horizon trajectory, execute only the first control, and resolve the full problem at the next step with updated state. Trades a harder per-step computation for the ability to handle constraints and disturbances online.
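The optimize-execute-resolve loop can be made concrete with a deliberately tiny example. A sketch under stated assumptions: a made-up scalar system $x_{k+1} = x_k + u_k$, a three-action input set, and brute-force enumeration standing in for a real optimizer:

```python
import numpy as np
from itertools import product

# Toy receding-horizon loop (illustrative): x_{k+1} = x_k + u_k with
# u in {-1, 0, +1}, horizon N = 3, stage cost x^2 + 0.1 u^2.
actions = [-1.0, 0.0, 1.0]
N = 3

def plan(x0):
    best_seq, best_cost = None, float("inf")
    for seq in product(actions, repeat=N):   # enumerate all N-step input sequences
        x, cost = x0, 0.0
        for u in seq:
            cost += x**2 + 0.1 * u**2
            x = x + u
        cost += x**2                          # terminal cost
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq

x = 4.0
for _ in range(10):
    u = plan(x)[0]        # execute only the FIRST action of the plan...
    x = x + u             # ...then re-plan from the new state
print(f"state after 10 steps: {x}")
```

The state is driven to the origin even though each plan only looks three steps ahead — the re-planning at every step is what makes the short horizon work.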

MPC (Model Predictive Control)#

Formulated as: $\min_{u_0,\dots,u_{N-1}} \sum_{k=0}^{N-1} \ell(x_k, u_k) + V_f(x_N)$ subject to $x_{k+1} = f(x_k, u_k)$ and constraints on $x_k$ and $u_k$. Solved with quadratic programming (linear MPC) or nonlinear solvers. Dominates in HVAC, process control, autonomous vehicles, and increasingly legged robotics.

Lyapunov function#

A function $V(x) > 0$ (except at equilibrium, where $V = 0$) whose time derivative along trajectories is $\dot V \le 0$. If such a function exists, the system is stable; if $\dot V < 0$, it's asymptotically stable. Finding a valid Lyapunov function is the hard part — increasingly, SOS programming and neural networks are used to search for one.
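For stable linear systems the search is easy: a quadratic $V(x) = x^\top P x$ always exists and can be found by solving the Lyapunov equation $A^\top P + PA = -Q$. A numpy-only sketch on an assumed stable $A$ (scipy offers `solve_continuous_lyapunov` for this):

```python
import numpy as np

# Quadratic Lyapunov function V(x) = x^T P x for an assumed stable xdot = A x.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])     # eigenvalues -1 and -2: stable
Q = np.eye(2)

# Vectorize A^T P + P A = -Q into a linear solve for vec(P)
n = A.shape[0]
M = np.kron(A.T, np.eye(n)) + np.kron(np.eye(n), A.T)
P = np.linalg.solve(M, -Q.flatten()).reshape(n, n)

# V > 0 away from the origin iff P is positive definite; Vdot = -x^T Q x < 0
print("P =\n", P)
print("eigenvalues of P:", np.linalg.eigvalsh(P))
```

The eigenvalues of $P$ come out positive, so $V$ is positive definite and $\dot V = -x^\top Q x$ is negative definite by construction: asymptotic stability certified.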

Kalman filter#

Given noisy measurements and a linear model, maintains a belief over the state that is provably optimal in the MMSE sense. Two steps: predict ($\hat x^-, P^-$) and update (fold in the new measurement via the Kalman gain $K$). Extensions (EKF, UKF, particle filter) handle nonlinearity.
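The predict/update cycle fits in a few lines in the scalar case. A minimal sketch, assuming a made-up static state (estimate a constant from noisy readings, so $F = 1$ and process noise is zero):

```python
import numpy as np

# Scalar Kalman filter (illustrative): estimate a constant x from z = x + noise.
rng = np.random.default_rng(0)
true_x = 3.0
R = 0.5**2                       # measurement noise variance

x_hat, P = 0.0, 10.0             # prior belief: mean 0, large variance
for _ in range(200):
    # predict: static model, so the belief carries over unchanged
    x_pred, P_pred = x_hat, P
    # update: fold in the new measurement via the Kalman gain
    z = true_x + rng.normal(0.0, 0.5)
    K = P_pred / (P_pred + R)
    x_hat = x_pred + K * (z - x_pred)
    P = (1.0 - K) * P_pred

print(f"estimate ~ {x_hat:.2f}, variance ~ {P:.4f}")
```

Note how the gain $K$ shrinks as the belief variance $P$ shrinks: early measurements move the estimate a lot, later ones barely at all.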

Reinforcement learning

MDP (Markov Decision Process)#

A tuple $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$: states, actions, transition kernel $P(s' \mid s, a)$, reward function $R(s, a)$, and discount factor $\gamma$. Everything in RL — policies, value functions, Bellman equations — is defined with respect to an MDP.

Bellman equation#

For a policy $\pi$: $V^\pi(s) = \mathbb{E}_{a \sim \pi}\big[R(s,a) + \gamma\, \mathbb{E}_{s' \sim P}[V^\pi(s')]\big]$. The optimal form swaps the expectation over $\pi$ for a max over actions. Every RL algorithm is some way of exploiting this recursion.

Policy gradient#

$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\big[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)\big]$. Lets you optimize a stochastic policy by following the gradient of expected reward. Variance reduction (baselines, GAE, trust regions) is what makes it work in practice.
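The score-function estimator is easiest to see on a bandit. A sketch under stated assumptions: a made-up two-armed bandit with mean rewards 0 and 1, a softmax policy, and plain REINFORCE with no baseline:

```python
import numpy as np

# REINFORCE on an assumed 2-armed bandit; the gradient should shift
# probability mass toward the better arm.
rng = np.random.default_rng(0)
means = np.array([0.0, 1.0])
theta = np.zeros(2)                       # policy logits
alpha = 0.1

for _ in range(2000):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    a = rng.choice(2, p=probs)
    r = means[a] + rng.normal(0.0, 0.1)
    grad_logp = -probs                    # d log pi(a) / d theta for a softmax
    grad_logp[a] += 1.0
    theta += alpha * r * grad_logp        # REINFORCE update (no baseline)

probs = np.exp(theta - theta.max())
probs /= probs.sum()
print(f"P(best arm) ~ {probs[1]:.2f}")
```

Even this tiny example shows why baselines matter: the update scales with raw reward $r$, so shifting all rewards by a constant changes the variance of the estimator without changing the optimal policy.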

Advantage#

$A^\pi(s, a) = Q^\pi(s, a) - V^\pi(s)$. Subtracting the state value from the action value removes state-dependent variance from policy-gradient estimators without adding bias — the reason every modern policy-gradient method (A2C, PPO, GAE) uses advantages rather than raw returns.

Q-learning#

Updates an estimate of the optimal action-value function: $Q(s,a) \leftarrow Q(s,a) + \alpha\big[r + \gamma \max_{a'} Q(s', a') - Q(s,a)\big]$. With tabular states and enough exploration, it provably converges. The deep variant (DQN) replaces the table with a neural net.
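The tabular case is short enough to show whole. A minimal sketch on a made-up 5-state chain (illustrative environment, not from the text): move left or right, reward 1 only on reaching the rightmost, terminal state:

```python
import numpy as np

# Tabular Q-learning on an assumed 5-state chain with epsilon-greedy exploration.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2               # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):                     # episodes
    s = 0
    while s != n_states - 1:
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # TD update toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print("greedy policy:", np.argmax(Q, axis=1)[:-1])   # should be all 1s (go right)
```

The learned values fall off as $\gamma^k$ with distance $k$ from the goal — the discount factor made visible in a table.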

Discount factor#

The $\gamma \in [0, 1)$ in the discounted return $G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}$. Low $\gamma$ makes the agent myopic; high $\gamma$ makes credit assignment hard over long horizons. Often treated as a hyperparameter even when the true problem has a natural horizon.

Actor-critic#

The critic estimates $V$ or $Q$; the actor uses those estimates as a baseline or target for its own updates. Modern deep-RL algorithms (A2C, PPO, SAC, TD3) are all actor-critic variants — they get the low-variance gradients of policy gradient plus the bootstrapping of value methods.

Robotics & physical AI

Sim-to-real#

The dominant recipe for robot learning. Key tricks: domain randomization (randomize mass, friction, actuator lag, sensor noise), privileged training (give the expert extra state, then distill to a student that only sees what the real robot sees), and system identification to narrow the gap. Fails loudest when the sim omits something the real world cares about.

Domain randomization#

Train across a distribution of physics parameters (friction, mass, motor gains) and visual properties (lighting, textures) wide enough that the real world is effectively another draw from the same distribution. Cheap to implement, surprisingly effective, and the reason so many sim-to-real stories work on legged robots.

Imitation learning#

Simplest form: behavior cloning, which treats the problem as supervised learning on (state, action) pairs. Classic failure mode: compounding errors, where small deviations take the agent off the demo distribution and everything falls apart (DAgger and its successors address this). Modern variants use diffusion, transformers, or VLAs to model the demonstrator distribution.
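Behavior cloning really is just regression. A minimal sketch, assuming a made-up linear expert $u = -2x$ whose gain we recover from demonstrations alone:

```python
import numpy as np

# Behavior cloning as supervised learning (illustrative): fit a linear policy
# to (state, action) pairs generated by an assumed expert u = -2 x.
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 1))
actions = -2.0 * states + rng.normal(scale=0.01, size=(500, 1))  # noisy demos

# Least squares recovers the expert's gain from the demonstrations
K_hat, *_ = np.linalg.lstsq(states, actions, rcond=None)
print(f"recovered gain ~ {K_hat[0, 0]:.2f}")
```

The compounding-error failure mode is invisible here precisely because this is an open-loop fit: nothing in the regression accounts for what happens when the cloned policy's own mistakes shift the state distribution at test time.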

VLA (Vision-Language-Action model)#

Descendants of vision-language models, finetuned to output action tokens. Examples include RT-2 and OpenVLA. Promise: a single policy that can follow arbitrary language commands across many embodiments. Reality as of 2026: real progress on benchmarks, still brittle outside training distributions, and expensive to run at control rates.

Diffusion policy#

Predicts a short horizon of future actions by denoising random noise conditioned on recent observations. Handles multimodal demonstrations naturally — a problem where regression-based behavior cloning tends to average across modes and fail.

World model#

A learned dynamics model $\hat s_{t+1} = f_\theta(s_t, a_t)$ trained to reconstruct observed transitions. Planning inside a world model (Dreamer-style) can be far more sample-efficient than model-free RL, at the cost of errors compounding over long rollouts.

Math foundations

Jacobian#

The matrix of first partial derivatives of a vector-valued function, $J_{ij} = \partial f_i / \partial x_j$. In robotics, the manipulator Jacobian maps joint velocities to end-effector velocities. In control, the Jacobian linearizes a nonlinear system around an operating point.
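When an analytic Jacobian is tedious, central differences give a quick numerical one. A minimal sketch (the helper `jacobian` and the test function are illustrative, not from the text):

```python
import numpy as np

# Finite-difference Jacobian: J[i, j] ~ d f_i / d x_j via central differences.
def jacobian(f, x, eps=1e-6):
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x))
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (np.asarray(f(x + dx)) - np.asarray(f(x - dx))) / (2 * eps)
    return J

# Check against a case with a known answer: f(x, y) = [x^2 y, sin y]
f = lambda v: np.array([v[0]**2 * v[1], np.sin(v[1])])
J = jacobian(f, [1.0, 0.0])
print(J)      # analytic Jacobian at (1, 0): [[2xy, x^2], [0, cos y]] = [[0, 1], [0, 1]]
```

Central differences are second-order accurate, which is why `eps` can be fairly large before truncation error shows up.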

Linearization#

For $\dot x = f(x, u)$ at an equilibrium $(x^*, u^*)$, the linearization is $\dot{\delta x} = A\,\delta x + B\,\delta u$ with $A = \partial f/\partial x$ and $B = \partial f/\partial u$ evaluated at $(x^*, u^*)$. Lets you apply every tool from linear control — local to the operating point only.

Controllability#

For a linear system $\dot x = Ax + Bu$, check whether the controllability matrix $\mathcal{C} = [B \;\; AB \;\; \cdots \;\; A^{n-1}B]$ has full row rank. A practical warning flag: if controllability only holds near the edge of numerical rank, your real system probably has modes you can barely influence.
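The rank test is a three-line computation. A minimal sketch on an assumed double integrator (force input, position and velocity states):

```python
import numpy as np

# Controllability check (illustrative): stack [B, AB, ..., A^{n-1} B] and test rank.
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator
B = np.array([[0.0], [1.0]])             # force enters through velocity
n = A.shape[0]

C = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
print("rank =", np.linalg.matrix_rank(C), "of", n)   # full rank -> controllable
```

For a near-rank-deficient case, `np.linalg.svd(C)` is more informative than the bare rank: a tiny smallest singular value is exactly the "barely influenceable mode" warning flag.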

Observability#

The dual of controllability: the observability matrix $\mathcal{O}$, formed by stacking $C, CA, \dots, CA^{n-1}$, must have full column rank. Failure means some state directions are invisible to your sensors — a state estimator can't recover them no matter how clever the filter.
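The dual check, on the same assumed double integrator, with a sensor that sees only position:

```python
import numpy as np

# Observability check (illustrative): stack [C; CA; ...; CA^{n-1}] and test rank.
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator
C = np.array([[1.0, 0.0]])               # sensor measures position, not velocity
n = A.shape[0]

O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
print("rank =", np.linalg.matrix_rank(O), "of", n)   # full rank -> observable
```

Full rank here means the unmeasured velocity can still be reconstructed from the position history — which is exactly what a Kalman filter on this system does.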