Design systems work is the area of design practice most immediately and materially affected by AI — and also the area where the distinction between what AI can do and what it can’t matters most.
The confusion comes from conflating two very different things: generating design system artifacts, and making design system decisions. AI can do the first at speed. The second is still entirely human work. Treating them as interchangeable produces systems that look comprehensive but don't hold together.
Generating component variants is fast and mostly reliable. If you have an established pattern — a button with five states, a form field with three error conditions — AI can generate all the variants, apply consistent spacing and color tokens, and produce a first draft that’s 80% of the way to production. The remaining 20% is edge case handling, accessibility review, and the judgment call about whether the pattern is actually correct. But 80% in twenty minutes is a real efficiency gain.
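The mechanical 80% is mostly enumeration: take a fixed set of states and apply the same token references to each. A minimal TypeScript sketch, where the state names and token strings are illustrative assumptions rather than any real system's vocabulary:

```typescript
// Hypothetical example: a button with five states, each variant built by
// applying consistent color and spacing token references. State names and
// token paths are assumptions for illustration only.
type ButtonState = "default" | "hover" | "active" | "focus" | "disabled";

interface ButtonVariant {
  state: ButtonState;
  background: string; // color token reference
  padding: string;    // spacing token reference
}

const STATES: ButtonState[] = ["default", "hover", "active", "focus", "disabled"];

// Generate one variant per state; spacing stays identical across states.
function generateButtonVariants(): ButtonVariant[] {
  return STATES.map((state) => ({
    state,
    background: `color.button.${state}`,
    padding: "space.inset.md",
  }));
}
```

The enumeration is trivially automatable; whether these are the right five states, and whether the pattern should exist at all, is the remaining 20% that stays human.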
Documentation follows the same pattern. AI writes serviceable first-pass documentation from component specifications. The output needs editing — it tends toward generic descriptions that don’t explain the why behind a decision — but it’s faster than starting from a blank page, and the editing process often surfaces gaps in the specification itself.
Token naming and organisation is another genuine win. Given a set of design values, AI can suggest systematic naming conventions, identify inconsistencies, and propose hierarchies. This is tedious, rule-bound work that doesn’t benefit from human creativity. AI does it reliably.
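Because this work is rule-bound, the checking side is easy to automate too. A sketch of flagging names that break a hypothetical category-property-variant convention (the convention and categories here are assumptions, not a standard):

```typescript
// Hypothetical convention: tokens are named "category-property-variant",
// e.g. "color-bg-primary" or "space-inset-md". Categories are illustrative.
const TOKEN_PATTERN = /^(color|space|font)-[a-z]+-[a-z]+$/;

// Return the names that violate the convention so they can be flagged
// for renaming.
function findInconsistentTokens(names: string[]): string[] {
  return names.filter((name) => !TOKEN_PATTERN.test(name));
}
```

A pass like this surfaces drift ("PrimaryBlue" sitting next to "color-bg-primary") without anyone reading the token file by hand.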
Deciding what to standardise is not automatable. The hardest question in design systems work isn’t “how do I build this component” — it’s “should this be a component at all.” That decision requires understanding which problems recur, which variations are genuinely distinct, which patterns are load-bearing, and which ones reflect the accumulated drift of a product built by too many people without enough shared agreements. AI has no knowledge of your product’s history, your team’s working patterns, or the dynamics that made certain decisions happen the way they did.
Defining behavioral standards is also beyond what AI can reliably do. An AI can generate a specification for how a dropdown should look. It cannot generate the interaction contract that defines how filtering persists across surfaces, how drill interactions escalate, or what the empty state means in the context of the specific workflow this component lives inside. Those decisions require understanding the product deeply enough to have opinions about it.
A design system that was generated quickly but not decided carefully produces a specific failure: visual consistency without behavioral coherence. The components look like they belong together. The interactions don’t. Users experience the surface consistency as a product that looks polished but feels unreliable — because the visual layer was standardised and the behavioral layer wasn’t.
This is the Frankenstein problem — not visible components drifting, but invisible behavioral decisions made independently across modules. AI would have made the visual drift faster to fix and entirely missed the underlying problem.
The lesson is not to avoid AI in design systems work. It’s to be clear about what you’re using it for. Generating artifacts: yes. Making the decisions that determine whether those artifacts are correct: no.
AI can build the system faster. It cannot tell you what the system should be.