Security and UX Are Not Opposites. Badly Designed Security Is.

Security improvements that stick are the ones that reduce cognitive load for normal users while narrowing attack surface

The most common way security improvements get rolled back is not because attackers found a bypass.

It's because developers disabled them.

A security control that requires three extra steps to authenticate. A validation that rejects valid inputs because the regex was too strict. A permission model that makes the common case require administrator intervention. These controls are real security — they do narrow the attack surface. But they also generate friction. Support tickets. Complaints. Workarounds. And eventually, someone with enough authority and frustration disables the control, citing developer productivity, and the attack surface is back.

Security improvements that stick are the ones that feel like product improvements.

The test: does this security change make the experience better for someone doing something legitimate? If it does, it will stay. If it makes the experience worse for legitimate users, it will eventually be disabled — either formally or through workarounds that leave you thinking the control is active when it isn't.

This post is about designing security improvements that developers don't want to turn off.

I – The False Binary

The "security vs UX" framing is wrong, and accepting it leads to bad design.

When security controls create friction for legitimate users, that's not a security/UX tradeoff. That's bad security design. Good security design reduces friction for the common case while making the attack surface smaller.

Consider how HTTPS changed the web. When HTTPS was optional, users had to remember to visit the HTTPS version of a site before doing anything sensitive. Once HTTPS became the default, browsing got easier (no decision needed) and more secure (far fewer openings for downgrade attacks, especially with HSTS). Security improved. UX improved. Not a tradeoff, just better design.

The same principle applies to API security:

  • Scoped API keys are more secure than global keys, and easier to reason about ("this key can only touch this project")
  • Clear error messages are more secure than vague ones (developers can fix auth issues quickly instead of widening permissions or logging every attempted configuration), and they are better UX
  • A /me endpoint that shows what the current key can do is more secure (developers can verify scope before integrating) and faster to develop against

None of these are tradeoffs. They're better design.

II – The Design Principle

Strict boundaries. Ergonomic defaults.

Strict boundaries mean the system does exactly what it should and nothing more. A key scoped to project A cannot access project B. A read-only key cannot write. An expired key cannot authenticate. No exceptions. No escalation paths that bypass the boundary.
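The three boundary rules above can be sketched as a single check. This is a minimal illustration, not the API's actual implementation: the `ApiKey` record, its field names, and `is_allowed` are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ApiKey:
    # Hypothetical key record; the field names are illustrative only.
    project_id: str
    capabilities: FrozenSet[str]
    expires_at: Optional[datetime] = None


def is_allowed(key: ApiKey, project_id: str, capability: str, now: datetime) -> bool:
    """Return True only if every boundary check passes. There is no override flag."""
    if key.expires_at is not None and now >= key.expires_at:
        return False  # an expired key cannot authenticate
    if project_id != key.project_id:
        return False  # a key scoped to project A cannot access project B
    return capability in key.capabilities  # a read-only key cannot write


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
read_only = ApiKey("proj_a", frozenset({"memory:read"}))
expired = ApiKey(
    "proj_a",
    frozenset({"memory:read"}),
    expires_at=datetime(2023, 12, 31, tzinfo=timezone.utc),
)

print(is_allowed(read_only, "proj_a", "memory:read", now))   # True
print(is_allowed(read_only, "proj_a", "memory:write", now))  # False
print(is_allowed(read_only, "proj_b", "memory:read", now))   # False
print(is_allowed(expired, "proj_a", "memory:read", now))     # False
```

The point of the shape is that the function has no "force" or "admin" parameter: the only way to get a different answer is to use a different key.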

Ergonomic defaults mean the common case requires no extra configuration. A developer who sets their API key in the environment gets the right project context automatically. A key with no explicit capability restrictions has sensible read/write defaults, not "no permissions." The zero-configuration experience works.

The failure mode of "strict but not ergonomic": developers configure wide permissions because scoped permissions are too annoying to set up. A key gets allowed_projects: ["*"] because scoping it to the right project requires understanding a permission model nobody explained.

The failure mode of "ergonomic but not strict": the defaults are so permissive that there's effectively no security boundary. Everything is allowed. The configuration is easy because there's nothing to configure.

Both are bad. The goal is strict boundaries with defaults that are ergonomic for the common case.
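One way to picture an ergonomic default that is still bounded: a key created without explicit capabilities gets a fixed read/write set, not an empty set and not a wildcard. The capability names and the `capabilities_for_new_key` helper below are assumptions for illustration.

```python
from typing import FrozenSet, Iterable, Optional

# Hypothetical defaults: enough for the common case, nothing destructive.
DEFAULT_CAPABILITIES: FrozenSet[str] = frozenset(
    {"memory:read", "memory:write", "context:read"}
)
# Hypothetical full set of capabilities the API understands.
KNOWN_CAPABILITIES: FrozenSet[str] = DEFAULT_CAPABILITIES | {
    "memory:delete",
    "context:write",
}


def capabilities_for_new_key(requested: Optional[Iterable[str]] = None) -> FrozenSet[str]:
    """Zero-config callers get sensible defaults; explicit requests are validated."""
    if requested is None:
        return DEFAULT_CAPABILITIES
    wanted = frozenset(requested)
    unknown = wanted - KNOWN_CAPABILITIES
    if unknown:
        # Reject typos and wildcards instead of silently granting something broad.
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    return wanted


print(sorted(capabilities_for_new_key()))
print(sorted(capabilities_for_new_key(["memory:read"])))
```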

III – Default Project Context

The most common authorization decision in a multi-project API: which project does this operation apply to?

A poorly designed default: error if no project is specified. The developer must include project_id in every request. They forget once. They get an error. They add a global project_id override to their code. From then on, the override is silently wrong whenever that code runs against a different project.

A well-designed default: the API key carries project scope. A single-project key automatically applies all operations to its project. No project_id required. Zero-config for the common case.

# Bad: the developer must pass project_id on every call
client.memory.remember("This is a note", project_id="proj_abc123")

# Good: with a single-project key, project_id is optional;
# the project is derived from the API key's scope automatically
client.memory.remember("This is a note")

The security property: the default project is derived from the key's scope, not from a default in the application code. If the key can only access project A, the default is project A. The developer cannot accidentally operate in a different project by omitting the parameter.

For keys with multi-project scope, require explicit project specification. The default-from-scope optimization applies only when the scope is unambiguous (one project).
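On the server side, the resolution rule might look roughly like this. It is a sketch under assumptions: `resolve_project`, the two exception types, and the idea that a key's scope is available as a list of project IDs are all hypothetical.

```python
from typing import Optional, Sequence


class ProjectRequired(Exception):
    """The key's scope does not identify exactly one project."""


class ProjectNotInScope(Exception):
    """The caller named a project the key cannot access."""


def resolve_project(key_projects: Sequence[str], requested: Optional[str] = None) -> str:
    """Pick the project for a request from the key's scope, never from app-level defaults."""
    if requested is not None:
        if requested not in key_projects:
            raise ProjectNotInScope(requested)
        return requested
    if len(key_projects) == 1:
        return key_projects[0]  # unambiguous scope: zero-config default
    raise ProjectRequired(
        "This key does not identify a single project; pass project_id explicitly."
    )


print(resolve_project(["proj_abc123"]))                          # proj_abc123
print(resolve_project(["proj_abc123", "proj_def456"], "proj_def456"))  # proj_def456
```

Note that an explicit project_id is still checked against the key's scope, so the default is a convenience and never a way around the boundary.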

IV – Capability Discovery

Developers integrating with your API need to know what their key can do. If they have to guess by trying operations and seeing which ones return 403, they will get frustrated. They will also not trust the permission model — if they don't know what they have access to, they can't trust that the access control is working correctly.

A /me endpoint that shows current identity and capabilities:

GET /me

{
  "key_id": "key_abc123",
  "key_prefix": "cvk_live_xyz",
  "project_id": "proj_def456",
  "project_name": "My Project",
  "capabilities": ["memory:read", "memory:write", "context:read"],
  "scope": "project",
  "expires_at": null,
  "rate_limits": {
    "daily_limit": 10000,
    "daily_used": 347,
    "burst_limit": 100
  }
}

This endpoint:

  • Tells the developer who they are (key identity and project)
  • Shows what they can do (capabilities)
  • Shows what they can't do (by omission — if memory:delete isn't in the list, they can't delete)
  • Shows their quota status (the daily limit and usage so far, from which the remaining requests follow)

This is good security. The developer verifies their setup against the declared capabilities before building. They catch scope misconfiguration at setup time rather than at runtime. They don't need to poke at endpoints to discover what's allowed.

It's also good UX. Zero ambiguity about what the key does.
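A client can use that response to fail fast at startup. The sketch below works on the example payload shown above; `missing_capabilities` and `remaining_today` are hypothetical helpers, and fetching the payload over HTTP is left out.

```python
import json
from typing import Set


def missing_capabilities(me: dict, required: Set[str]) -> Set[str]:
    """Capabilities the integration needs but the key does not declare."""
    return set(required) - set(me.get("capabilities", []))


def remaining_today(me: dict) -> int:
    """Requests left today, derived from the declared limit and usage."""
    limits = me["rate_limits"]
    return max(limits["daily_limit"] - limits["daily_used"], 0)


# The example /me response from above, trimmed to the fields used here.
me = json.loads("""
{
  "key_id": "key_abc123",
  "project_id": "proj_def456",
  "capabilities": ["memory:read", "memory:write", "context:read"],
  "rate_limits": {"daily_limit": 10000, "daily_used": 347, "burst_limit": 100}
}
""")

print(missing_capabilities(me, {"memory:read", "memory:delete"}))  # {'memory:delete'}
print(remaining_today(me))  # 9653
```

Running a check like this once at setup time turns a scope misconfiguration into one clear message instead of a scattering of 403s later.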

V – Backward-Compatible Hardening Rollout

The most dangerous way to ship a security improvement: deploy it to all users simultaneously with no migration period.

At 9am, behavior X is allowed. At 9:01am, behavior X is rejected with a 403. Every integration that relies on behavior X is now broken. Support tickets flood in. An engineer rolls back the security change because the alternative is an extended production incident.

The safe rollout sequence:

Phase 1: Observe. Deploy the new validation logic, but in log-only mode. Log every request that would fail under the new rules. Don't reject anything. Run for at least two weeks.

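Phase 2: Warn. Return a warning in response headers for requests that would fail under the new rules, for example Warning: 299 - "This request uses deprecated behavior that will fail after [date]". The Warning header was retired by RFC 9111, so consider also sending the Sunset header (RFC 8594) with the cutoff date. Surface the same warnings in your developer dashboard.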

Phase 3: Enforce for new keys. New API keys issued after a specific date are subject to the new rules. Existing keys have until the sunset date. This gives existing integrations time to migrate while preventing the problem from getting worse.

Phase 4: Enforce for all keys. After the sunset date, enforce the new rules for all keys. Most integrations have already migrated. The remaining ones get clear error messages explaining what needs to change.

This rollout takes longer. It's worth it. The alternative is a rollback that removes the security improvement entirely, and a future date when you have to do this all over again.
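The four phases can be reduced to one decision function. This is a sketch, assuming the rollout phase is a single configuration value and that each key records its creation date; `Phase`, `Action`, and `decide` are names invented for the example.

```python
from datetime import date
from enum import Enum


class Phase(Enum):
    OBSERVE = 1
    WARN = 2
    ENFORCE_NEW_KEYS = 3
    ENFORCE_ALL = 4


class Action(Enum):
    ALLOW = "allow"    # request is fine under the new rules
    LOG = "log"        # serve the request, record the violation
    WARN = "warn"      # serve the request, attach a warning header
    REJECT = "reject"  # fail the request with a clear error


def decide(phase: Phase, violates_new_rule: bool, key_created: date, new_key_cutoff: date) -> Action:
    """Map the current rollout phase and the key's age to an action."""
    if not violates_new_rule:
        return Action.ALLOW
    if phase is Phase.OBSERVE:
        return Action.LOG
    if phase is Phase.WARN:
        return Action.WARN
    if phase is Phase.ENFORCE_NEW_KEYS:
        # Keys issued on or after the cutoff are held to the new rules already.
        return Action.REJECT if key_created >= new_key_cutoff else Action.WARN
    return Action.REJECT  # Phase.ENFORCE_ALL


cutoff = date(2025, 3, 1)
print(decide(Phase.ENFORCE_NEW_KEYS, True, date(2025, 1, 15), cutoff))  # Action.WARN
print(decide(Phase.ENFORCE_NEW_KEYS, True, date(2025, 3, 10), cutoff))  # Action.REJECT
```

With this shape, backing off during an incident means setting the phase back to WARN. The validation logic stays deployed and keeps producing data.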

VI – Measuring UX Impact

You cannot know whether a security change caused a UX regression unless you measure before and after.

Metrics to track through a hardening rollout:

  • 403 rate per endpoint. A spike after enforcement begins indicates integrations that weren't migrated. Investigate by key ID — who is getting the 403s?
  • Support ticket volume. Tag tickets related to the security change. A spike in tagged tickets is evidence of UX friction.
  • SDK download rate / API sign-up rate. A drop could indicate that the change is making the developer onboarding experience harder.
  • Time to first successful API call. This is a developer experience metric. If a hardening change increases the steps required to get a working key, it will show up here.

Define these metrics before the rollout, not after. Establish baselines. Set alert thresholds. Measure the rollout against the baselines.

A security improvement that doesn't regress these metrics is a security improvement that will stay.
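As a rough illustration of the first metric, the following computes the 403 rate per endpoint from request logs and flags endpoints that rose past a threshold relative to the baseline. The log shape (endpoint, status) and the 2-point threshold are assumptions; real thresholds should come from your own baselines.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def forbidden_rate(requests: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Fraction of requests per endpoint that returned 403."""
    totals: Counter = Counter()
    forbidden: Counter = Counter()
    for endpoint, status in requests:
        totals[endpoint] += 1
        if status == 403:
            forbidden[endpoint] += 1
    return {endpoint: forbidden[endpoint] / totals[endpoint] for endpoint in totals}


def regressions(baseline: Dict[str, float], current: Dict[str, float],
                max_increase: float = 0.02) -> List[str]:
    """Endpoints whose 403 rate rose by more than max_increase over the baseline."""
    return sorted(
        endpoint
        for endpoint, rate in current.items()
        if rate - baseline.get(endpoint, 0.0) > max_increase
    )


# Hypothetical numbers: baseline measured before rollout, log taken after enforcement.
baseline = {"/memory": 0.01, "/me": 0.0}
log = [("/memory", 200)] * 9 + [("/memory", 403)] + [("/me", 200)] * 5

current = forbidden_rate(log)
print(current)                         # {'/memory': 0.1, '/me': 0.0}
print(regressions(baseline, current))  # ['/memory']
```

Grouping the same log by key ID instead of endpoint answers the follow-up question of who is getting the 403s.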

VII – What Breaks First

Over-hardening that breaks integrations. A new validation rejects API calls that don't include an explicit project_id field. Every integration that relied on the default project context is now broken. The error message says "project_id is required" but doesn't explain how to get a project_id or that the key can provide one automatically. Fix: the error message links to documentation. The validation has a migration period. The documentation is updated before the validation is enforced.

Hidden behavioral changes without migration notes. A security change alters the behavior of an existing endpoint without changing the endpoint's path or response schema. Integrations don't notice anything has changed until their audit assumptions no longer hold. Fix: every behavioral change appears in the changelog, marked explicitly as a behavioral change. No silent changes.

Security controls disabled due to developer friction. A capability model is deployed that requires explicit capability grants for every operation. Setting up a new project now requires three separate permission grants that previously happened automatically. Developer setup time doubles. A senior engineer requests that the capability grants be made automatic for new projects "temporarily." The temporary exception becomes permanent. The capability model now has a bypass for all new projects. Fix: the ergonomic default should have been built into the initial design. Retrofitting it later, and then removing the bypass that people have come to rely on, is much harder.
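For the first failure mode above, an error body that carries its own migration guidance might look like the following. The field names, the error code, and the `project_required_error` helper are assumptions; the documentation URL and date are passed in because the real values are not known here.

```python
def project_required_error(docs_url: str, enforced_from: str) -> dict:
    """Build an error body that says what went wrong and how to recover."""
    return {
        "error": "project_id_required",
        "message": (
            "No project_id was supplied and this key does not identify a single "
            "project. A single-project key supplies the project automatically."
        ),
        "how_to_fix": [
            "Call GET /me to see which project this key is scoped to.",
            "Pass project_id explicitly, or switch to a single-project key.",
        ],
        "docs_url": docs_url,            # link to the migration guide
        "enforced_from": enforced_from,  # date the validation became mandatory
    }


# Placeholder URL and date, for illustration only.
body = project_required_error("https://example.com/docs/project-scope", "2025-06-01")
print(body["message"])
for step in body["how_to_fix"]:
    print("-", step)
```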

Hardening Rollout Checklist

  • Behavioral change logged in observe mode for 2+ weeks before enforcement
  • Warning headers deployed before enforcement
  • Sunset date communicated in docs, dashboard, and email
  • Error messages include migration guidance and documentation links
  • Metrics defined and baselines established before rollout
  • New-key enforcement before all-key enforcement
  • Rollback plan defined (revert to warn mode, not full rollback)

Compatibility Contract Checklist

  • New validation is additive (rejects something that was never supported) OR has a migration period
  • Existing integrations tested against new behavior before enforcement
  • SDK updated to support new behavior before enforcement
  • Documentation updated to describe new behavior as the default

Security should reduce the cognitive load for people doing the right thing. If it doesn't, you've built friction, not security. The two are not the same thing, no matter how good your intentions were.
