You Don't Have an Authentication Problem. You Have an Authorization Problem.

Why multi-tenant systems fail at authorization boundaries, and how to fix it

Most multi-tenant systems fail in the same place.

Not at login. Not at token validation. Not in the crypto.

They fail the moment a valid, authenticated user touches a resource that belongs to someone else — and the system lets them through.

Authentication tells you who someone is. Authorization tells you what they're allowed to do. Most teams get authentication right eventually. Authorization is where the bugs live. It's where the incidents happen, including the scary ones that expose real customer data.

This post is about why that failure is so common, how to design around it, and what a production-grade multi-tenant authorization system actually looks like.

I – The Failure Pattern Nobody Talks About

Here's how most authorization bugs get introduced.

A system starts with one tenant. There are no per-tenant boundaries. Everything is global. The code is simple, and it works.

Then there are two tenants. A project field gets added to the database. A user-context object starts carrying a userId. Someone writes: WHERE user_id = $userId and feels like the problem is solved.

It isn't. The filter lives in one query, and nothing forces the next query to include it.

Fast forward six months. There are a hundred tenants. A new engineer joins and writes an endpoint that fetches a record by ID. The ID comes from a URL parameter. The code looks up the record, returns it if found, and the engineer ships it.

That endpoint has no ownership check. Any authenticated user can pass any ID and read any record in the database.

This is not a hypothetical. It is the classic insecure direct object reference (IDOR), and it shows up in real codebases again and again.

The failure is not malice. It's the natural drift that happens when authorization is handled locally, inconsistently, and implicitly. Developers make decisions about who can do what in the handler, the service, the middleware — everywhere and nowhere. The system has no single opinion.

II – Three Models, One Right Default

There are three common access control models for multi-tenant APIs.

Row-level security. Every query includes a tenancy predicate: WHERE project_id = $projectID AND id = $resourceID. Ownership is enforced at the data layer, in the query itself. This is the right primitive for read queries. But it's not a complete authorization model: it only filters rows that already exist, so it can't decide whether an identity may create a resource, perform a particular kind of update, or run a cross-cutting operation.

ACL-based project scope. A project acts as the authorization boundary. API keys carry a project ID. Operations are scoped to that project. Cross-project reads require explicit permission. This is the right default for B2B APIs and is what most well-designed platforms use.

Policy engines. Full attribute-based access control: who, what, how, under what conditions. Powerful for complex permission models. Expensive to build and reason about. Avoid until you actually need it.

For most B2B API platforms, ACL-based project scope is the right default. Pick it early. It forces the right discipline.

III – The Predicate Problem

The most common wrong solution is role checking at the handler level.

if user.role == "admin" {
  return record
}
return forbidden

This is not an authorization model. It's a vibes check.

The problem is that "admin" means different things in different contexts. A project admin can't access another project's data. A platform superadmin can. A CI integration key should have no admin capability at all. When "admin" is a single role without scope semantics, you will make mistakes.

The right approach is a centralized access predicate. A single function, called everywhere, that answers one question: can this identity access this resource?

func CanAccessProject(identity Identity, projectID string) (bool, error)

This function enforces:

  • Does the identity own or have explicit access to this project?
  • Is the identity's scope (API key allowed_projects list) compatible with this project?
  • Is the identity a platform superadmin with explicit superadmin mode active?

Every handler that touches project-scoped data calls this function. No exceptions. No inline role checks. No if isAdmin scattered across handlers.

The function is tested. Extensively. With table-driven tests that cover every identity type against every access scenario.
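To make the three checks concrete, here is a minimal sketch of what such a predicate could look like. The fields on Identity (IsAPIKey, AllowedProjects, MemberOf, Superadmin, SuperadminMode) and the ErrInvalidInput error are assumptions made for illustration; the post only fixes the function name and signature. The sketch also assumes superadmin mode applies to users, not API keys.

```go
package main

import (
	"errors"
	"fmt"
)

// Identity is a hypothetical shape; the post names the type but not its fields.
type Identity struct {
	ID              string
	IsAPIKey        bool
	AllowedProjects []string        // API key scope (allowed_projects)
	MemberOf        map[string]bool // projects the identity owns or was granted
	Superadmin      bool            // platform superadmin flag
	SuperadminMode  bool            // must be explicitly activated
}

// ErrInvalidInput is an assumed error for malformed calls.
var ErrInvalidInput = errors.New("identity and project ID are required")

// CanAccessProject answers one question: can this identity access this project?
// Anything not explicitly allowed is denied.
func CanAccessProject(id Identity, projectID string) (bool, error) {
	if id.ID == "" || projectID == "" {
		return false, ErrInvalidInput
	}
	// An API key's scope must include the project.
	if id.IsAPIKey && !contains(id.AllowedProjects, projectID) {
		return false, nil
	}
	// Ownership or explicit access.
	if id.MemberOf[projectID] {
		return true, nil
	}
	// Superadmin counts only for users, and only with the mode active.
	if !id.IsAPIKey && id.Superadmin && id.SuperadminMode {
		return true, nil
	}
	return false, nil
}

func contains(list []string, want string) bool {
	for _, v := range list {
		if v == want {
			return true
		}
	}
	return false
}

func main() {
	member := map[string]bool{"proj-a": true}
	cases := []struct {
		name    string
		id      Identity
		project string
	}{
		{"scoped key, own project", Identity{ID: "k1", IsAPIKey: true, AllowedProjects: []string{"proj-a"}, MemberOf: member}, "proj-a"},
		{"scoped key, other project", Identity{ID: "k1", IsAPIKey: true, AllowedProjects: []string{"proj-a"}, MemberOf: member}, "proj-b"},
		{"member, other project", Identity{ID: "u1", MemberOf: member}, "proj-b"},
		{"superadmin, mode inactive", Identity{ID: "u2", Superadmin: true}, "proj-b"},
		{"superadmin, mode active", Identity{ID: "u2", Superadmin: true, SuperadminMode: true}, "proj-b"},
	}
	for _, c := range cases {
		ok, err := CanAccessProject(c.id, c.project)
		fmt.Printf("%-28s allowed=%v err=%v\n", c.name, ok, err)
	}
}
```

The loop in main is the seed of a table-driven test: each new identity type or scenario becomes one more row rather than one more handler-specific check.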

IV – Handler-Level Defense in Depth

Centralized predicates are the first layer. Defense in depth means you don't stop there.

Every handler has a layered authorization model:

  1. Authentication middleware validates the token and extracts identity
  2. Project context middleware resolves the project from the URL or API key scope
  3. The handler calls CanAccessProject with the resolved project
  4. The query uses row-level tenancy predicates as a final backstop

If any layer fails, the request is rejected. No exceptions. No fallthrough.

The pattern looks like this:

router.Use(AuthMiddleware)         // who are you?
router.Use(ProjectContextMiddleware) // what project are you in?

// In the handler:
ok, err := CanAccessProject(ctx.Identity, ctx.ProjectID)
if err != nil || !ok {
  return 403 // fail closed: an error is treated as a denial
}

// In the query:
WHERE project_id = $projectID AND id = $resourceID

Even if CanAccessProject had a bug, the query predicate would catch the case where identity A accessed identity B's records. Defense in depth means a single bug doesn't become a breach.
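The four layers can be sketched end to end with Go's standard net/http package. Everything specific here is an assumption for illustration: the X-API-Key header, the /projects/{project}/records/{id} URL shape, the in-memory maps standing in for the key store and the database, and a simplified canAccessProject that returns a bare bool.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type ctxKey string

const (
	identityKey ctxKey = "identity"
	projectKey  ctxKey = "project"
)

// In-memory stand-ins for the key store and the records table (hypothetical data).
var keyToProject = map[string]string{"key-a": "proj-a", "key-b": "proj-b"}
var records = map[string]string{"r1": "proj-a"} // record ID -> owning project

// AuthMiddleware: who are you? Unknown keys get 401.
func AuthMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := r.Header.Get("X-API-Key")
		if _, ok := keyToProject[key]; !ok {
			http.Error(w, "unauthenticated", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), identityKey, key)))
	})
}

// ProjectContextMiddleware: what project are you in? Resolved from the URL.
func ProjectContextMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		parts := strings.Split(strings.Trim(r.URL.Path, "/"), "/")
		if len(parts) != 4 || parts[0] != "projects" || parts[2] != "records" {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), projectKey, parts[1])))
	})
}

// canAccessProject is a simplified stand-in for the centralized predicate.
func canAccessProject(key, projectID string) bool {
	return keyToProject[key] == projectID
}

func getRecord(w http.ResponseWriter, r *http.Request) {
	key, _ := r.Context().Value(identityKey).(string)
	projectID, _ := r.Context().Value(projectKey).(string)
	if !canAccessProject(key, projectID) {
		http.Error(w, "forbidden", http.StatusForbidden)
		return
	}
	id := strings.TrimPrefix(r.URL.Path, "/projects/"+projectID+"/records/")
	// Backstop, equivalent to: WHERE project_id = $1 AND id = $2
	if owner, ok := records[id]; !ok || owner != projectID {
		http.Error(w, "not found", http.StatusNotFound)
		return
	}
	fmt.Fprintf(w, "record %s", id)
}

// statusFor runs one request through the full chain and returns the status code.
func statusFor(apiKey, path string) int {
	h := AuthMiddleware(ProjectContextMiddleware(http.HandlerFunc(getRecord)))
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if apiKey != "" {
		req.Header.Set("X-API-Key", apiKey)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("key-a", "/projects/proj-a/records/r1")) // 200: right tenant
	fmt.Println(statusFor("key-b", "/projects/proj-a/records/r1")) // 403: predicate denies
	fmt.Println(statusFor("key-b", "/projects/proj-b/records/r1")) // 404: query backstop
	fmt.Println(statusFor("", "/projects/proj-a/records/r1"))      // 401: no identity
}
```

The third request is the interesting one: the predicate passes because tenant B is in its own project, and the tenancy predicate in the lookup is what stops tenant A's record from coming back.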

V – The Dangerous Defaults

Three patterns will burn you.

Default project context. Some systems automatically select a "default project" for users who don't specify one. This is convenient and dangerous. If the selection logic is wrong — even once, even in one edge case — a user lands in the wrong project context. Every action they take is scoped to the wrong project. This isn't a minor bug. It's a data isolation failure.

Default project context should be deterministic and auditable. Derive it from the API key's allowed_projects list only when that list names exactly one project; otherwise require the caller to specify it explicitly.
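One way to keep project resolution deterministic is to refuse to guess. The sketch below assumes a rule the post does not spell out: an implicit default is used only when the key's allowed_projects list has exactly one entry. ResolveProject and both error values are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	// ErrProjectRequired means the caller must name a project explicitly.
	ErrProjectRequired = errors.New("project must be specified explicitly")
	// ErrProjectNotAllowed means the requested project is outside the key's scope.
	ErrProjectNotAllowed = errors.New("project is not in the key's allowed_projects")
)

// ResolveProject returns the project a request is scoped to.
// requested is whatever the caller specified ("" if nothing);
// allowedProjects is the API key's allowed_projects list.
func ResolveProject(requested string, allowedProjects []string) (string, error) {
	if requested != "" {
		for _, p := range allowedProjects {
			if p == requested {
				return p, nil
			}
		}
		return "", ErrProjectNotAllowed
	}
	// No explicit project: only unambiguous keys get an implicit default.
	if len(allowedProjects) == 1 {
		return allowedProjects[0], nil
	}
	return "", ErrProjectRequired
}

func main() {
	fmt.Println(ResolveProject("", []string{"proj-a"}))             // single-project key: implicit default
	fmt.Println(ResolveProject("", []string{"proj-a", "proj-b"}))   // ambiguous: caller must choose
	fmt.Println(ResolveProject("proj-c", []string{"proj-a"}))       // outside scope: rejected
	fmt.Println(ResolveProject("proj-b", []string{"proj-a", "proj-b"})) // explicit and in scope
}
```

There is no "first project in the list" or "most recently used" fallback, which is exactly the kind of selection logic that goes wrong in one edge case.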

ID-based endpoints without ownership checks. An endpoint like GET /records/:id that fetches by primary key without verifying ownership is a cross-tenant read waiting to happen. Every endpoint that accepts an external ID must verify that the ID belongs to the identity's project before returning anything.
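A cheap structural defense is a data-access layer that has no lookup by ID alone. In this sketch (RecordStore, Record, and ErrNotFound are hypothetical names, and a map stands in for the database), the only read method takes the project ID as a required argument, so the unsafe query cannot be written by accident.

```go
package main

import (
	"errors"
	"fmt"
)

// Record is a hypothetical project-scoped resource.
type Record struct {
	ID        string
	ProjectID string
	Body      string
}

// ErrNotFound is returned for missing rows and for rows owned by another project,
// so callers cannot tell the two apart.
var ErrNotFound = errors.New("record not found")

// RecordStore is an in-memory stand-in for a database table.
type RecordStore struct {
	rows map[string]Record
}

// Get is the only lookup. It mirrors:
//   SELECT ... FROM records WHERE project_id = $1 AND id = $2
func (s *RecordStore) Get(projectID, id string) (Record, error) {
	rec, ok := s.rows[id]
	if !ok || rec.ProjectID != projectID {
		return Record{}, ErrNotFound
	}
	return rec, nil
}

func main() {
	store := &RecordStore{rows: map[string]Record{
		"r1": {ID: "r1", ProjectID: "proj-a", Body: "tenant A data"},
	}}

	rec, err := store.Get("proj-a", "r1")
	fmt.Println(rec.Body, err) // owner reads the record

	_, err = store.Get("proj-b", "r1")
	fmt.Println(err) // another tenant gets the same error as a missing ID
}
```

The new engineer from section I can still forget to call the access predicate, but they can no longer fetch a row without saying which project they believe it belongs to.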

Ambiguous role escalation. When a user has role "admin" and a new feature checks if role == "admin", that feature is now available to every admin in your system, including ones who shouldn't have access to it. Roles without explicit scope semantics compound over time into a privilege escalation surface.

VI – Testing for Cross-Tenant Abuse

The test that most teams never write is the one that matters most.

func TestCrossTenantRead(t *testing.T) {
  tenantA := createTenant()
  tenantB := createTenant()
  record := createRecord(tenantA.ProjectID)

  response := makeRequest(tenantB.APIKey, "GET /records/"+record.ID)

  assert.Equal(t, 403, response.StatusCode)
  // The denied attempt must be logged, so exactly one event is expected.
  assert.Equal(t, 1, auditLog.CrossTenantAttempts())
}

This test should exist for every resource type. Every endpoint. Every read and write path.

If writing this test feels tedious, that's a signal: your authorization model is too distributed. It should be possible to write one test that covers all resource types because the authorization logic is centralized.

The regression matrix should include: unauthenticated access, authenticated but wrong project, authenticated and right project, admin from different project, platform superadmin, expired token, revoked API key. That's your authorization test surface.
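The matrix translates naturally into a table of cases. The sketch below is a toy model, not a real test suite: Caller, authorize, and the expected status codes (401 for missing, expired, or revoked credentials; 403 for the wrong project) are assumptions chosen for illustration. In a real codebase the rows would drive requests against actual endpoints inside a testing.T test.

```go
package main

import "fmt"

// Caller is a hypothetical fixture describing who is making the request.
type Caller struct {
	Authenticated  bool
	Expired        bool   // token past its expiry
	Revoked        bool   // API key revoked
	ProjectID      string // project the caller belongs to
	Role           string // project-level role, e.g. "admin"
	SuperadminMode bool   // platform superadmin with the mode active
}

// authorize returns the HTTP status a request against projectID should get.
func authorize(c Caller, projectID string) int {
	if !c.Authenticated || c.Expired || c.Revoked {
		return 401
	}
	if c.ProjectID == projectID || c.SuperadminMode {
		return 200
	}
	return 403 // Role is never consulted: an admin elsewhere is still an outsider here
}

type matrixCase struct {
	name   string
	caller Caller
	want   int
}

// matrix lists the seven scenarios from the regression matrix, all aimed at proj-a.
func matrix() []matrixCase {
	return []matrixCase{
		{"unauthenticated", Caller{}, 401},
		{"authenticated, wrong project", Caller{Authenticated: true, ProjectID: "proj-b"}, 403},
		{"authenticated, right project", Caller{Authenticated: true, ProjectID: "proj-a"}, 200},
		{"admin from different project", Caller{Authenticated: true, ProjectID: "proj-b", Role: "admin"}, 403},
		{"platform superadmin", Caller{Authenticated: true, SuperadminMode: true}, 200},
		{"expired token", Caller{Authenticated: true, Expired: true, ProjectID: "proj-a"}, 401},
		{"revoked API key", Caller{Authenticated: true, Revoked: true, ProjectID: "proj-a"}, 401},
	}
}

// runMatrix returns one message per failing case.
func runMatrix(target string) []string {
	var failures []string
	for _, tc := range matrix() {
		if got := authorize(tc.caller, target); got != tc.want {
			failures = append(failures, fmt.Sprintf("%s: got %d, want %d", tc.name, got, tc.want))
		}
	}
	return failures
}

func main() {
	failures := runMatrix("proj-a")
	fmt.Printf("%d cases, %d failures\n", len(matrix()), len(failures)) // 7 cases, 0 failures
}
```

Because the cases are data, covering a new resource type means running the same table against one more endpoint, not writing seven more tests.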

VII – The Superadmin Migration

Most systems start with an "admin can do anything" model. At some point, you need to separate project admins from platform superadmins.

The migration pattern that works:

  1. Add a superadmin boolean to the identity model, separate from the role field
  2. Change CanAccessProject to check superadmin explicitly, not role
  3. Add an audit log event whenever superadmin access is used
  4. Require superadmin mode to be explicitly activated (not always-on)
  5. Migrate existing role checks to call CanAccessProject instead

Do not rename roles. Do not change the existing role system. Add the explicit superadmin gate alongside it, then migrate handler by handler. Trying to do it all at once will miss something.
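Steps 1 through 4 can be sketched together. The names here are assumptions: the modeActive field, ActivateSuperadminMode, AuditLog, and the extra audit parameter on CanAccessProject (added only so the example is self-contained; a real system might inject the audit sink differently). The Role field is left in place and deliberately never read.

```go
package main

import (
	"errors"
	"fmt"
)

// Identity keeps the legacy Role untouched and adds an explicit superadmin gate.
type Identity struct {
	ID         string
	Role       string          // existing role system, unchanged
	Projects   map[string]bool // projects the identity belongs to
	Superadmin bool            // step 1: separate boolean, not a role
	modeActive bool            // step 4: off until explicitly activated
}

// AuditEvent records one use of superadmin access.
type AuditEvent struct {
	Actor     string
	ProjectID string
}

// AuditLog is an in-memory stand-in for an audit sink.
type AuditLog struct {
	Events []AuditEvent
}

var ErrNotSuperadmin = errors.New("identity is not a platform superadmin")

// ActivateSuperadminMode turns the mode on for identities that hold the flag.
func (id *Identity) ActivateSuperadminMode() error {
	if !id.Superadmin {
		return ErrNotSuperadmin
	}
	id.modeActive = true
	return nil
}

// CanAccessProject checks membership first, then the explicit superadmin gate.
func CanAccessProject(id Identity, projectID string, audit *AuditLog) (bool, error) {
	if id.ID == "" || projectID == "" {
		return false, errors.New("identity and project ID are required")
	}
	if audit == nil {
		return false, errors.New("audit log is required") // fail closed
	}
	if id.Projects[projectID] {
		return true, nil
	}
	// Step 2: check the superadmin flag and mode, never the role string.
	if id.Superadmin && id.modeActive {
		// Step 3: every use of superadmin access leaves an audit event.
		audit.Events = append(audit.Events, AuditEvent{Actor: id.ID, ProjectID: projectID})
		return true, nil
	}
	return false, nil
}

func main() {
	audit := &AuditLog{}
	legacyAdmin := Identity{ID: "u1", Role: "admin", Projects: map[string]bool{"proj-a": true}}
	root := Identity{ID: "u2", Role: "admin", Superadmin: true}

	ok, _ := CanAccessProject(legacyAdmin, "proj-b", audit)
	fmt.Println("legacy admin, other project:", ok)

	ok, _ = CanAccessProject(root, "proj-b", audit)
	fmt.Println("superadmin, mode inactive:", ok)

	if err := root.ActivateSuperadminMode(); err != nil {
		fmt.Println("activation failed:", err)
	}
	ok, _ = CanAccessProject(root, "proj-b", audit)
	fmt.Println("superadmin, mode active:", ok, "audit events:", len(audit.Events))
}
```

Because the old role field keeps working, step 5 can proceed one handler at a time: each migrated handler swaps its inline role check for a call to the predicate, and nothing else changes.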

Authorization Decision Table

Identity Type         | Own Project | Other Project | Superadmin Mode
API Key (scoped)      | Allow       | Deny          | N/A
API Key (global)      | Allow       | Deny          | N/A
User (project member) | Allow       | Deny          | N/A
User (platform admin) | Allow       | Deny          | Allow if active
Unauthenticated       | Deny        | Deny          | N/A

Handler Authorization Checklist

  • Authentication middleware runs before any handler
  • Project ID is resolved from the URL or API key scope and checked against that scope, never taken on trust from the request body
  • CanAccessProject is called in every handler that reads or writes scoped data
  • No inline role checks exist outside of CanAccessProject
  • Row-level tenancy predicates are present in all queries
  • Cross-tenant access attempts are logged and alerted
  • Superadmin access is audited separately from normal access

What Breaks First

Authorization bugs don't announce themselves. They show up as customer support tickets ("I can see records I didn't create"), as security audits ("we found an IDOR in your /records/:id endpoint"), or as incidents you discover after the fact in logs.

The pattern is always the same: authorization was handled locally, implicitly, or inconsistently. One engineer made a shortcut. Another engineer copied the pattern. The surface grew.

Centralize the predicate. Enforce it at every layer. Test the cases that feel paranoid. The ones that feel paranoid are the ones that matter.
