Beyond the Login: Why API Security Now Lives in Behavior, Not Authentication

There's a moment that keeps recurring in breach post-mortems over the last two years, and it's never the moment anyone expects. It isn't a cracked password. It isn't a phished MFA code. It's a token — small, valid, completely unremarkable — doing exactly what it was authorized to do, for exactly the wrong party, for weeks before anyone noticed.

That's not a hypothetical. It's the mechanism behind some of the largest SaaS breaches of 2025 and 2026, and it points to a gap that most security architectures still haven't closed: the difference between verifying who someone is and watching what they do afterward. Authentication answers the first question. Almost nothing in a typical stack answers the second, continuously, in real time. This piece is about why that gap exists, how it's being exploited at scale, and what an actual engineering approach to closing it looks like — not as theory, but as code you could run today.

The breach that exposed the gap

On August 26, 2025, Google's Threat Intelligence Group disclosed a campaign, attributed to a group it tracks as UNC6395, built entirely around stolen OAuth tokens. The attackers had compromised Drift, a chatbot integration owned by Salesloft, and harvested the tokens Drift used to connect to customer Salesforce instances. From there, no further "hacking" was required. The tokens were valid. The requests they generated looked, to every system logging them, like ordinary integration traffic. By the time investigators had mapped the damage, reporting put the number of affected organizations at roughly 760, including Google, Cloudflare, Cisco, Qantas, and TransUnion among the named victims.

The part that should worry engineers more than the initial breach is what happened next. In November, a related campaign used credentials harvested from Salesloft/Drift support case data to obtain Gainsight OAuth tokens, then issued refresh tokens against as many as 285 connected Salesforce instances. One compromised integration seeded the next almost automatically, because the trust relationships between SaaS platforms don't expire just because one link in the chain turned out to be hostile. A similar pattern surfaced again in early 2026, when an OAuth-token exfiltration tied to a Vercel platform breach gave attackers a multi-week foothold in a customer's AWS environment before anyone caught it.

Three incidents, three vendors, one mechanism: a credential nobody was actively watching, behaving consistently enough to avoid suspicion, for long enough to matter.

Why "authenticated" stopped meaning "safe"

Here's the statistic that should reframe how most teams think about their threat model. Salt Security's 2025 research found that 95 percent of API attacks originate from authenticated sources — not brute-force attempts, not credential stuffing, but valid sessions and accepted tokens doing damage from the inside. Salt's broader survey work put the number of organizations encountering some API security issue in the prior twelve months at 99 percent.

That statistic exposes something uncomfortable about how most security budgets get allocated. Login hardening — MFA, adaptive authentication, password policy — addresses a single moment in a session's lifetime: the moment it begins. Everything that happens afterward, often for the entire lifetime of a long-lived OAuth token, is implicitly trusted by default. Wallarm's 2026 API ThreatStats report, built from an analysis of 67,058 published vulnerabilities, found that API-related flaws accounted for 43 percent of all additions to CISA's Known Exploited Vulnerabilities catalog in 2025, and that 97 percent of those API vulnerabilities could be exploited with a single request. Akamai's 2026 State of the Internet report found 87 percent of organizations had experienced an API-related incident that year, with the average daily volume of API attacks per organization climbing 113 percent year over year.

None of this required exotic tradecraft. It required a system willing to ask "is this credential valid?" and nothing else.

Why OAuth, specifically, makes this worse

OAuth solved a real problem: it eliminated the practice of services sharing raw passwords with each other. But it introduced a more subtle one. A password is something a human remembers exists. A token is something that lives in a config file, an environment variable, or a vendor's infrastructure, quietly outliving the business decision that created it. Nobody re-approves a Salesforce integration every quarter. Nobody asks, eighteen months after a marketing tool was connected to a CRM, whether it still needs the scopes it was granted on day one.

The result is a slow accumulation of standing trust that organizations can't fully inventory. The Cloudflare/Okta incident from late 2023 is the case study security teams still reference for a reason: by Cloudflare's own account, after the October 2023 Okta breach it failed to rotate one service token and three service account credentials out of the thousands that had been exposed, and attackers used them to reach its internal Atlassian systems that November. The cleanup that followed included rotating more than 5,000 production credentials. If an organization with Cloudflare's operational maturity can miss a handful of credentials out of thousands, "we'll just rotate everything" isn't a control; it's a hope.

From identity checks to behavioral checks

This is the conceptual pivot that actually matters, and it's worth stating precisely, because "behavioral security" gets thrown around vaguely enough to mean almost nothing. A behavioral approach doesn't replace authentication. It adds a second, continuous question on top of it: given everything this credential has done recently, does its current request still make sense?

That requires tracking state that authentication systems normally discard the moment a request is approved — request velocity per credential, the diversity of endpoints a token touches, deviations from a token's own historical baseline, and signals like a single token suddenly originating from multiple geographies in a short window, which is a strong indicator of token replay rather than legitimate use. Crucially, the right unit of analysis isn't the IP address — attackers using stolen OAuth tokens rarely share infrastructure with the legitimate integration, so IP-based throttling, the thing most rate limiters actually do, misses the attack entirely. The right unit is the credential itself: the token, the API key, the service account.
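
To make the per-IP versus per-credential distinction concrete, here is a minimal sketch. The request log, the token name, and the limit of five are all invented for illustration; the point is only that the same traffic looks quiet when counted by address and loud when counted by token.

```python
from collections import Counter

# Hypothetical request log of (token_id, source_ip) pairs: one stolen token
# replayed from five addresses, two requests from each.
requests = [("tok_drift", f"203.0.113.{i}") for i in range(1, 6) for _ in range(2)]

PER_KEY_LIMIT = 5  # illustrative per-window request limit

def over_limit(log, key_index, limit):
    """Return the keys (IPs or tokens) whose request count exceeds the limit."""
    counts = Counter(entry[key_index] for entry in log)
    return [key for key, n in counts.items() if n > limit]

ips_over_limit = over_limit(requests, 1, PER_KEY_LIMIT)
tokens_over_limit = over_limit(requests, 0, PER_KEY_LIMIT)

print(ips_over_limit)     # [] since no single address sent more than 2 requests
print(tokens_over_limit)  # ['tok_drift'] since the token made 10 requests in total
```

A limiter keyed on address gives every fresh IP a clean counter, so spreading a token across networks resets the count each time. Keying on the credential removes that escape hatch.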

Engineering Appendix: A minimal behavioral risk engine

What follows is a deliberately small system — not production-grade, but structurally honest about what a real one needs. It tracks behavior per credential rather than per IP, decays old behavior over time so the baseline adapts, and scores risk as a composite signal rather than a single rule tripping a single alert.

Step 1 — Define the telemetry unit

Everything downstream depends on capturing the right fields at the gateway or middleware layer. Most teams already log a subset of this; the gap is usually token_id and scope_used, the two fields that tie a request to a specific credential and the permission it exercised. Those are also the fields most likely to have surfaced the OAuth-token breaches above, had anyone been monitoring them.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class ApiEvent:
    timestamp: datetime
    token_id: str        # the credential, not the IP — this is the key design choice
    source_ip: str
    endpoint: str
    scope_used: str
    status_code: int
    latency_ms: int
    payload_bytes: int
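
The dataclass doesn't say where token_id comes from, and logging the raw bearer token would create a second place to steal it from. One option, offered here as an assumption rather than part of the design above, is to key profiles on a keyed hash of the token. The function name and the placeholder secret below are hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice it would come from a secrets manager.
FINGERPRINT_KEY = b"replace-with-a-real-secret"

def token_fingerprint(raw_token: str, key: bytes = FINGERPRINT_KEY) -> str:
    """Return a stable, non-reversible identifier for a bearer token."""
    digest = hmac.new(key, raw_token.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # 64 bits of the digest, enough to key a profile

fingerprint = token_fingerprint("example-bearer-token")
print(len(fingerprint))  # 16
```

The same token always maps to the same fingerprint, so per-credential state still lines up across requests, while the log itself holds nothing an attacker could replay.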

Step 2 — Maintain a rolling, decaying profile per credential

A naive tracker just sums up "bad events." That breaks in two ways. Old behavior never ages out, so a token that misbehaved six months ago but is now clean stays flagged forever, training analysts to ignore the alert. And the baseline never adapts, so legitimate changes in usage look permanently abnormal. An exponentially weighted profile fixes both problems: recent behavior matters more than old behavior, and the system adapts as legitimate usage patterns evolve.

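import math
from collections import defaultdict
from datetime import timedelta

class TokenProfile:
    """Tracks one credential's behavior with time-decayed signals."""

    def __init__(self, half_life_minutes: float = 30.0, window_minutes: float = 60.0):
        self.half_life = half_life_minutes
        # Distinct IPs and endpoints are counted over this window only;
        # the 60-minute default is an illustrative assumption.
        self.window = timedelta(minutes=window_minutes)
        self.decayed_failures = 0.0
        self.decayed_volume = 0.0
        # Map each IP / endpoint to when it was last seen so stale entries can be
        # pruned; len() still gives the distinct count inside the window.
        self.seen_ips: dict[str, datetime] = {}
        self.seen_endpoints: dict[str, datetime] = {}
        self.last_seen: datetime | None = None

    def _decay_factor(self, now: datetime) -> float:
        if self.last_seen is None:
            return 1.0
        # Clamp at zero so an out-of-order event cannot inflate the counters.
        elapsed_minutes = max(0.0, (now - self.last_seen).total_seconds() / 60.0)
        return math.pow(0.5, elapsed_minutes / self.half_life)

    @staticmethod
    def _prune(seen: dict[str, datetime], cutoff: datetime) -> None:
        # Forget anything not seen since the cutoff.
        for key in [k for k, ts in seen.items() if ts < cutoff]:
            del seen[key]

    def update(self, event: ApiEvent):
        decay = self._decay_factor(event.timestamp)
        self.decayed_failures *= decay
        self.decayed_volume *= decay

        if event.status_code in (401, 403):
            self.decayed_failures += 1.0
        self.decayed_volume += event.payload_bytes

        self.seen_ips[event.source_ip] = event.timestamp
        self.seen_endpoints[event.endpoint] = event.timestamp
        cutoff = event.timestamp - self.window
        self._prune(self.seen_ips, cutoff)
        self._prune(self.seen_endpoints, cutoff)

        # Never move the clock backwards on a late-arriving event.
        if self.last_seen is None or event.timestamp > self.last_seen:
            self.last_seen = event.timestamp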


class TokenTracker:
    def __init__(self):
        self.profiles: dict[str, TokenProfile] = defaultdict(TokenProfile)

    def record(self, event: ApiEvent) -> TokenProfile:
        profile = self.profiles[event.token_id]
        profile.update(event)
        return profile
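
The decay arithmetic is easier to trust with numbers in hand. This short sketch reuses the 30-minute half-life from the profile; the helper names are mine, not part of the listing above.

```python
import math

def decayed_weight(elapsed_minutes: float, half_life_minutes: float = 30.0) -> float:
    """Fraction of an old event's weight that survives after elapsed_minutes."""
    return math.pow(0.5, elapsed_minutes / half_life_minutes)

def steady_state_count(interval_minutes: float, half_life_minutes: float = 30.0) -> float:
    """Decayed failure count, measured just after a failure, for a token that
    fails once every interval_minutes indefinitely (sum of a geometric series)."""
    return 1.0 / (1.0 - decayed_weight(interval_minutes, half_life_minutes))

for minutes in (0, 30, 60, 180):
    print(minutes, round(decayed_weight(minutes), 4))
# 0 1.0
# 30 0.5
# 60 0.25
# 180 0.0156

print(steady_state_count(30))  # 2.0
```

A failure from three hours ago contributes under 2 percent of its original weight, and a token that fails once per half-life levels off at a decayed count of 2.0 instead of growing without bound. That ceiling is what lets a credential recover once it stops misbehaving.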

Step 3 — Turn behavior into a composite risk score

A single rule ("too many 403s") is brittle and exactly what attackers learn to stay under. A composite score that blends several weak signals is harder to evade without changing the underlying behavior — which is the point.

class RiskEngine:
    def score(self, profile: TokenProfile, event: ApiEvent) -> float:
        score = 0.0

        # Sustained 401/403 responses (the credential is probing what it can reach);
        # capped so this one signal cannot reach the block threshold alone
        score += min(profile.decayed_failures * 1.5, 6.0)

        # IP churn: one credential seen from many networks, a replay signal
        # when it happens in a short window
        if len(profile.seen_ips) >= 4:
            score += 3.0

        # Endpoint sprawl: a fixed cutoff standing in for "more endpoints than
        # this credential's history suggests"
        if len(profile.seen_endpoints) >= 8:
            score += 2.0

        # Decayed payload volume above a fixed cutoff; ~5 MB is an illustrative
        # threshold, not a learned baseline
        if profile.decayed_volume > 5_000_000:
            score += 2.5

        # Slow responses can accompany unusually expensive requests such as bulk
        # exports; a weak signal on its own, hence the small weight
        if event.latency_ms > 2500:
            score += 0.5

        return round(score, 2)

Step 4 — Decide, don't just alert

The biggest practical mistake in homegrown anomaly detection is treating every flagged score as a binary block. Real systems need graduated responses — because a false positive that silently blocks a legitimate integration causes its own incident.

from enum import Enum

class Decision(Enum):
    # Listed in escalation order, matching the thresholds in PolicyEngine
    ALLOW = "allow"
    THROTTLE = "throttle"
    STEP_UP_AUTH = "step_up_auth"     # require re-authentication, don't block outright
    BLOCK = "block"

class PolicyEngine:
    def decide(self, risk_score: float) -> Decision:
        if risk_score >= 9.0:
            return Decision.BLOCK
        if risk_score >= 6.0:
            return Decision.STEP_UP_AUTH
        if risk_score >= 3.5:
            return Decision.THROTTLE
        return Decision.ALLOW
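
A decision only matters once the gateway turns it into a response. The mapping below is one plausible sketch, not something the design above prescribes: the function name, the 30-second Retry-After value, and the choice of the RFC 9470 step-up error code are all assumptions. Decision is redefined so the snippet runs on its own.

```python
from enum import Enum
from typing import Optional

class Decision(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"
    STEP_UP_AUTH = "step_up_auth"
    BLOCK = "block"

def to_http_response(decision: Decision) -> Optional[tuple[int, dict[str, str]]]:
    """Map a decision to (status, headers); None means forward the request upstream."""
    if decision is Decision.BLOCK:
        return 403, {}
    if decision is Decision.STEP_UP_AUTH:
        # Challenge using the error code defined for step-up authentication (RFC 9470).
        return 401, {"WWW-Authenticate": 'Bearer error="insufficient_user_authentication"'}
    if decision is Decision.THROTTLE:
        # 429 Too Many Requests; the 30-second hint is illustrative.
        return 429, {"Retry-After": "30"}
    return None

print(to_http_response(Decision.THROTTLE))  # (429, {'Retry-After': '30'})
```

For machine-to-machine credentials there is no user to challenge, so a step-up decision would more likely mean forcing the integration to obtain a fresh token than prompting anyone.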

Step 5 — Wire it together

class BehavioralGateway:
    def __init__(self):
        self.tracker = TokenTracker()
        self.risk_engine = RiskEngine()
        self.policy = PolicyEngine()

    def process(self, event: ApiEvent) -> tuple[float, Decision]:
        profile = self.tracker.record(event)
        risk = self.risk_engine.score(profile, event)
        decision = self.policy.decide(risk)
        return risk, decision

Running synthetic traffic through this — a mix of normal requests and a simulated token-replay pattern where one token_id suddenly shows up from several IPs with a string of 403s — produces exactly the behavior you'd want: clean traffic stays at ALLOW, the replay pattern climbs through THROTTLE into STEP_UP_AUTH within a handful of events, well before it would have reached any volume-based rate limit keyed on IP address, because IP-based limits never see a single token's cross-network pattern at all.
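
A run like that can be approximated with a condensed stand-in. To stay short and self-contained it reproduces only two of the five signals (decayed failures and IP churn) with the same weights and thresholds as the listings above; the names MiniProfile and observe, the addresses, and the one-minute spacing are all made up for the demonstration.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

HALF_LIFE_MIN = 30.0  # same half-life as the profile above

@dataclass
class MiniProfile:
    failures: float = 0.0
    ips: set = field(default_factory=set)
    last_seen: Optional[datetime] = None

def observe(p: MiniProfile, ts: datetime, ip: str, status: int) -> tuple[float, str]:
    """Update the profile with one request, then score it and pick a decision."""
    if p.last_seen is not None:
        elapsed = max(0.0, (ts - p.last_seen).total_seconds() / 60.0)
        p.failures *= math.pow(0.5, elapsed / HALF_LIFE_MIN)
    if status in (401, 403):
        p.failures += 1.0
    p.ips.add(ip)
    p.last_seen = ts

    # Failures (capped at 6.0) plus the IP-churn bonus; other signals omitted.
    score = round(min(p.failures * 1.5, 6.0) + (3.0 if len(p.ips) >= 4 else 0.0), 2)
    if score >= 9.0:
        decision = "block"
    elif score >= 6.0:
        decision = "step_up_auth"
    elif score >= 3.5:
        decision = "throttle"
    else:
        decision = "allow"
    return score, decision

start = datetime(2026, 1, 1, 12, 0, 0)

# Clean integration: one address, successful requests, one minute apart.
clean = MiniProfile()
clean_decisions = [
    observe(clean, start + timedelta(minutes=i), "198.51.100.7", 200)[1]
    for i in range(5)
]

# Replayed token: a new address and a 403 on every request.
replayed = MiniProfile()
replay_decisions = [
    observe(replayed, start + timedelta(minutes=i), f"203.0.113.{i + 1}", 403)[1]
    for i in range(5)
]

print(clean_decisions)   # ['allow', 'allow', 'allow', 'allow', 'allow']
print(replay_decisions)  # ['allow', 'allow', 'throttle', 'step_up_auth', 'block']
```

The exact step at which each threshold trips depends on spacing and on how many requests fail, so a slower or quieter attacker would climb more gradually. What does not change is that no per-IP counter in this run ever exceeds one.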

This is, deliberately, a toy. A production system would persist profiles in something like Redis instead of process memory, feed decisions into a SIEM rather than just returning them to the caller, and likely fold in a learned baseline per endpoint rather than the fixed thresholds above. But the architecture (decayed per-credential state, composite scoring, graduated policy) has the same basic shape as the one real fraud-detection and API-security platforms tend to use. The thresholds are the easy part. The hard part, and the part worth sitting with, is the question underneath all of it: what counts as "normal" for a credential that legitimately changes its behavior over time, and who decides when a deviation is suspicious rather than just new?
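
One way to start answering that question in code, sketched here under assumptions, is to swap a fixed threshold for a baseline learned per credential. The class below keeps an exponentially weighted mean and variance of a single metric and reports how far a new value sits from it. The class name, the smoothing factor of 0.1, and the standard-deviation floor are illustrative choices, not tuned values.

```python
import math

class EwmaBaseline:
    """Exponentially weighted mean and variance of one metric for one credential."""

    def __init__(self, alpha: float = 0.1, min_std: float = 1.0):
        self.alpha = alpha        # weight given to the newest observation
        self.min_std = min_std    # floor so a perfectly steady metric still yields a finite score
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def zscore(self, x: float) -> float:
        """How many (floored) standard deviations x sits from the learned mean."""
        if self.count < 2:
            return 0.0  # still warming up; no opinion yet
        return (x - self.mean) / max(math.sqrt(self.var), self.min_std)

    def update(self, x: float) -> None:
        """Fold one observation into the running mean and variance."""
        if self.count == 0:
            self.mean = x
        else:
            diff = x - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1.0 - self.alpha) * (self.var + diff * incr)
        self.count += 1

baseline = EwmaBaseline()
for payload_bytes in [100, 120, 80, 110, 90, 100, 105, 95]:
    baseline.update(payload_bytes)

typical = baseline.zscore(100)   # close to zero: in line with recent history
outlier = baseline.zscore(5000)  # very large: far outside recent history
print(abs(typical) < 1, outlier > 3)  # True True
```

Scoring a value before folding it into the baseline matters here: updating first would let an outlier pull the mean toward itself and soften its own score. Because the baseline keeps adapting, a credential whose usage legitimately grows stops looking unusual after a while, which is the behavior the question above is asking for, along with the risk that a patient attacker can teach it a new normal.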

What this means for how companies should actually think about access

Closing this gap isn't primarily a tooling purchase — it's a change in what "secure" is allowed to mean inside an organization.

Token issuance needs an expiration model, not a "set and forget" default; a credential that's still valid eighteen months after the integration that requested it was last reviewed is a liability whether or not it's ever abused. Scope should be minimized at grant time and re-justified periodically, the same way access reviews already work for human accounts but almost never work for service accounts and OAuth grants. Monitoring needs to be keyed on the credential, not the network origin, because as the Salesloft/Drift and Vercel incidents both demonstrated, the attacker's infrastructure has no reason to resemble the legitimate integration's. And detection needs to be continuous and graduated rather than a one-time gate, because the entire point of the breaches described above is that nothing checked back in after the initial handshake.
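
The expiration and scope-review points lend themselves to a periodic audit job. The sketch below assumes an inventory of grants already exists, which is often the hard part. The OAuthGrant record, its field names, the sample integrations, and the 90-day review interval are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class OAuthGrant:
    """Hypothetical inventory record for one integration's grant."""
    integration: str
    granted_scopes: frozenset
    scopes_used_recently: frozenset
    last_reviewed: datetime

def audit_grants(grants, now, max_review_age=timedelta(days=90)):
    """Return (integration, finding) pairs for grants that need attention."""
    findings = []
    for grant in grants:
        if now - grant.last_reviewed > max_review_age:
            findings.append((grant.integration, "review overdue"))
        unused = grant.granted_scopes - grant.scopes_used_recently
        if unused:
            findings.append((grant.integration, "unused scopes: " + ", ".join(sorted(unused))))
    return findings

now = datetime(2026, 3, 1)
grants = [
    OAuthGrant("chat-widget",
               frozenset({"read:contacts", "read:cases", "write:cases"}),
               frozenset({"read:contacts"}),
               datetime(2024, 9, 1)),
    OAuthGrant("billing-sync",
               frozenset({"read:invoices"}),
               frozenset({"read:invoices"}),
               datetime(2026, 2, 1)),
]
for finding in audit_grants(grants, now):
    print(finding)
# ('chat-widget', 'review overdue')
# ('chat-widget', 'unused scopes: read:cases, write:cases')
```

The scopes_used_recently field is where the scope_used telemetry from the appendix pays off a second time: the same data that feeds detection also shows which granted permissions were never exercised and can be withdrawn.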

The final shift

Every incident described in this piece passed every authentication check it encountered. That's not a failure of authentication — it's a description of what authentication was designed to do, which is verify identity at a single point in time. The breaches happened in the much longer window after that check passed, in the part of a session's life that most security architecture still treats as a settled question rather than an ongoing one.

The organizations that come out ahead over the next few years won't be the ones with the strongest login page. They'll be the ones that stopped asking only "is this credential valid" and started asking, continuously, "does this credential's behavior still make sense" — and built the engineering to answer that question before a token has had eighteen months to misbehave quietly.
