Engineering · 10 min read

Risk Scoring for Public-Facing AI: Eight Dimensions, Compound Scores, Hard Stops

Jeff Toffoli
Robotic arm with auto annotation label -- risk scoring as a structured, scorable, automatable system

"Is this AI deployment safe?" is the wrong question. Every deployment carries risk. The right question is: how much risk, on which dimensions, and what safety measures match it?

We built a risk scoring engine that answers exactly that — eight dimensions, scored 1 to 5, combined into a compound score with hard-stop thresholds for combinations that simply shouldn't ship. It runs inside our MCP write tools, so every configuration change to a public-facing AI agent gets evaluated before it takes effect.

This post walks through the framework. There's an interactive version at /risk-scoring where you can score example deployments yourself.

The Quallaa risk scoring presentation page, showing the eight dimensions and live deployment scoring
The interactive risk scoring presentation at /risk-scoring. Move the sliders for any deployment and watch the compound score change in real time.

The Eight Dimensions

The dimensions are designed to be independent — moving one shouldn't automatically move the others. They're also designed to be scorable by a non-expert reading the criteria, not just by someone who's read NIST AI RMF cover to cover. Each one is anchored in real frameworks (NIST, EU AI Act, OWASP Agentic Top 10, UC Berkeley Agentic Standards, NVIDIA Frontier Risk, AWS Agentic Security Scoping Matrix), then synthesized into criteria you can actually apply.

1. Autonomy. How much freedom does the agent have to act without human approval? Score 1: drafts responses, human reviews and sends every one. Score 5: initiates actions, spawns sub-tasks, runs continuously without a human in the loop. A plumber's text-back agent that responds, books, and follows up on its own — with the owner reviewing conversations daily — is autonomy 3.

2. Action capability. What can the agent actually do in the world? Score 1: read-only, no external effects. Score 5: irreversible financial, legal, or physical actions across multiple systems. Sending an email is higher than answering a question. Charging a card is higher than sending an email. Filing a permit is higher than charging a card.

3. Consequence severity. If the agent is wrong, how bad is it? Score 1: customer mildly annoyed. Score 5: physical harm, legal liability, or catastrophic financial loss. The same wrong answer about pricing has different severity for a yard care company versus a hospital triage line.

4. Reversibility. If the agent makes a mistake, can you undo it? Unlike the other dimensions, a higher score here is safer. Score 1: permanent, with no remediation path. Score 5: trivial to reverse, no residue. A booked appointment is highly reversible. A sent legal document is not.

5. Audience exposure. How many people see the agent's outputs, and who are they? Score 1: one authenticated user at a time. Score 5: broadcast to a public audience, indexed, archived. A 1:1 text conversation is low exposure. A social media auto-post is high exposure.

6. Domain sensitivity. What field is the agent operating in? Score 1: low-stakes general help. Score 5: regulated domains — health, legal, financial, election-related. Domain sensitivity is independent of consequence — a low-consequence wrong answer in a regulated domain still triggers compliance obligations that don't exist elsewhere.

7. Identity representation. Who does the agent appear to be? Score 1: explicitly labeled as AI, distinct from any individual. Score 5: speaks as a specific named human, creating reasonable belief that a human is present. The Air Canada chatbot was a 5 on this dimension. The customer believed a representative of Air Canada had quoted them a refund. The court agreed.

8. Data sensitivity. What kind of data does the agent touch? Score 1: public information only. Score 5: regulated PII, PHI, financial records, or credentials. An agent that knows your business hours is low. An agent that knows your customers' medication histories is high.
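
In code, the eight dimensions reduce to a small typed profile. A sketch in TypeScript (the field names and the `Score` union are illustrative, not the engine's actual schema), filled in with the plumber text-back example scored later in this post:

```typescript
// Illustrative shape only; the real engine's schema isn't published.
type Score = 1 | 2 | 3 | 4 | 5;

interface RiskProfile {
  autonomy: Score;
  actionCapability: Score;
  consequenceSeverity: Score;
  reversibility: Score; // higher = easier to undo
  audienceExposure: Score;
  domainSensitivity: Score;
  identityRepresentation: Score;
  dataSensitivity: Score;
}

// The plumber text-back bot as scored in this post:
const plumberBot: RiskProfile = {
  autonomy: 3,
  actionCapability: 2,
  consequenceSeverity: 2,
  reversibility: 4,
  audienceExposure: 1,
  domainSensitivity: 1,
  identityRepresentation: 3,
  dataSensitivity: 2,
};
```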

Why Independence Matters

The trick is that these dimensions don't track each other. A plumber's text-back bot might be autonomy 3 (acts on its own with daily review), action capability 2 (texts and books, nothing more), consequence severity 2 (a wrong booking is annoying but recoverable), reversibility 4 (easy to fix), exposure 1 (1:1 conversations), domain sensitivity 1, identity 3 (sounds like a person, not labeled as AI), data sensitivity 2.

A fully autonomous marketing campaign agent might be autonomy 5, action capability 4, consequence severity 4 (brand damage at scale), reversibility 1 (sent emails can't be unsent), exposure 5 (broadcast), domain sensitivity 2, identity 4, data sensitivity 3.

These are very different deployments. A scoring system that collapses them into a single number ("medium risk") loses the information that lets you decide what to do about it. The plumber bot needs guardrails on what it agrees to. The marketing agent needs human review on every send.

Compound Scoring

Independent dimensions still need to be combined into something actionable. We use a weighted geometric mean, computed over each dimension's remaining safety headroom (6 minus the risk score) rather than over the raw scores. A dimension at its worst leaves a headroom of 1, which drags the whole product down, so no single dimension can be hidden by averaging it against the others. A deployment that's 5 on consequence severity and minimal everywhere else doesn't get a "low risk" overall score: the compounding punishes that one severe dimension harder than a simple average would.
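
The compounding step can be sketched as follows; the equal default weights and the 6-minus-score headroom inversion are illustrative assumptions, not the production values:

```typescript
// Compound score via a weighted geometric mean on the inverted scale.
// Each 1-5 risk score x becomes "safety headroom" (6 - x); the weighted
// geometric mean of headroom is then flipped back to a 1-5 risk number.
// One dimension at 5 leaves headroom 1, which drags the product down
// hard, so a single severe dimension can't be averaged away.
// Reversibility is already oriented "higher = safer" in this framework,
// so it skips the inversion. Equal weights are an assumption.
function compoundScore(
  profile: Record<string, number>,
  weights: Record<string, number> = {},
  invertedDims: Set<string> = new Set(["reversibility"]),
): number {
  let logSum = 0;
  let weightSum = 0;
  for (const [dim, score] of Object.entries(profile)) {
    const w = weights[dim] ?? 1;
    const headroom = invertedDims.has(dim) ? score : 6 - score;
    logSum += w * Math.log(headroom);
    weightSum += w;
  }
  return 6 - Math.exp(logSum / weightSum); // 1 = low risk, 5 = high
}
```

With every dimension at its safest (risk 1, reversibility 5) this returns 1. Spike consequence severity alone to 5 and the compound jumps to roughly 1.9, versus 1.5 for a plain arithmetic average of the same scores.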

Then we layer hard-stop thresholds on top. Some combinations of dimensions trigger an automatic block, regardless of the average. Examples:

  • Identity representation 5 + audience exposure 4 = "speaking as a named human to a broadcast audience" — historically the configuration that produces the most damaging incidents. Not allowed without explicit owner acknowledgment.
  • Action capability 4 + reversibility 1 + autonomy 4 = "irreversible action, no human approval, high impact" — too dangerous to ship by default.
  • Domain sensitivity 5 + data sensitivity 5 = "regulated domain, regulated data" — requires compliance review, not just a configuration toggle.

The thresholds aren't arbitrary. They come from incident analysis: which configurations historically produce the failures that hit the news? Those are the ones we hard-stop.
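
As a sketch, the hard stops are just predicates evaluated before any averaging. The thresholds below mirror the three examples above; the rule wording, type names, and the idea that these three are the whole set are assumptions:

```typescript
// One predicate per hard stop; any match blocks the change outright,
// regardless of the compound average.
interface Profile {
  autonomy: number;
  actionCapability: number;
  reversibility: number; // higher = easier to undo
  audienceExposure: number;
  domainSensitivity: number;
  identityRepresentation: number;
  dataSensitivity: number;
}

const hardStops: { reason: string; hit: (p: Profile) => boolean }[] = [
  {
    reason: "speaking as a named human to a broadcast audience",
    hit: (p) => p.identityRepresentation >= 5 && p.audienceExposure >= 4,
  },
  {
    reason: "irreversible high-impact action with no human approval",
    hit: (p) =>
      p.actionCapability >= 4 && p.reversibility <= 1 && p.autonomy >= 4,
  },
  {
    reason: "regulated domain handling regulated data",
    hit: (p) => p.domainSensitivity >= 5 && p.dataSensitivity >= 5,
  },
];

function checkHardStops(p: Profile): string[] {
  return hardStops.filter((s) => s.hit(p)).map((s) => s.reason);
}
```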

What Happens at Each Tier

The compound score maps to five tiers, each with proportional safety measures:

  • Tier 1 (minimal): Standard logging, escalation on confusion. Default for low-stakes deployments.
  • Tier 2 (low): Above plus daily owner review of escalations.
  • Tier 3 (moderate): Above plus weekly conversation sampling, owner notification on policy edges, explicit guardrails on sensitive topics.
  • Tier 4 (high): Above plus human approval on configuration changes, real-time monitoring, scoped tool access, stricter trust boundary defaults.
  • Tier 5 (critical): Above plus pre-deployment review, ongoing audit, restricted to specific use cases.
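
The score-to-tier mapping above can be sketched with evenly spaced cut points; the real thresholds aren't published, so these are placeholder values:

```typescript
// Map a 1-5 compound score onto the five tiers. The cut points here
// are evenly spaced guesses, not the engine's actual thresholds:
// 1.0-1.8 minimal, 1.8-2.6 low, 2.6-3.4 moderate, 3.4-4.2 high,
// 4.2-5.0 critical.
const TIERS = ["minimal", "low", "moderate", "high", "critical"] as const;
type Tier = (typeof TIERS)[number];

function tierFor(compound: number): Tier {
  if (compound < 1 || compound > 5) throw new RangeError("score out of range");
  const index = Math.min(Math.floor((compound - 1) / 0.8), 4);
  return TIERS[index];
}
```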

The point is that "AI safety" stops being a binary. It's a sliding scale where the safety measures are proportional to the actual risk profile of this specific deployment, not the average of what AI vendors are selling.

Where the Engine Lives

The risk scoring engine isn't a separate dashboard or audit tool. It runs inside our MCP write handlers — the tools that let owners (and Claude Desktop, and other MCP clients) reconfigure their public-facing AI. Every time someone updates their instructions, toggles a tool, or changes their business info, the engine evaluates whether the change shifts the risk profile.

If the change moves a dimension up, the trust layer responds by surfacing a contextual interface that explains what just changed, walks the owner through what it means, and captures their understanding before the change takes effect. If the change crosses a hard-stop threshold, the system declines the change and explains why.
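
That evaluation step reduces to: check hard stops on the post-change profile, then flag any dimension that moved up so the trust layer can walk the owner through it. A minimal sketch, with the hard-stop check injected as a function; every name here is hypothetical, not the actual MCP handler API:

```typescript
type Evaluation =
  | { outcome: "allow" }
  | { outcome: "explain"; raisedDimensions: string[] }
  | { outcome: "decline"; reasons: string[] };

function evaluateChange(
  before: Record<string, number>,
  after: Record<string, number>,
  hardStopReasons: (p: Record<string, number>) => string[],
): Evaluation {
  // Hard stops win over any averaging: decline and explain why.
  const reasons = hardStopReasons(after);
  if (reasons.length > 0) return { outcome: "decline", reasons };

  // Any dimension moving up triggers the contextual walkthrough;
  // otherwise the change applies without ceremony.
  const raised = Object.keys(after).filter(
    (dim) => after[dim] > (before[dim] ?? 0),
  );
  return raised.length > 0
    ? { outcome: "explain", raisedDimensions: raised }
    : { outcome: "allow" };
}
```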

The owner never sees "your deployment is now Tier 3." They see: "You just enabled email tools. That means your agent can now send messages on your behalf to anyone in your contact list. Here's what that means for the kinds of mistakes that become harder to take back, and here's how to set the guardrails."

The risk framework is the engine. The owner experience is the conversation.

Why We Built It

Public-facing AI is the most dangerous deployment category for AI right now. Not because the models are bad, but because the deployments don't come with the safety scaffolding that internal AI tools and authenticated copilots have by default. There's no login, no account, no contract — just a stranger talking to an AI that represents your business.

The existing AI safety frameworks are good, but they're written for AI labs and large enterprises. They assume you have a compliance team, a deployment review board, and a budget for outside auditors. None of that exists for a plumber or a yoga studio. So the frameworks may as well not exist for the audience that needs them most.

A scorable system, with criteria a non-expert can apply, that runs automatically at the moment of configuration, is the only way risk-proportional safety reaches the businesses actually deploying public-facing AI. Building it took less time than the marketing copy about it would have. The hard part was deciding it was the product, not a feature.

Try It

The interactive version at /risk-scoring lets you score the eight dimensions for example deployments — a plumber's text-back, a real estate lead qualifier, a healthcare appointment scheduler, a financial services intake bot — and see how the compound score and tier change as you move the sliders. You can also score your own deployment.

If you're building public-facing AI, the framework is yours to use. The numbers are derived from real frameworks, the criteria are documented, and the engine is open in the sense that it runs predictably from inputs you control. If something in the scoring looks wrong for your domain, tell us — incident data is how the thresholds get sharper.
