Risks We're Facing - Hermes Intelligence

Vextrum's product promise is that every conclusion can defend itself, and that the system will tell you what it can't see. A company that ships that promise has no business hiding its own risks. So this page exists for two readers: ourselves, to keep us honest about the gap between the vision and what is actually built; and an investor, who should be able to ask us about any risk on this list and find we have already named it, sized it, and acted on it.

None of what follows is hypothetical hand-wringing. It is grounded in the real state of the product: a V0 operating backbone that our CTO is actively rebuilding right now, a V1 "living universe" that is largely still ahead of us, a market full of names everyone knows, and a category — decision-grade intelligence — where being confidently wrong once, at the wrong moment, can end a client relationship. We would rather write that down than discover an investor already knows it and we don't.

Maintained by the CPO · reviewed against the V1 roadmap · a living document, not a pitch slide

Critical: existential if it lands
High: would set us back hard
Medium: costly, survivable
Lower: manageable

Each risk: exposure · severity × likelihood · how we deal with it · what we watch

The Map

Thirteen domains of risk, ordered roughly by how much they keep us up at night. The dots are a rough severity read, not a promise.

R-01

The vision-to-reality gap

The living universe is mostly still ahead of us, and the backend underneath it is mid-rebuild.

R-02

Evidence & trust integrity

The proof chain is the product. One confident wrong call at high stakes breaks it.

R-03

Security & confidentiality

We hold our clients' strategic intent. A breach is not a setback; it is an ending.

R-04

Market & go-to-market

A painkiller that gets perceived — and priced — as a vitamin. Long, expensive sales.

R-05

Competitive & moat durability

Incumbents and foundation models moving up-stack into the space we're defining.

R-06

Model & LLM dependency

"Never guesses" sits on top of machines that, by default, guess.

R-07

The one-product bet

One architecture for investor, macro, credit, corporate and government. Reach vs depth.

R-08

Data sourcing, rights & coverage

Access, licensing, legality — and blind spots in the blind-spot detector.

R-09

Regulatory & legal

AI regulation, OSINT on people, the investment-advice and market-abuse lines.

R-10

Unit economics & runway

LLM and pipeline cost per client versus what an institution will pay.

R-11

Key-person & organisation

A small team carrying a large vision, with intelligence built at a two-person seam.

R-12

Adoption & human factors

Analyst trust, automation bias, and the friction of telling people things they'd rather not hear.

R-13

Ethics, reputation & dual-use

Intelligence is powerful. The optics — and the misuses — are real.

The Five We Lose Sleep Over

If you only press us on a handful, press on these. They are the ones where the honest answer is "we have a plan, and we know the plan isn't finished."

01 — EXECUTION

The gap between the deck and the build

The living, self-adjusting universe is the story. Today it is an operating backbone being rebuilt, with the "alive" parts scheduled, not shipped. The risk is the demo outrunning the system.

02 — TRUST

One confident, wrong, high-stakes call

We sell on defensibility. A single proof card that confidently defends a false conclusion in front of an investment committee can cost the account and the reference.

03 — SECURITY

A breach of strategic intent

Our clients tell us what they're trying to decide — their theses, targets, and blind spots. That is the most sensitive data they own. We cannot afford to be the leak.

04 — MARKET

Priced and churned like a dashboard

We are, by our own admission, a painkiller in a category full of vitamins. If the pain isn't felt at the moment of decision, we get bought cheap and cut first.

05 — MOAT

The big players arrive

"Decision intelligence with proof" is a category we're naming. Bloomberg, Palantir, and the model labs can all walk toward it. Our head start has to compound into a moat.

R-01 · The Vision-to-Reality Gap

This is our largest risk, and the most honest thing on the page. The vision — a living decision universe that adjusts intelligently as the client's world changes — is real and well-specified. But a large part of it is still ahead of us, and the operating layer beneath it is being rebuilt as we speak. The danger is not that the vision is wrong; it's that the distance between what we say and what runs is wide, and closing it is a hard, multi-month engineering effort with a small team.

01.1

The "living universe" is mostly specified, not yet built

Severity: High · Likelihood: High

The features that make Vextrum more than a nicer dashboard — living ontology, dynamic operating spec, self-evolving knowledge, the belief curve — are V1 work, and V1 has barely started.

Our exposure

V0 gives us the backbone: client config, ontology, operating spec, evidence lineage, and the front-end intelligence components. But our own guardrails say it plainly — "we do not need a living ontology in V0." The adjustment-not-regeneration behaviour, the ontology that proposes its own additions, the Theme/Hypothesis belief curve — these are scheduled for V1 phases starting after the V0 reservations land. An investor who sees a polished demo could reasonably assume the living parts are running. Mostly, they are not yet.

How we deal with it

We sequence deliberately: V0 is "shaped like the final Vextrum" even before it is the final Vextrum — stable contracts, lineage, and proof first, so V1 is an extension rather than a rebuild. We reserve schema shapes in V0 (empty tables, zero migration) so the living layer bolts on without re-keying. And we are disciplined about demoing what runs, framing the rest as roadmap, not as shipped.

What we watch

Roadmap slippage against the V1 phase dates; the count of intelligence outputs that carry a complete proof chain end-to-end (not a reconstructed one); and any demo script that quietly depends on a capability the pipeline can't reproduce on live data.

01.2

The backend is being rebuilt underneath the product

Severity: High · Likelihood: High

Our CTO is generalising the time-series layer into a unified "series spine" right now, and the new Theme/Hypothesis intelligence sits exactly on the seam between his work and the CPO's.

Our exposure

The series layer is moving from "carry-forward under an entity" to a deliberate generalisation (an anchor that can be an entity, a theme, or a thesis-confidence belief curve). That is the right design, but it is a live rebuild, and the most product-defining new step — Theme/Hypothesis coverage, the thing that turns a number into a moved belief — is explicitly "half the intelligence on the seam," with ownership still being settled on a call between two people. Seams are where projects bleed time.

How we deal with it

We wrote the data contracts down before the rebuild, one-to-one with the CTO's tables, so both sides build against the same shapes. We froze V0 ("V0 stays untouched; in V0 we only reserve shapes") so the rebuild can't ripple into a migration. And we are forcing the ownership question — "who takes which half" — to a decision rather than letting it sit ambiguous at the seam.

What we watch

Whether the V0 schema reservations actually land by the cutoff without blocking the rest of V0; drift between the two repository copies of shared logic (we have already seen this happen and had to reconcile it by hand); and how cleanly the first thesis-confidence series is produced by the synthesis step rather than bolted on.

01.3

Demo-grade reliability vs. production-grade reliability

Severity: Medium · Likelihood: High

A 12-hour pipeline of triage, extraction and synthesis across many sources has many places to silently fail — and a system whose pitch is "we tell you what we can't see" is judged harshly when it just breaks.

Our exposure

Long-running compute, cold starts on serverless workers, multi-step async jobs, and state that has to survive a user closing the tab for twenty minutes — every one of these is a place a session can wedge. We have already found and fixed exactly this class of bug (an onboarding session that reached a stage without the data that stage needed, and spun forever). There are more like it. Each one, hit by a real client, reads as "the intelligence system can't keep itself running."

How we deal with it

Defence-in-depth on the operational layer: heartbeats and idle windows that keep workers warm while a client is active, idempotent transitions, stale-op detection, and — the lesson from the bug above — never letting the UI spin forever; it self-heals or surfaces a retry. V0's explicit goal is "the first proper operating layer" with visibility into what happened in the pipeline and why, not a loose chain of jobs.
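As an illustration of the idempotent transitions and stale-op detection described above, here is a minimal sketch. The stage names, the `Session` shape, and the timeout are assumptions made for the example, not the actual pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical stage order, for illustration only.
STAGES = ["triage", "extraction", "synthesis", "done"]


@dataclass
class Session:
    stage: str
    last_heartbeat: datetime


def advance(session: Session, expected_stage: str) -> bool:
    """Idempotent transition: move forward only if the session is still at
    the stage the caller observed. A duplicate or late call is a no-op."""
    if session.stage == "done" or session.stage != expected_stage:
        return False
    session.stage = STAGES[STAGES.index(session.stage) + 1]
    return True


def is_stale(session: Session, now: datetime, timeout: timedelta) -> bool:
    """Stale-op detection: an unfinished session with no recent heartbeat."""
    return session.stage != "done" and now - session.last_heartbeat > timeout


def ui_state(session: Session, now: datetime, timeout: timedelta) -> str:
    """What the front end should show instead of spinning forever."""
    if session.stage == "done":
        return "complete"
    return "retry" if is_stale(session, now, timeout) else "working"
```

Passing the observed stage into `advance` is what makes a retried or duplicated job safe: the second call sees a different stage and does nothing.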

What we watch

Stuck-session and failed-run rates per pipeline cycle; cold-start latency on the first action after a return; and the share of runs that complete with a full, queryable lineage rather than a partial one.

R-02 · Evidence & Trust Integrity

Our entire differentiation rests on one promise: every conclusion can defend itself, and the system shows its own weak points before you find them. That promise is also our single biggest liability. The proof chain, the Red Team, and the Blind Spots control plane are not features we can let degrade; they are the moat. If the defence is ever weaker than the confidence, we are worse than a dashboard, because we taught the client to trust us.

02.1

A confidently-defended wrong conclusion

Severity: Critical · Likelihood: Medium

The worst outcome isn't a hallucinated fact — it's a well-formed proof card that defends a false thesis, presented by our client to the people who hold them accountable.

Our exposure

We arm the client to walk into an investment committee or a board with our Proof Card as the artifact. That is the value — and the danger. If the evidence chain is plausible but the synthesis on top of it is wrong, we don't just give a bad answer; we give a bad answer wearing the credibility of a citation trail. In high-stakes seats, that is the failure that ends relationships, and it travels by word of mouth.

How we deal with it

The architecture is built to resist exactly this. Confidence is computed only from linked decision requirements and the metrics that test a thesis — never from free text — so the system cannot invent importance. Every output ships a Red Team panel: Known / Assumed / Missing / what-would-change-my-view, surfaced inline at the moment of action. Proof tiers and a source-trust hierarchy gate what is allowed to read as high-confidence. The product is designed to argue against itself before the client does.

What we watch

Calibration: when we say 57%, are we right ~57% of the time? Single-source claims that reach high confidence. The rate at which the Red Team actually surfaces a material weakness versus rubber-stamping. And every client-reported "this was wrong" as a sev-1 review, not a ticket.
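The calibration question above ("when we say 57%, are we right ~57% of the time?") can be checked with a standard binned comparison of stated confidence against observed hit rate. This is a sketch under assumptions: the input format (pairs of stated confidence and whether the conclusion held) and the ten-bin default are illustrative, not a description of the production audit.

```python
from collections import defaultdict


def calibration_table(predictions, n_bins=10):
    """predictions: iterable of (stated_confidence in [0, 1], was_correct).
    Returns rows of (bin_index, count, mean_stated, observed_rate)."""
    bins = defaultdict(list)
    for conf, correct in predictions:
        # Confidence 1.0 falls into the top bin rather than a bin of its own.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, bool(correct)))
    rows = []
    for idx in sorted(bins):
        items = bins[idx]
        mean_stated = sum(c for c, _ in items) / len(items)
        observed = sum(1 for _, ok in items if ok) / len(items)
        rows.append((idx, len(items), mean_stated, observed))
    return rows


def expected_calibration_error(predictions, n_bins=10):
    """Count-weighted average gap between stated and observed accuracy."""
    rows = calibration_table(predictions, n_bins)
    total = sum(count for _, count, _, _ in rows)
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return sum(count / total * abs(stated - observed)
               for _, count, stated, observed in rows)
```

For example, 100 conclusions stated at 57% of which 57 held give an error of approximately zero, while ten conclusions stated at 90% with only five correct give an error of about 0.4.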

02.2

The blind-spot detector has its own blind spots

Severity: High · Likelihood: Medium

We make a strong claim — "silence is only meaningful if coverage is known." If our coverage map is itself wrong, we convert an unknown gap into false confidence, which is worse than saying nothing.

Our exposure

Blind Spots are a first-class control plane: we tell the client what we can't see. The moment we render "covered on X" for an area we are not actually covering well, we've manufactured exactly the expensive miss the feature exists to prevent, and we did it with authority. The detector is only as honest as the source universe and the ontology behind it.

How we deal with it

Coverage is derived from the same evidence lineage as everything else, not asserted. Source discovery is adversarial and revisited, not one-shot. The ontology can propose its own additions when extraction repeatedly hits concepts it doesn't understand — closing the gap between "what the client's world contains" and "what our map contains" over time, with the user approving each patch.
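One way to picture the propose-then-approve loop for ontology additions is the sketch below. It assumes the ontology is a flat set of lowercase terms and that "repeatedly" means a simple hit threshold; both are simplifications chosen for the example, and the function names are hypothetical.

```python
from collections import Counter


def propose_additions(extracted_terms, ontology, min_hits=3):
    """Return terms the extractor keeps hitting that the ontology lacks.
    `ontology` is assumed to be a set of lowercase terms."""
    misses = Counter(
        term.lower() for term in extracted_terms if term.lower() not in ontology
    )
    return sorted(term for term, hits in misses.items() if hits >= min_hits)


def apply_approved(ontology, proposals, approved):
    """Only user-approved proposals are added; the input set is not mutated."""
    return ontology | {term for term in proposals if term in approved}
```

Keeping approval as a separate step means the map only grows through a decision the user has seen, which matches the "user approving each patch" behaviour described above.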

What we watch

Post-hoc misses inside areas we marked covered (the cardinal sin); the volume of system-proposed ontology additions that get approved (a healthy non-zero rate means the map is learning); and coverage breadth per strategic question over time.

R-03 · Security & Confidentiality

A market-data terminal holds data the whole market can see. We hold something far more sensitive: what each client is trying to decide. Their theses, their targets, the competitor they're worried about, the tender they're chasing, the gaps they know they have. For an institutional client, that is among the most confidential information they own. The trust bar is therefore not "good SaaS security" — it is "we are a custodian of strategic intent," and a single breach is not a setback we recover from.

03.1

A breach of client strategic intent

Severity: Critical · Likelihood: Low–Med

The onboarding conversation alone reveals a client's whole hand. Aggregated across clients, our database is a map of what sophisticated institutions are about to do.

Our exposure

We store strategic questions, theses with confidence levels, monitored entities and the evidence behind them — per client, in one system. A leak wouldn't just embarrass; it could be market-moving or expose a client's competitive position. And because the value of the data is so high, we are a more attractive target than our size would suggest.

How we deal with it

Strict tenant isolation as a first principle — today the workspace is the hard scope boundary, and even as we add cross-workspace entity identity for portfolios, the data boundary stays tenant-scoped. Least-privilege access to client universes, encrypted storage, scoped credentials, and a deliberately small surface of who can touch raw client data. Security posture is treated as a product requirement, not an afterthought, because for this clientele it is the product.
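To make "the workspace is the hard scope boundary" concrete, here is a minimal in-memory sketch in which read access is bound to one workspace at construction time, so a cross-tenant lookup cannot succeed by accident. The `Thesis` record and `ScopedTheses` class are hypothetical names for illustration; a real system would enforce the same rule at the database and credential level as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thesis:
    workspace_id: str
    thesis_id: str
    text: str


class ScopedTheses:
    """Read access bound to a single workspace; callers never pass a
    workspace id per query, so they cannot ask for another tenant's data."""

    def __init__(self, rows, workspace_id):
        self._rows = list(rows)
        self._workspace_id = workspace_id

    def list(self):
        return [r for r in self._rows if r.workspace_id == self._workspace_id]

    def get(self, thesis_id):
        for row in self.list():
            if row.thesis_id == thesis_id:
                return row
        # Another tenant's id looks the same as a missing one.
        raise KeyError(thesis_id)
```

Raising the same error for "not found" and "belongs to another workspace" avoids leaking the existence of another client's records.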

What we watch

Access logs to client universes (who/why/when), any cross-tenant data path in code review, third-party/sub-processor exposure, and dependency/credential hygiene. We treat a near-miss here the way we treat a wrong conclusion: as a sev-1, not a note.

R-04 · Market & Go-to-Market

We have already written the sharpest version of this risk into our own positioning: Vextrum is a painkiller that will be perceived as a vitamin unless the pain is made felt. That self-awareness is the mitigation's starting point — but naming a trap is not the same as escaping it. Selling decision-grade intelligence to institutions is a long, high-trust, high-CAC motion, and our category doesn't have a budget line yet.

04.1

Painkiller perceived — and priced — as a vitamin

Severity: High · Likelihood: High

Defensibility and blind-spot closure are only acutely valuable at two moments — when a position is challenged, and right after a near-miss. If we surface that value only when the user goes looking, we read as "a nicer dashboard."

Our exposure

"Market intelligence" and "monitoring" are a graveyard of vitamins — bought in good years, cut first in bad ones, because no one can draw a straight line from them to a dollar saved or a decision defended. If we let ourselves be filed in that drawer, we get priced low, compared on feeds-and-features, and churned the first time a budget tightens.

How we deal with it

We engineer the pain to land before the customer rationalises us as a vitamin: surface the Red Team inline the instant someone acts on a card; quantify the catch ("this reached you 9 days before the press"); keep a persistent coverage scoreboard so absence is felt, not assumed; make the Proof Card the artifact they bring to the committee; and fire proactively when something that would change a held view appears. We sell the two acute pains — un-defendable conviction and unknown blindness — not the feature list.

What we watch

Whether deals are won on defensibility/coverage or on features; price realised vs. dashboard comps; net revenue retention and churn reason codes; and whether usage clusters around the decision moments (proof cards opened at action time) or just passive reading.

04.2

Long sales cycles, high CAC, credibility tax

Severity: High · Likelihood: Medium

Institutions don't stake decisions on an unknown vendor quickly. The same "high stakes" that make our value real also make our sales slow.

Our exposure

Procurement, security review, pilots, and the simple fact that a new name in a high-trust seat carries a credibility tax. Long cycles burn runway and make early revenue lumpy and concentration-prone.

How we deal with it

Lead with a sharp wedge per vertical — for government, a tender-driven entry where deadlines and eligibility are first-class and the value is unambiguous and time-boxed. Use the Proof Card and a live blind-spot scoreboard as the pilot's "show, don't tell." Land on one acute decision, expand into the universe. Borrow credibility through design and rigour until we have references.

What we watch

Time-to-first-value in a pilot, pilot-to-paid conversion, sales-cycle length by vertical, and customer concentration (revenue share of the top accounts).
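Customer concentration, the last item above, is simple arithmetic. The sketch below computes the revenue share of the top accounts and, as an addition not named in this document, the Herfindahl-Hirschman index as a single-number summary. The account names and figures in the usage note are made up.

```python
def top_n_share(revenue_by_account, n=3):
    """Fraction of total revenue contributed by the n largest accounts."""
    total = sum(revenue_by_account.values())
    if total <= 0:
        raise ValueError("total revenue must be positive")
    largest = sorted(revenue_by_account.values(), reverse=True)[:n]
    return sum(largest) / total


def herfindahl_index(revenue_by_account):
    """Sum of squared revenue shares: 1/k for k equal accounts, 1.0 for one."""
    total = sum(revenue_by_account.values())
    if total <= 0:
        raise ValueError("total revenue must be positive")
    return sum((value / total) ** 2 for value in revenue_by_account.values())
```

With three accounts paying 50, 30 and 20, the top account's share is 0.5, the top two hold 0.8, and the index is 0.38.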

R-05 · Competitive & Moat Durability

We are deliberately naming a category — decision intelligence as a living system, with proof. Naming a category is powerful and dangerous: it means no one is there yet, and it means the biggest players can all walk toward it. Bloomberg has the data and distribution, Palantir has the integration layer and enterprise trust, AlphaSense and Recorded Future have search and feeds, and the model labs have the raw capability. Our head start only matters if it compounds into something they can't easily copy.

05.1

Incumbents and model labs move up-stack

Severity: High · Likelihood: Medium

"AI summary over our data" is now a checkbox feature everyone is shipping. If decision-grade intelligence collapses into a commodity LLM layer, our differentiation thins.

Our exposure

The current wave of "ask our terminal a question" features gets us compared to giants on a single capability. A well-funded incumbent could bolt a proof-lite narrative onto an existing data moat and claim the same territory with far more distribution.

How we deal with it

Our defensibility isn't any single capability — extraction, monitoring, dashboards all exist. It's the closed loop with evidence lineage, mapped to a specific client's decision universe: client intent → operating spec → discovery → triage → extraction → synthesis → proof → challenge → action → the system adapts. The compounding assets are the per-client living ontology and the accumulated evidence/decision memory — an analyst team that remembers how this client thinks. That is expensive to replicate per account and gets stickier with time, which is the opposite of a commodity.

What we watch

Incumbent feature announcements that approach proof/coverage; how quickly a new client universe reaches "remembers how they think" depth; expansion and retention as the proxy for switching cost; and whether prospects describe us as "a category" or "an AI feature."

R-06 · Model & LLM Dependency

Our hook line is "an intelligence analyst that never sleeps and never guesses." It runs on top of large language models, which — left to their own devices — guess fluently and confidently. Bridging that gap is an architectural choice, and a standing risk: in capability, in cost, in provider dependence, and in the silent drift of a model we don't control.

06.1

Hallucination under an evidence-first promise

Severity: High · Likelihood: Medium

A single fabricated entity, quote, or relationship that slips into the evidence chain undermines the one thing we promise can't happen.

Our exposure

LLMs touch extraction, synthesis, and the language of every output. Any of those is a place an unsupported claim can enter wearing the costume of a sourced fact.

How we deal with it

We constrain the model rather than trust it: evidence is an interface — every decision requirement must be answered by a real document or a real metric, with a provenance reference, or it doesn't count. Confidence derives only from those linked requirements and metrics, never from generative text. Extraction is structured and traceable to a source quote/excerpt. The system's job is to ground and cite, not to opine.
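The "evidence is an interface" rule can be sketched as a check that every piece of evidence resolves to a real source and that its excerpt actually appears there. The `Evidence` fields and the plain dictionary of source texts are assumptions for illustration; the real lineage store is not described in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    requirement_id: str
    source_id: str  # provenance reference
    excerpt: str    # the quote the claim is traced to


def unresolved_citations(evidence, sources):
    """Evidence whose reference points at nothing, or whose excerpt is empty
    or cannot be found in the referenced text. `sources` maps id -> text."""
    bad = []
    for ev in evidence:
        text = sources.get(ev.source_id)
        if text is None or not ev.excerpt or ev.excerpt not in text:
            bad.append(ev)
    return bad


def supported_requirements(requirement_ids, evidence, sources):
    """A requirement counts only if at least one resolvable citation backs it."""
    bad = unresolved_citations(evidence, sources)
    backed = {ev.requirement_id for ev in evidence if ev not in bad}
    return {rid: rid in backed for rid in requirement_ids}
```

A requirement with no evidence at all is reported the same way as one whose citation fails to resolve: unsupported, so it contributes nothing to confidence.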

What we watch

Unsupported-claim rate in audits of synthesised outputs; the fraction of conclusions whose every link resolves to a real source; and citation-resolution failures (a reference that doesn't point at anything).

06.2

Provider dependence, cost, and silent drift

Severity: Medium · Likelihood: Medium

We depend on a small number of model providers for capability, price, and availability — and a model can change behaviour underneath us without notice.

Our exposure

Pricing moves, rate limits, deprecations, or a quiet quality regression in a hosted model can hit accuracy and margin at once. Concentration on one provider is a single point of failure for the whole product.

How we deal with it

A gateway abstraction over providers with fallback/round-robin so no single model is load-bearing; the heavy reasoning is separable from any one vendor; and because confidence is computed from evidence rather than vibes, a model swap changes wording far more than it changes conclusions. Cost is engineered (tiered model use, caching, doing cheap classification before expensive synthesis).
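A gateway with round-robin and fallback might look like the following sketch. `Gateway` and `ProviderError` are hypothetical names, and providers are modelled as plain callables standing in for vendor SDK calls; nothing here reflects a specific vendor's API.

```python
class ProviderError(Exception):
    """Raised by a provider callable when it cannot serve a request."""


class Gateway:
    def __init__(self, providers):
        # providers: list of (name, callable(prompt) -> str)
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers = list(providers)
        self._next = 0

    def complete(self, prompt):
        """Start at the next provider in rotation; on failure, fall back to
        the others in order. Returns (provider_name, response_text)."""
        count = len(self._providers)
        start = self._next
        self._next = (self._next + 1) % count
        failed = []
        for offset in range(count):
            name, call = self._providers[(start + offset) % count]
            try:
                return name, call(prompt)
            except ProviderError:
                failed.append(name)
        raise ProviderError(f"all providers failed: {failed}")
```

Returning the provider name alongside the response makes it possible to log which model produced each output, which helps when investigating drift after a model change.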

What we watch

Cost per pipeline run and per client; provider concentration; and a regression suite that flags accuracy/behaviour drift when a model version changes.

R-07 · The One-Product Bet

Our growth thesis is elegant: add one primitive — a way to hold an idea that isn't pinned to a single company (a macro regime, a thematic trend, a credit thesis, a brand narrative) — and the same product suddenly serves macro, credit, commodities, quant, crypto, private markets and corporate, without building a second product. Elegant bets carry a matching risk: that generality buys reach at the cost of depth, and we end up shallow everywhere instead of indispensable somewhere.

07.1

One architecture for every vertical — reach vs. depth

Severity: High · Likelihood: Medium

The Theme/Hypothesis primitive lets one backbone address the whole market. If the abstraction leaks, each vertical gets a product that's generic where it needed to be expert.

Our exposure

A macro fund thinks in regimes, a thematic fund in trends across many names, credit in capital structure, a corporate team in competitive narrative, government in tenders and eligibility. Serving all of them from one architecture is a real test: a too-thin abstraction produces intelligence that feels generic to a specialist, who will choose a deep point-solution over a broad one.

How we deal with it

The generality is principled, not cosmetic: verticals share one backbone and differ in vocabulary, priorities, source expectations, alert rules, active pillars, and operating-spec interpretation — not in three separate products. The Operating Spec is the per-client control layer that turns the same engine into a tender-alerting system, a thesis monitor, or a market-entry watch. We prove depth on a lead vertical first and let the shared backbone earn the next one, rather than claiming all of them on day one.

What we watch

Per-vertical depth signals (do specialists say "this gets my world" or "this is generic"?); how much per-vertical work is config/operating-spec vs. net-new engineering; and whether the Theme primitive actually carries macro/credit cleanly or needs constant special-casing.

R-08 · Data Sourcing, Rights & Coverage

Intelligence is only as good as what flows into it. Our entire value sits downstream of source access — and source access is a moving target of licensing terms, terms-of-service, copyright, and plain availability. Coverage gaps don't just reduce quality; in our product they corrupt the blind-spot map, which is the feature clients trust most.

08.1

Source access, licensing & legality

Severity: High · Likelihood: Med–High

Much of the world's relevant signal lives behind terms-of-service, paywalls, and copyright. The rules for ingesting and processing it are tightening, not loosening.

Our exposure

Scraping restrictions, API terms, content licensing, and the evolving legal posture toward machine ingestion of third-party content all bear directly on what we can source. A key feed changing its terms, or a rights challenge, can degrade coverage or create legal exposure.

How we deal with it

Source trust and proof tiers are governed per client by the Operating Spec, not hardcoded — so the source-of-record can be tuned by domain (crypto, private markets, and OSINT can legitimately relax a rigid Tier-1 floor) while staying explicit and auditable. We diversify feeds so no single source is load-bearing, prefer licensed and primary sources where stakes are high, and keep provenance on every signal so a rights or quality problem is traceable and removable.

What we watch

Dependence concentration on any single source; ToS/licensing changes on key feeds; coverage by domain and tier; and the proportion of high-confidence outputs resting on primary vs. secondary sources.

R-09 · Regulatory & Legal

We operate at the intersection of three regulated worlds — AI, finance, and information about real entities and people — across multiple jurisdictions. Most of these risks are manageable with discipline and the right framing, but they are real, and a few of them sit close to bright lines we must not cross.

09.1

AI regulation, profiling & data protection

Severity: High · Likelihood: Medium

An AI system that profiles entities and people and informs consequential decisions attracts both AI-specific regulation and data-protection law.

Our exposure

Tracking named people and organisations as first-class entities raises GDPR and OSINT-on-individuals questions; emerging AI regulation may classify decision-influencing or profiling systems as higher-risk, with transparency and governance obligations. Government and asset-tracing work raises the bar further.

How we deal with it

Our architecture is, fortunately, built for explainability: full evidence lineage, a stated Known/Assumed/Missing posture, and human-in-the-loop decisions (the system proposes, the user decides) are precisely what AI governance regimes ask for. We keep humans on consequential calls, source from legitimate channels, minimise and scope personal data to what the client's stated purpose requires, and treat compliance as a design input — not a retrofit.

What we watch

Regulatory developments in the jurisdictions we sell into; the share of processing involving personal data; and whether every consequential output retains a defensible lineage and a human decision point.

09.2

The financial bright lines & liability for decisions

Severity: High · Likelihood: Low–Med

Serving investors puts us near investment-advice, market-abuse, and material-non-public-information lines — and clients act on our output, which raises the question of our liability when they're wrong.

Our exposure

We must not become an unlicensed investment adviser, a conduit for MNPI, or a defamation risk when we characterise a named company or person. And because clients make real decisions on our intelligence, a bad call invites "you told us so" liability.

How we deal with it

We are an evidence and intelligence layer, explicitly not an adviser: we surface sourced signals, expose assumptions, and leave the decision — and the accountability — with the client, by design. We source from public/licensed channels and keep provenance to stay clear of MNPI and to defend characterisations. Contracts set the scope of reliance. The product's honesty (Known/Assumed/Missing) is also our best legal posture: we never claim certainty we don't have.

What we watch

Any output that reads as a recommendation rather than sourced intelligence; provenance gaps on entity characterisations; and counsel review of positioning, contracts, and disclaimers as we enter each regulated vertical.

R-10 · Unit Economics & Runway

A living intelligence universe is compute-hungry: a recurring pipeline of discovery, triage, extraction and synthesis, much of it on large models, per client, on a cadence. The product can be magical and still lose money per account if the cost to keep a universe alive outruns what the client will pay. Early on, a few large clients also means concentration.

10.1

Cost-to-serve vs. willingness-to-pay

Severity: Med–High · Likelihood: Medium

Keeping a client's universe alive — re-running discovery, re-scoring, re-synthesising on a cadence — has a real recurring compute cost that scales with breadth and frequency.

Our exposure

LLM calls across the pipeline plus long-running workers are the dominant cost, and they grow with the size of the source universe and the cadence of refresh. Priced wrong, scale makes the loss bigger, not smaller.

How we deal with it

We design the pipeline to spend compute where it earns its keep: cheap classification before expensive synthesis, adjustment-not-regeneration (we don't rebuild a universe when a client makes a small change — we surgically adjust the affected branches), tiered model use, caching, and reservation-based scaling. The Operating Spec governs what is worth monitoring closely vs. what stays background, so we don't pay premium attention to low-value signal. And the target buyer is an institution for whom one prevented expensive miss dwarfs the subscription.
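The "cheap classification before expensive synthesis" idea can be sketched as a two-stage cascade with an explicit cost tally. The unit costs, the threshold, and the function names are illustrative assumptions, not measured figures.

```python
def run_cascade(items, cheap_score, expensive_synthesise, threshold,
                cheap_cost=1, expensive_cost=50):
    """Score every item with the cheap step; only items at or above the
    threshold reach the expensive step. Returns (outputs, total_cost)."""
    outputs = []
    total_cost = 0
    for item in items:
        total_cost += cheap_cost
        if cheap_score(item) >= threshold:
            total_cost += expensive_cost
            outputs.append(expensive_synthesise(item))
    return outputs, total_cost
```

With four items of which two pass the filter, the cascade costs 4 × 1 + 2 × 50 = 104 units under these assumed prices, against 200 for sending everything to the expensive step. The saving grows as the share of low-value signal rises.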

What we watch

Gross margin per client; compute cost per pipeline run and its trend as universes grow; the ratio of incremental cost to incremental client change (the adjustment-not-regeneration payoff); and revenue concentration across top accounts.

R-11 · Key-Person & Organisation

A large, opinionated vision is currently carried by a small team, and the most product-defining intelligence is being built at the seam between two people. That is normal at this stage and it is also a genuine risk: concentration of knowledge, coordination overhead at the seam, and the simple fragility of a small group building something ambitious on a tight timeline.

11.1

Concentration of knowledge & the CPO/CTO seam

Severity: High · Likelihood: Medium

The vision, the product architecture, and the backend rebuild live in a small number of heads, and the hardest new piece is owned jointly across the CPO/CTO boundary.

Our exposure

Onboarding, config, ontology, operating spec and front end sit with the CPO; ingestion, extraction and the series layer with the CTO; and the Theme/Hypothesis coverage step — "half the intelligence" — straddles both. If either person is unavailable, or the seam stays ambiguous, momentum and quality are at risk. Knowledge that lives only in conversation doesn't survive a bus.

How we deal with it

We write the spine down: contracts, guardrails, roadmaps and design docs are explicit and versioned, so intent outlives any single conversation. We force seam decisions ("who takes which half") to be made, not assumed. We reserve schema shapes early so work can proceed in parallel without blocking. And the medium-term answer is deliberate hiring against exactly these concentration points.

What we watch

Bus-factor on each critical subsystem; how much of "how it works" exists only in someone's head vs. in a doc; and time lost to seam ambiguity between the two halves of the intelligence layer.

R-12 · Adoption & Human Factors

Even a perfect product can fail at the last inch — the human deciding whether to trust it. Vextrum augments expert analysts, and experts are rightly sceptical of a machine that tells them their world. Our adversarial honesty helps build that trust, but it also creates its own friction: a system whose job is to tell people uncomfortable, view-changing things is, by design, sometimes unwelcome.

12.1

Analyst trust, automation bias, and unwelcome truths

Severity: Medium · Likelihood: Med–High

Two failure modes pull in opposite directions: experts trusting us too little to change behaviour, or trusting us too much and switching off their own judgement.

Our exposure

Under-trust: a sceptical analyst treats us as background noise and the product never reaches the decision. Over-trust: a client outsources judgement to the system and is blindsided when it's wrong. And the honest version of the product proactively challenges held views — valuable, but psychologically harder to adopt than a tool that flatters.

How we deal with it

We position as the analyst's instrument, not their replacement — the thing they present with at the moment of accountability. The Proof Card builds earned trust (every claim is inspectable), and the Red Team explicitly guards against over-trust by always showing what would change the conclusion. We make the system challenge beliefs with evidence, which is far easier to accept than an unsupported contradiction.

What we watch

Whether Proof Cards get opened at decision time (earned trust) or ignored; usage patterns that indicate judgement is being outsourced wholesale; and qualitative signal on whether analysts feel armed or second-guessed.

R-13 · Ethics, Reputation & Dual-Use

"Intelligence" is a powerful, loaded word. A system that tracks entities, traces structures, and informs consequential decisions for funds, corporates and governments carries real ethical weight and real optics. We would rather hold ourselves to a standard here than have one imposed after a misstep.

13.1

Surveillance optics & dual-use

Severity: Med–High · Likelihood: Low–Med

The same capability that helps a fund avoid an expensive miss can read as surveillance, and intelligence tooling is inherently dual-use.

Our exposure

Government and asset-tracing work especially invites a "surveillance company" framing; a high-profile wrong call about a named person or entity is both a legal and a reputational event; and powerful tooling can be put to uses we'd refuse to endorse.

How we deal with it

We anchor on legitimate sources, evidence, and transparency rather than covert collection — our brand is defensibility, which is the opposite of shadowy. We choose clients and use-cases deliberately, keep humans accountable for consequential calls, and hold the same Known/Assumed/Missing honesty about people and entities that we hold about theses. The discipline that makes us trustworthy to clients is the same discipline that keeps us on the right side of the optics.

What we watch

Use-cases and clients against an internal line we won't cross; sensitivity of entity-level claims about individuals; and any public characterisation that can't be fully defended by lineage.

How We Hold the Line

Risks are managed by principles that don't bend under deadline pressure, not by good intentions. These are the rules we've written down to keep V0 and V1 shaped like the real Vextrum — they are the operating expression of this entire page.

Shaped like the final Vextrum

V0 doesn't have to be the finished product, but it must be shaped like it — stable contracts, lineage, proof and audit first — so V1 is an extension, never a rebuild. Cutting the wrong corner now means a backend that "works for demos but fights us" later.

Evidence is an interface

Every decision requirement is answerable by a document or a metric, with provenance — or it doesn't count. Confidence is computed only from linked requirements and metrics, never from free text. The system can't invent importance.
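One way to picture this rule is a scoring function that can only see linked, provenanced evidence. The sketch below is illustrative, not our implementation: the names Evidence, Requirement, counts and confidence, the weight field, and the simple averaging are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    kind: str      # "document" or "metric" (hypothetical labels)
    source: str    # provenance; an empty string means none was recorded
    weight: float  # hypothetical strength in [0, 1]


@dataclass
class Requirement:
    question: str
    evidence: list[Evidence] = field(default_factory=list)
    analyst_note: str = ""  # free text: stored, but never read by scoring


def counts(ev: Evidence) -> bool:
    """Evidence counts only if it is a document or metric with provenance."""
    return ev.kind in {"document", "metric"} and bool(ev.source.strip())


def requirement_score(req: Requirement) -> float:
    """Strongest piece of valid evidence, or 0.0 if nothing counts."""
    return max((e.weight for e in req.evidence if counts(e)), default=0.0)


def confidence(requirements: list[Requirement]) -> float:
    """Average over linked requirements; free text cannot move the result."""
    if not requirements:
        return 0.0
    return sum(requirement_score(r) for r in requirements) / len(requirements)
```

Because analyst_note is never read, an emphatic comment with no linked evidence contributes nothing: under these assumptions, that is the sense in which the system can't invent importance.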

Proof tiers from the Operating Spec

What counts as a source-of-record is decided per client and domain by the Operating Spec, never hardcoded. Raw source authority is an input, not a verdict — so crypto, PE and OSINT can be handled honestly without a rigid floor.
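As a rough illustration of "decided by the Operating Spec, never hardcoded", the proof tier can be treated as a lookup keyed by client and domain rather than a constant in code. Everything in this sketch is hypothetical: the client name, the source types, the tier numbers and the proof_tier helper are assumptions for the example, not the real spec format.

```python
# Hypothetical Operating Spec fragment: tiers are data, set per (client, domain).
OPERATING_SPEC = {
    ("fund_x", "crypto"): {"on_chain_record": 1, "exchange_filing": 2},
    ("fund_x", "private_equity"): {"audited_accounts": 1, "management_deck": 3},
}


def proof_tier(spec, client, domain, source_type):
    """Return the tier the spec assigns to a source type, or None if the
    spec does not accept that source type as proof for this client/domain."""
    tiers = spec.get((client, domain))
    if tiers is None:
        raise KeyError(f"no Operating Spec entry for {client}/{domain}")
    return tiers.get(source_type)
```

The same source type can be tier 1 in one domain and not accepted at all in another, which is the point: authority is an input that the spec interprets, not a verdict baked into the code.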

Reserve, don't migrate

New shapes are reserved in V0 as empty schema (zero migration) and built in V1. The living layer bolts on without re-keying the universe — so the rebuild can't ripple into a destabilising migration mid-flight.

Adjust, don't regenerate

When a client changes one thing, we surgically adjust the affected branches and preserve object identity, rather than panicking and rebuilding the universe. Continuity of intelligence is a feature and a cost control at once.
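A minimal sketch of the idea, assuming the universe can be modelled as a dependency graph of objects keyed by stable IDs. The structure, the function names (affected_nodes, adjust) and the revision counter are assumptions for illustration only.

```python
from collections import deque


def affected_nodes(dependents, changed):
    """Breadth-first walk from the changed node to everything downstream."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def adjust(universe, dependents, changed, recompute):
    """Recompute only the affected branch, mutating objects in place so
    their IDs (and anything keyed on them) survive the change."""
    touched = affected_nodes(dependents, changed)
    for node_id in touched:
        universe[node_id]["value"] = recompute(node_id)
        universe[node_id]["revision"] += 1
    return touched
```

Nodes outside the affected branch are never visited, so their values, revisions and identities are left exactly as they were. That is what makes a single change cheap and keeps history continuous.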

The system proposes, the user decides

Ontology additions, config changes, consequential actions — the system does the discovery work and surfaces a recommendation; the human makes the call. This is our trust model and, conveniently, our regulatory posture.

It defends itself, or it doesn't ship

Every visible intelligence object carries a proof chain and a Red Team. The screwdriver behind the screen. Without it, Vextrum looks intelligent and isn't trusted when the stakes are high — which is the only time it matters.

This list is alive

This page is reviewed against the roadmap, not written once for a raise. A risk we've stopped watching is a risk we've started taking. The point of adversarial honesty is that it never gets to clock off.