A doctor-native AI operating system will not arrive as a single product launch. It will accrete in layers — ambient documentation first, then inbox and recalls, then contextual clinical reasoning, then agentic workflow execution, then outcome intelligence — with each layer regulated, monitored, and audited harder than the one before it. This paper takes the topic map of an earlier strategic brief and reprojects it across four near-term horizons: 6 months (Nov 2026), 12 months (May 2027), 24 months (May 2028), and 36 months (May 2029). Each claim carries an explicit confidence or accuracy score so the reader can separate what is already happening from what is being inferred.
How to read the badges in this paper
ACC 95% An accuracy badge is attached to a referenced empirical claim. The score reflects how directly the cited source supports the statement (100% = source states this precisely; 70% = source supports the gist but with interpretation).
CONF 60% A confidence badge is attached to a speculative or projective claim (the kind that fills any 6–36 month forecast). The score is the authors’ subjective probability that the claim will be substantially correct by the stated horizon.
Colour: ≥ 75% high · 40–74% moderate · < 40% low / contested.
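The colour bands above can be expressed as a small lookup. This is only an illustrative sketch: the function name `badge_band` is hypothetical, and it assumes that a fractional score between 74% and 75% (which the legend does not define) falls into the moderate band.

```python
def badge_band(score_percent: float) -> str:
    """Map an ACC or CONF score (0-100) to the colour band used in this paper."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be between 0 and 100")
    if score_percent >= 75:
        return "high"
    if score_percent >= 40:
        # Assumption: anything below 75 and at or above 40 counts as moderate.
        return "moderate"
    return "low / contested"
```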
The earlier brief argued that the winning AI product for doctors over a five-year horizon will resemble a clinical command layer rather than an “AI doctor”: an ambient, governance-aware system that stitches together notes, inbox, investigations, recalls, referrals, billing, care-gap detection, guideline checks, patient messaging, risk surveillance, consent, audit trail, and outcomes feedback. CONF 90% — the strategic framing is consistent with how Epic, Oracle Health, Heidi, Lyrebird, Abridge, Suki, and Microsoft DAX are actually positioning their roadmaps.112
What the brief did not do is unpack the order of arrival. Five years is a long time in this market; doctors and procurement committees need to know which capabilities are real this quarter, which are believable within a year, which are plausible within two, and which are still genuinely speculative at three. That is what this paper attempts.
Two methodological notes:
The original brief’s strongest claim was that doctors do not need more “AI answers” — they need fewer open loops. CONF 88%
That claim ages well because the friction it describes is structural, not technological. A 7,260-physician Kaiser Permanente deployment of ambient AI scribes between October 2023 and December 2024 saved roughly 15,700 hours of documentation time; 84% of surveyed physicians said the AI improved their ability to connect with patients, and 82% reported greater job satisfaction.23 ACC 98% The clearest win was relief of cognitive and administrative load, not faster diagnosis.
The list of eight questions the brief said a doctor-native OS should answer (what matters about this patient now? what am I missing? what admin can be safely automated? what needs follow-up? what changed since I last saw them? what should be documented? which patients are silently deteriorating? where are my outcomes worse than expected?) is essentially the union of features that incumbents (Epic, Oracle Health) and startups (Heidi, Lyrebird, Abridge) are racing to ship. CONF 85% Epic publicly demonstrated three named assistants at its August 2025 UGM (Art for clinicians, Emmie for patients via MyChart, Penny for revenue cycle), and Oracle Health debuted a voice-first “Clinical Digital Assistant” with embedded agentic AI in late 2025.112 ACC 92%
The brief argued outcomes feedback is the real moat. That conclusion holds. The bottleneck is interoperable, longitudinal data, and it is shrinking on a definite timeline.
In Australia, the Australian Digital Health Agency reports that 75% of the planned actions in the National Healthcare Interoperability Plan 2023–2028 are now complete, with Services Australia commencing FHIR adoption discovery and implementation planned for 2025–26. The 2026 Federal Budget added a A$13.3 million two-year investment in Sparked, the national FHIR accelerator programme.4513 ACC 95%
This is the prerequisite for outcomes intelligence: longitudinal, identifier-linked, standards-based clinical exhaust that an AI layer can convert into disease-progression signals, missed-diagnosis signals, referral-loop closure, medication safety events, admission/readmission indicators, patient-reported outcomes, and doctor-specific feedback. CONF 65% Without it, “your outcomes are worse than expected” remains a dangerous claim because attribution is messy — adherence, social determinants, specialist care, family support, and cost all confound a single doctor’s contribution. The brief’s warning that the OS must avoid simplistic scorecards and instead use probabilistic signals, benchmarks, confidence intervals, and case-mix adjustment is the right discipline. CONF 90%
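To make the case-mix discipline described above concrete, here is a minimal sketch of an observed-versus-expected signal with an interval, rather than a raw scorecard. It is an illustration under stated assumptions, not a method taken from the brief: the per-patient risks are assumed to come from some hypothetical, already-validated risk model, the interval uses Byar's approximation to the Poisson limits, and the names `OutcomeSignal` and `case_mix_adjusted_signal` are invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class OutcomeSignal:
    observed: int      # events actually seen in the doctor's panel
    expected: float    # events predicted from patient-level risk (case mix)
    ratio: float       # observed / expected
    ci_low: float      # lower bound of the approximate 95% interval on the ratio
    ci_high: float     # upper bound of the approximate 95% interval on the ratio
    flagged: bool      # True only if the whole interval sits above 1.0


def case_mix_adjusted_signal(events, predicted_risks, z=1.96):
    """events: 0/1 outcome per patient; predicted_risks: modelled probability per patient."""
    if len(events) != len(predicted_risks):
        raise ValueError("events and predicted_risks must have the same length")
    observed = sum(events)
    expected = sum(predicted_risks)
    if expected <= 0:
        raise ValueError("expected event count must be positive")

    # Byar's approximation to the Poisson confidence limits for the observed count.
    if observed == 0:
        low = 0.0
    else:
        low = observed * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3
        low = max(0.0, low)
    n_up = observed + 1
    high = n_up * (1 - 1 / (9 * n_up) + z / (3 * math.sqrt(n_up))) ** 3

    ci_low, ci_high = low / expected, high / expected
    # Only flag when even the lower bound exceeds the benchmark ratio of 1.0.
    return OutcomeSignal(observed, expected, observed / expected, ci_low, ci_high, ci_low > 1.0)


# Hypothetical panel: 100 patients, each with 8% modelled risk, 12 observed events.
signal = case_mix_adjusted_signal([1] * 12 + [0] * 88, [0.08] * 100)
print(round(signal.ratio, 2), signal.flagged)
```

In this made-up panel the ratio is 1.5, yet the interval still spans 1.0, so nothing is flagged. That is the practical difference between a probabilistic signal and a simplistic scorecard. The sketch also only adjusts for whatever the risk model captures; confounders such as adherence or social determinants remain unadjusted unless they are modelled.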
The brief proposed a hybrid: local clinical context + cloud-scale reasoning + strict auditability. By mid-2026 that is no longer a contrarian position — it is the dominant architectural pattern in both vendor reference designs and regulator guidance. CONF 85%
The Australian Signals Directorate’s ACSC guidance on AI data security explicitly flags privacy breaches, unauthorised access, third-party dependencies, insecure integrations, and misconfigured wrappers as the primary risk vectors for sensitive data in AI deployments, and recommends controls across all phases of the deployment lifecycle.6 ACC 96% The brief’s table mapping which layers belong where (patient-identifiable context: local; generic medical reasoning: cloud; audit logs: immutable cloud or private cloud; emergency fallback: offline) is consistent with that guidance and with the EU AI Act’s expectations for high-risk AI systems embedded in regulated medical devices.78 ACC 88%
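The layer-to-location mapping in the paragraph above could be encoded as an explicit policy that fails closed. The sketch below is illustrative only: the layer keys, the `Placement` enum, and the helper names are hypothetical, and a real deployment would need far richer rules than a four-entry table.

```python
from enum import Enum


class Placement(Enum):
    LOCAL = "local"
    CLOUD = "cloud"
    IMMUTABLE_STORE = "immutable cloud or private cloud"
    OFFLINE = "offline"


# Mirrors the four mappings named in the text; the keys are illustrative labels.
PLACEMENT_POLICY = {
    "patient_identifiable_context": Placement.LOCAL,
    "generic_medical_reasoning": Placement.CLOUD,
    "audit_log": Placement.IMMUTABLE_STORE,
    "emergency_fallback": Placement.OFFLINE,
}


def placement_for(layer: str) -> Placement:
    """Return where a layer belongs; unknown layers are rejected rather than guessed."""
    try:
        return PLACEMENT_POLICY[layer]
    except KeyError:
        raise ValueError(f"no placement policy for layer: {layer!r}") from None


def may_send_to_cloud(layer: str, contains_patient_identifiers: bool) -> bool:
    """Allow a cloud call only for cloud-placed layers carrying no identifiers."""
    return placement_for(layer) is Placement.CLOUD and not contains_patient_identifiers
```

Raising on an unknown layer is a deliberate choice in this sketch: a new data type should have to be classified before it can move anywhere.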
The brief predicted that within five years, doctors and clinics will demand a visible AI control panel: model used, data accessed, confidence/uncertainty, source citations, hallucination-risk warnings, doctor approval state, medico-legal audit log, consent state, escalation rules, and clinical safety versioning. CONF 88%
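One way to picture such a control panel is as a structured record per AI output, written to a tamper-evident log. The following is a minimal sketch under assumptions: the field names simply paraphrase the items listed above, the class names are hypothetical, and the hash chain is one common technique for tamper evidence that the brief itself does not prescribe.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ControlPanelEntry:
    model_used: str
    data_accessed: tuple
    confidence: float
    source_citations: tuple
    hallucination_warnings: tuple
    doctor_approval_state: str   # e.g. "pending", "approved", "rejected"
    consent_state: str
    escalation_rule: str
    safety_version: str


class AuditLog:
    """Append-only log in which each record's hash covers the previous record's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self._records = []

    def append(self, entry: ControlPanelEntry) -> str:
        prev_hash = self._records[-1]["hash"] if self._records else self.GENESIS
        payload = json.dumps(asdict(entry), sort_keys=True)
        digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self._records.append({"prev": prev_hash, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered record breaks it."""
        prev = self.GENESIS
        for rec in self._records:
            expected = hashlib.sha256((prev + rec["payload"]).encode("utf-8")).hexdigest()
            if rec["prev"] != prev or rec["hash"] != expected:
                return False
            prev = rec["hash"]
        return True
```

A chain like this only shows that tampering happened after the fact. Preventing it would still depend on where the log is stored and who can write to it.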
That prediction has now been overtaken by regulation in a way the brief did not fully spell out. Three concrete deadlines are worth pinning down:
The implication: by mid-2027, any vendor selling AI into clinical workflows in Australia, the US, or the EU will need to operate as if a regulator could ask for the audit trail, the model card, the change-control plan, and the consent state — tomorrow. CONF 85% Boring really does win in healthcare.
The brief listed hallucination into the record as an existential risk. The 2025–2026 literature confirms it is real and quantifiable, but also tractable when scoped narrowly.
A peer-reviewed framework published in npj Digital Medicine in 2025 evaluated LLMs for medical text summarisation and found a baseline hallucination rate of 1.47% across summaries, with 44% of hallucinations classified as major (capable of impacting diagnosis or management if left uncorrected). Fabrications accounted for 43% of hallucinations, negations 30%, contextual errors 17%, and causality-related errors 10%; major hallucinations concentrated in the Plan (21%), Assessment (10.5%), and Symptoms (5.2%) sections of clinical notes.16 ACC 97% A separate multi-model assurance analysis (Communications Medicine, 2025) found that leading LLMs repeated or elaborated on planted adversarial errors in 50–82% of cases, with a simple mitigation prompt lowering the rate from ~66% to ~44%.17 ACC 96%
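The figures above combine into two derived numbers that are easy to misread, so a short worked calculation may help. It uses only the percentages quoted in the paragraph. The variable names are illustrative, and "unit" means whatever unit of output the cited study scored.

```python
# Figures quoted in the text (as fractions).
baseline_hallucination_rate = 0.0147   # 1.47% baseline hallucination rate
share_major = 0.44                     # 44% of hallucinations classified as major

# Rate of major hallucinations per scored unit of output.
major_rate = baseline_hallucination_rate * share_major
major_per_10k = major_rate * 10_000

# Adversarial-error study: effect of the simple mitigation prompt.
rate_before, rate_after = 0.66, 0.44
absolute_reduction = rate_before - rate_after
relative_reduction = absolute_reduction / rate_before

print(f"major hallucination rate: {major_rate:.2%} (~{major_per_10k:.0f} per 10,000 units)")
print(f"mitigation prompt: -{absolute_reduction:.0%} absolute, -{relative_reduction:.0%} relative")
```

The product comes to about 0.65% for major hallucinations. The mitigation prompt removes roughly one third of the adversarial failures, and leaves about two thirds in place.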
The honest reading: ambient documentation under doctor review is now safe enough for production. Autonomous clinical reasoning without doctor review is not. CONF 85% The architectural answer the brief proposed — retrieve the minimum required facts, reason with bounded context, return explainable output, log everything, and keep the doctor in control — is precisely the discipline that makes the residual hallucination rate clinically manageable.
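The five-part discipline in the paragraph above can be sketched as a small pipeline. This is a toy illustration under assumptions, not the brief's implementation: `bounded_reason` is a stand-in for a real model call, the audit log is a plain list, and all function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DraftOutput:
    text: str
    facts_used: list        # makes the output explainable: which facts it relied on
    approved: bool = False  # nothing is committed until a doctor approves


def retrieve_minimum_facts(record: dict, needed_keys: list) -> dict:
    """Pull only the fields the task needs, not the whole patient record."""
    return {key: record[key] for key in needed_keys if key in record}


def bounded_reason(facts: dict, max_facts: int = 5) -> DraftOutput:
    """Stand-in for a model call; refuses to run on more context than allowed."""
    if len(facts) > max_facts:
        raise ValueError("context exceeds the configured bound")
    text = "; ".join(f"{key}: {value}" for key, value in sorted(facts.items()))
    return DraftOutput(text=text, facts_used=sorted(facts))


def run_step(record: dict, needed_keys: list, audit_log: list) -> DraftOutput:
    facts = retrieve_minimum_facts(record, needed_keys)
    draft = bounded_reason(facts)
    audit_log.append({"event": "draft_created", "facts_used": draft.facts_used})
    return draft


def commit_to_record(draft: DraftOutput, doctor_approved: bool, audit_log: list) -> bool:
    """Log every attempt; write to the record only with explicit doctor approval."""
    audit_log.append({"event": "commit_attempt", "approved": doctor_approved})
    if not doctor_approved:
        return False
    draft.approved = True
    return True
```

The point of the structure is that each safeguard is a separate, testable step: the retrieval filter, the context bound, the logged draft, and the approval gate.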
The brief’s ranking — primary care first, then radiology/pathology, then dermatology, oncology, psychiatry, and emergency — is consistent with where revenue, evidence, and adoption are accumulating in 2026. CONF 80%
Caveat: the brief slightly under-rates revenue-cycle and prior-authorisation use cases. By early 2026, ~70% of US health plans were prioritising agentic AI in utilisation management and prior authorisation, with Cohere Health processing over 12 million authorisation requests annually and end-to-end prior-auth sequences completing in under 10 minutes.2223 ACC 90% In Australia the equivalent wedge is PBS authority forms, eligibility checks, and Medicare item-number selection — precisely the workflow that Medflow already addresses. CONF 80%
The brief presented a year-by-year five-year roadmap. Compressing it into the four near-term horizons changes the picture in instructive ways: most of what the brief assigned to “Year 1” is already shipping in May 2026 and will be table stakes within six months. The harder transitions sit at the 12- and 24-month marks.
Ambient AI scribes have crossed the chasm into mainstream primary care. Kaiser Permanente has logged over 2.5 million ambient scribe encounters and reported 15,700–16,000 hours saved across 7,260 physicians; the AMA reports comparable findings in Permanente Medical Group deployments.2324 ACC 96% In Australia, Heidi claims 50% GP usage and Lyrebird is building an evidence layer (MedLuma) on top of Medcast's knowledge base.19 ACC 90%
The market moves from “write my note” to “understand my patient.” The deciding factor is whether vendors can retrieve and reason over the patient’s longitudinal record without breaching the architectural and regulatory constraints set out above. CONF 80%
The OS starts doing multi-step work end-to-end, not just suggesting it. The brief’s “Year 3” examples — book repeat HbA1c in 3 months; draft endocrinology referral; message patient about statin discussion; add recall; check eligibility for care plan; prepare next visit agenda — become standard agentic capabilities by mid-2028 in primary care and high-volume specialty practice. CONF 70%
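A minimal sketch of how such multi-step work might be gated is shown below. It assumes a simple allow-list of action kinds drawn from the examples above, and a verifier that holds back anything unknown, incomplete, or awaiting sign-off. Which actions need doctor approval is an assumption made for illustration, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Action kinds taken from the examples in the text.
ALLOWED_KINDS = {
    "book_test", "draft_referral", "message_patient",
    "add_recall", "check_eligibility", "prepare_agenda",
}
# Assumption: outward-facing clinical communication needs explicit sign-off.
NEEDS_DOCTOR_APPROVAL = {"draft_referral", "message_patient"}


@dataclass
class Action:
    kind: str
    detail: str
    approved_by_doctor: bool = False


def verify(action: Action) -> Tuple[bool, str]:
    """Check one proposed action before anything is executed."""
    if action.kind not in ALLOWED_KINDS:
        return False, "unknown action kind"
    if not action.detail.strip():
        return False, "missing detail"
    if action.kind in NEEDS_DOCTOR_APPROVAL and not action.approved_by_doctor:
        return False, "awaiting doctor approval"
    return True, "ok"


def execute_plan(plan: List[Action]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Run verified actions; hold the rest with a reason the doctor can see."""
    executed, held = [], []
    for action in plan:
        ok, reason = verify(action)
        if ok:
            executed.append(action.kind)   # a real system would call the PMS here
        else:
            held.append((action.kind, reason))
    return executed, held


plan = [
    Action("book_test", "repeat HbA1c in 3 months"),
    Action("draft_referral", "endocrinology"),
    Action("message_patient", "statin discussion", approved_by_doctor=True),
    Action("add_recall", "diabetes review"),
]
executed, held = execute_plan(plan)
print(executed, held)
```

Held actions are returned with a reason rather than silently dropped, so the doctor can see what the system declined to do and why.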
By 2029 the brief’s “Year 4 / Year 5” vision starts to be realised in practices that committed early to the architecture: clinical context, ambient documentation, decision support, workflow automation, billing, secure communication, local memory, outcome analytics, model governance, and consent/audit working as a single command centre. CONF 60% Even the leaders will not have every component; the question is whether the platform stitches together the ones they do have.
The brief argued that doctor-specific personalisation — preferred note style, risk tolerance, specialty interest, prescribing habits, referral networks, common patient population, billing patterns, follow-up discipline, inbox behaviour, “things this doctor often forgets,” “things this doctor cares about” — is underrated. That remains true. CONF 85%
The Kaiser Permanente deployment showed that adoption was highest in documentation-heavy specialties and that satisfaction was strongly mediated by how well the AI fit individual workflow.2 ACC 90% Generic guideline recitation does not change behaviour; “you usually manage similar patients this way; here’s what differs today” does. The product frontier through 2027–2028 is to build that personal layer without it becoming an invisible behavioural manipulator. CONF 70%
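The "you usually manage similar patients this way; here is what differs today" prompt can be sketched as a plain set comparison. The example is deliberately simple and rests on assumptions: past plans are represented as lists of action labels, "usual" means present in at least 60% of them (an arbitrary threshold), and the function names are hypothetical.

```python
from collections import Counter


def usual_actions(past_plans, min_share=0.6):
    """Actions that appear in at least min_share of this doctor's past plans."""
    if not past_plans:
        return set()
    counts = Counter(action for plan in past_plans for action in set(plan))
    return {action for action, n in counts.items() if n / len(past_plans) >= min_share}


def what_differs_today(past_plans, todays_plan, min_share=0.6):
    """Return an explicit, displayable diff instead of silently nudging the plan."""
    usual = usual_actions(past_plans, min_share)
    today = set(todays_plan)
    return {
        "usually_done_but_absent": sorted(usual - today),
        "new_today": sorted(today - usual),
    }


# Hypothetical history for one doctor's similar patients.
history = [
    ["HbA1c", "statin review", "foot check"],
    ["HbA1c", "statin review"],
    ["HbA1c", "foot check"],
    ["HbA1c", "statin review", "eye referral"],
]
diff = what_differs_today(history, ["HbA1c", "eye referral"])
print(diff)
```

Returning the diff as data that is shown to the doctor, rather than using it to reorder or pre-fill suggestions unseen, is one way to keep the personal layer from becoming an invisible behavioural manipulator.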
| Horizon | Dominant risk | Why it matters |
|---|---|---|
| 6 months | Commoditisation of the ambient scribe | Margins compress fast once Heidi, Lyrebird, and Microsoft DAX all do “good enough” transcription in Australian English. CONF 80% |
| 12 months | Hallucination into the record | The move from documentation to contextual reasoning increases the surface area for major hallucinations (Plan/Assessment sections most affected).1617 CONF 85% |
| 24 months | Regulatory non-compliance | EU AI Act, FDA Predetermined Change Control Plan (PCCP), and TGA enforcement converge. Vendors without lifecycle risk management, audit trail, and PCCP-style change control are exposed.7914 CONF 85% |
| 36 months | Outcome-data overclaim | Doctor / practice benchmarking shipped without case-mix discipline will trigger backlash, regulator interest, and indemnity issues. CONF 80% |
The brief framed the opportunity as the AI operating layer for private medical practice. The four-horizon view sharpens the sequencing.
Private practice has the structural conditions the brief identified — fragmented systems, admin pain, billing complexity, poor outcome visibility, high doctor autonomy, willingness to pay if workflow pain drops, less enterprise procurement hell than hospitals — and those conditions are not going away inside the three-year horizon. CONF 85%
The original brief’s strongest line — the winner is not the best model; the winner is the product that owns the clinical feedback loop — remains the correct strategic frame. CONF 88% What the time-compressed view adds is a sense of when each piece of that loop becomes commercially viable, and what regulatory or interoperability dependency it sits on.
The next 6 months are about admin pain (already largely solved technically, still being adopted). The next 12 months are about contextual reasoning (technically real, regulatory burden rising). The next 24 months are about agentic orchestration (technically plausible, the verifier is the load-bearing component). The next 36 months are about outcome intelligence (architecturally hard, politically harder, and the real moat for whoever builds it humbly).
The final restatement from the brief still holds:
A secure clinical cockpit that knows the patient, knows the doctor, knows the workflow, tracks outcomes, automates safe admin, highlights risk, and leaves the final clinical judgment visibly with the doctor.
The thesis is not “AI will replace doctors.” It is: doctors using outcome-aware AI operating systems will outperform doctors using passive record systems. CONF 88%