The Practical Evolution of AI for Doctors:
A 6, 12, 24 and 36 Month Projection of the Clinical AI Operating Layer

Claude (Anthropic) · AI research collaborator
Matt Martin · Founder, Medware Solutions & Medflow, Sydney, Australia

Working draft · 18 May 2026 · Based on the source brief “A doctor-native AI OS in 5 years is not an AI doctor.”

Abstract

A doctor-native AI operating system will not arrive as a single product launch. It will accrete in layers — ambient documentation first, then inbox and recalls, then contextual clinical reasoning, then agentic workflow execution, then outcome intelligence — with each layer regulated, monitored, and audited harder than the one before it. This paper takes the topic map of an earlier strategic brief and reprojects it across four near-term horizons: 6 months (Nov 2026), 12 months (May 2027), 24 months (May 2028), and 36 months (May 2029). Each claim carries an explicit confidence or accuracy score so the reader can separate what is already happening from what is being inferred.

How to read the badges in this paper

ACC 95%  An accuracy badge is attached to a referenced empirical claim. The score reflects how directly the cited source supports the statement (100% = source states this precisely; 70% = source supports the gist but with interpretation).

CONF 60%  A confidence badge is attached to a speculative or projective claim (the kind that fills any 6–36 month forecast). The score is the authors’ subjective probability that the claim will be substantially correct by the stated horizon.

Colour: ≥ 75% high · 40–74% moderate · < 40% low / contested.
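The badge scheme above is mechanical enough to express in a few lines. The sketch below is illustrative only: the thresholds come from the colour key, while the function names `badge_band` and `parse_badge` are assumptions introduced here and are not part of the paper's tooling.

```python
import re


def badge_band(score_percent: float) -> str:
    """Map a badge score (0-100) to the colour band from the key above."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be between 0 and 100")
    if score_percent >= 75:
        return "high"
    if score_percent >= 40:
        return "moderate"
    return "low / contested"


def parse_badge(text: str) -> tuple[str, int, str]:
    """Parse a badge such as 'CONF 60%' into (kind, score, band)."""
    match = re.fullmatch(r"\s*(ACC|CONF)\s+(\d{1,3})%\s*", text)
    if match is None:
        raise ValueError(f"not a badge: {text!r}")
    kind, score = match.group(1), int(match.group(2))
    return kind, score, badge_band(score)


print(parse_badge("ACC 95%"))   # ('ACC', 95, 'high')
print(parse_badge("CONF 60%"))  # ('CONF', 60, 'moderate')
```

The band is the same for both badge types; only the meaning of the score differs (source fidelity for ACC, subjective probability for CONF).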

1. Introduction and framing

The earlier brief argued that the winning AI product for doctors over a five-year horizon will resemble a clinical command layer rather than an “AI doctor”: an ambient, governance-aware system that stitches together notes, inbox, investigations, recalls, referrals, billing, care-gap detection, guideline checks, patient messaging, risk surveillance, consent, audit trail, and outcomes feedback. CONF 90%: the strategic framing is consistent with how Epic, Oracle Health, Heidi, Lyrebird, Abridge, Suki, and Microsoft DAX are positioning their roadmaps [1, 12].

What the brief did not do is unpack the order of arrival. Five years is a long time in this market; doctors and procurement committees need to know which capabilities are real this quarter, which are believable within a year, which are plausible within two, and which are still genuinely speculative at three. That is what this paper attempts.

Two methodological notes:

2. The core insight survives the horizon shortening

The original brief’s strongest claim was that doctors do not need more “AI answers” — they need fewer open loops. CONF 88%

That claim ages well because the friction it describes is structural, not technological. A 7,260-physician Kaiser Permanente deployment of ambient AI scribes between October 2023 and December 2024 saved roughly 15,700 hours of documentation time; 84% of surveyed physicians said the AI improved their ability to connect with patients, and 82% reported greater job satisfaction [2, 3]. ACC 98% The clearest win was relief of cognitive and administrative load, not faster diagnosis.

The list of eight questions the brief said a doctor-native OS should answer (what matters about this patient now? what am I missing? what admin can be safely automated? what needs follow-up? what changed since I last saw them? what should be documented? which patients are silently deteriorating? where are my outcomes worse than expected?) is essentially the union of features that incumbents (Epic, Oracle Health) and startups (Heidi, Lyrebird, Abridge) are racing to ship. CONF 85% Epic publicly demonstrated three named assistants at its August 2025 UGM (Art for clinicians, Emmie for patients via MyChart, Penny for revenue cycle), and Oracle Health debuted a voice-first “Clinical Digital Assistant” with embedded agentic AI in late 2025 [1, 12]. ACC 92%

3. Outcome intelligence remains the moat — and the bottleneck

The brief argued outcomes feedback is the real moat. That conclusion holds, but the bottleneck is concrete and shrinking on a definite timeline.

In Australia, the Australian Digital Health Agency reports that 75% of the planned actions in the National Healthcare Interoperability Plan 2023–2028 are now complete, with Services Australia commencing FHIR adoption discovery and implementation planned for 2025–26. The 2026 Federal Budget added a A$13.3 million two-year investment in Sparked, the national FHIR accelerator programme [4, 5, 13]. ACC 95%

This is the prerequisite for outcomes intelligence: longitudinal, identifier-linked, standards-based clinical exhaust that an AI layer can convert into disease-progression signals, missed-diagnosis signals, referral-loop closure, medication safety events, admission/readmission indicators, patient-reported outcomes, and doctor-specific feedback. CONF 65% Without it, “your outcomes are worse than expected” remains a dangerous claim because attribution is messy — adherence, social determinants, specialist care, family support, and cost all confound a single doctor’s contribution. The brief’s warning that the OS must avoid simplistic scorecards and instead use probabilistic signals, benchmarks, confidence intervals, and case-mix adjustment is the right discipline. CONF 90%
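To make the case-mix point concrete, here is a minimal sketch of an observed-versus-expected comparison with an approximate confidence interval. It assumes a per-patient predicted risk is already available from some case-mix model (not specified in the brief), and it uses the common log-scale approximation in which the standard error of ln(O/E) is about 1/sqrt(O). The function names and the toy panel are hypothetical.

```python
import math


def observed_expected(outcomes, predicted_risks, z=1.96):
    """Return (O/E ratio, lower bound, upper bound).

    outcomes: 0/1 per patient (1 = adverse event occurred).
    predicted_risks: case-mix-adjusted event probability per patient.
    """
    if len(outcomes) != len(predicted_risks):
        raise ValueError("one predicted risk is needed per patient")
    observed = sum(outcomes)
    expected = sum(predicted_risks)
    if observed == 0 or expected <= 0:
        raise ValueError("need at least one event and a positive expected count")
    ratio = observed / expected
    half_width = z / math.sqrt(observed)  # approximate SE of ln(O/E), times z
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


def signal(ratio, lower, upper):
    """Only flag a panel when the whole interval sits on one side of 1."""
    if lower > 1:
        return "worse than expected"
    if upper < 1:
        return "better than expected"
    return "no clear signal"


# Toy panel: 6 events in 40 patients, each with a predicted risk of 10%.
ratio, lower, upper = observed_expected([1] * 6 + [0] * 34, [0.1] * 40)
print(round(ratio, 2), signal(ratio, lower, upper))  # 1.5 no clear signal
```

In the toy panel the raw ratio looks 50% worse than expected, yet the interval still spans 1, which is the kind of result a simplistic scorecard would misreport.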

4. Architecture: the hybrid answer is now the consensus

The brief proposed a hybrid: local clinical context + cloud-scale reasoning + strict auditability. By mid-2026 that is no longer a contrarian position — it is the dominant architectural pattern in both vendor reference designs and regulator guidance. CONF 85%

The Australian Signals Directorate’s ACSC guidance on AI data security explicitly flags privacy breaches, unauthorised access, third-party dependencies, insecure integrations, and misconfigured wrappers as the primary risk vectors for sensitive data in AI deployments, and recommends controls across all phases of the deployment lifecycle [6]. ACC 96% The brief’s table mapping which layers belong where (patient-identifiable context: local; generic medical reasoning: cloud; audit logs: immutable cloud or private cloud; emergency fallback: offline) is consistent with that guidance and with the EU AI Act’s expectations for high-risk AI systems embedded in regulated medical devices [7, 8]. ACC 88%
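The layer-to-location mapping described above can be sketched as a small placement policy. The four placements are taken from the brief's table as summarised here; the enum, the dictionary keys, and the `route` function are hypothetical names introduced for illustration, and a real deployment would need far richer rules.

```python
from enum import Enum


class Placement(Enum):
    LOCAL = "local"
    CLOUD = "cloud"
    IMMUTABLE_CLOUD = "immutable cloud or private cloud"
    OFFLINE = "offline"


# Mapping as summarised in the paragraph above.
LAYER_PLACEMENT = {
    "patient_identifiable_context": Placement.LOCAL,
    "generic_medical_reasoning": Placement.CLOUD,
    "audit_logs": Placement.IMMUTABLE_CLOUD,
    "emergency_fallback": Placement.OFFLINE,
}


def route(layer: str, contains_patient_identifiers: bool) -> Placement:
    """Return where a payload for this layer may go.

    Assumed rule: identifiable data must never be routed to the
    generic cloud reasoning layer.
    """
    placement = LAYER_PLACEMENT[layer]
    if contains_patient_identifiers and placement is Placement.CLOUD:
        raise PermissionError(
            f"{layer}: strip identifiers before sending to cloud reasoning"
        )
    return placement


print(route("patient_identifiable_context", True).value)  # local
print(route("generic_medical_reasoning", False).value)    # cloud
```

The point of the sketch is that the policy is checked in code at the boundary, rather than left to convention.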

5. Visible AI governance becomes a product feature, on a known schedule

The brief predicted that within five years, doctors and clinics will demand a visible AI control panel: model used, data accessed, confidence/uncertainty, source citations, hallucination-risk warnings, doctor approval state, medico-legal audit log, consent state, escalation rules, and clinical safety versioning. CONF 88%
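One way to picture such a control panel is as a structured record per AI output, stored in a tamper-evident log. The sketch below is an assumption-laden illustration: the field names mirror the list above, the hash chain is one common way to make edits detectable (it is not claimed to be any vendor's design), and all values in the example are made up.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ControlPanelRecord:
    model_used: str
    data_accessed: tuple[str, ...]
    confidence: float                     # 0.0-1.0, as shown to the doctor
    source_citations: tuple[str, ...]
    hallucination_warnings: tuple[str, ...]
    doctor_approval: str                  # "pending", "approved" or "rejected"
    consent_state: str                    # "granted", "withdrawn" or "unknown"
    escalation_rule: str
    safety_version: str


def entry_hash(record: ControlPanelRecord, prev_hash: str) -> str:
    """Hash the record together with the previous entry's hash."""
    payload = json.dumps(asdict(record), sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, record: ControlPanelRecord) -> None:
    prev_hash = log[-1][1] if log else ""
    log.append((record, entry_hash(record, prev_hash)))


def verify_log(log: list) -> bool:
    """Recompute the chain; any edited entry breaks it."""
    prev_hash = ""
    for record, stored_hash in log:
        if entry_hash(record, prev_hash) != stored_hash:
            return False
        prev_hash = stored_hash
    return True


first = ControlPanelRecord(
    model_used="example-model-v1", data_accessed=("medications", "pathology"),
    confidence=0.72, source_citations=("guideline-x",), hallucination_warnings=(),
    doctor_approval="pending", consent_state="granted",
    escalation_rule="notify-doctor", safety_version="1.0.0",
)
log: list = []
append_entry(log, first)
append_entry(log, replace(first, doctor_approval="approved"))
print(verify_log(log))  # True

# Rewriting an earlier entry without recomputing the hashes is detected.
log[0] = (replace(first, consent_state="withdrawn"), log[0][1])
print(verify_log(log))  # False
```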

That prediction has now been overtaken by regulation in a way the brief did not fully spell out. Three concrete deadlines are worth pinning down:

The implication: by mid-2027, any vendor selling AI into clinical workflows in Australia, the US, or the EU will need to operate as if a regulator could ask for the audit trail, the model card, the change-control plan, and the consent state — tomorrow. CONF 85% Boring really does win in healthcare.

6. The hallucination problem is bounded, not solved

The brief listed hallucination into the record as an existential risk. The 2025–2026 literature confirms it is real and quantifiable, but also tractable when scoped narrowly.

A peer-reviewed framework published in npj Digital Medicine in 2025 evaluated LLMs for medical text summarisation and found a baseline hallucination rate of 1.47% across summaries, with 44% of hallucinations classified as major (capable of impacting diagnosis or management if left uncorrected). Fabrications accounted for 43% of hallucinations, negations 30%, contextual errors 17%, and causality-related errors 10%; major hallucinations concentrated in the Plan (21%), Assessment (10.5%), and Symptoms (5.2%) sections of clinical notes [16]. ACC 97% A separate multi-model assurance analysis (Communications Medicine, 2025) found that leading LLMs repeated or elaborated on planted adversarial errors in 50–82% of cases, with a simple mitigation prompt lowering the rate from ~66% to ~44% [17]. ACC 96%
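The figures above combine into a few derived numbers that are easier to reason about. This is back-of-envelope arithmetic only: it assumes the 44% major share applies uniformly to the 1.47% baseline, and it takes the unit of measurement to be whatever the cited study counted. The variable names are illustrative.

```python
# Figures quoted in the paragraph above.
hallucination_rate = 0.0147   # baseline rate
major_share = 0.44            # share of hallucinations classed as major
type_shares = {
    "fabrication": 0.43,
    "negation": 0.30,
    "contextual": 0.17,
    "causality": 0.10,
}

# Implied rate of major hallucinations per unit of output.
major_rate = hallucination_rate * major_share
print(f"major hallucination rate: {major_rate:.2%}")               # 0.65%
print(f"per 1,000 units: {hallucination_rate * 1000:.1f} total, "
      f"{major_rate * 1000:.1f} major")                            # 14.7 total, 6.5 major

# Adversarial-error study: effect of the mitigation prompt.
baseline, mitigated = 0.66, 0.44
absolute_reduction = baseline - mitigated
relative_reduction = absolute_reduction / baseline
print(f"absolute reduction: {absolute_reduction:.0%}")             # 22%
print(f"relative reduction: {relative_reduction:.0%}")             # 33%
```

Read this way, the mitigation prompt removes about a third of the adversarial failures but leaves the rate well above what unsupervised use could tolerate.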

The honest reading: ambient documentation under doctor review is now safe enough for production. Autonomous clinical reasoning without doctor review is not. CONF 85% The architectural answer the brief proposed — retrieve the minimum required facts, reason with bounded context, return explainable output, log everything, and keep the doctor in control — is precisely the discipline that makes the residual hallucination rate clinically manageable.
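The five-part discipline named above (minimum facts, bounded context, explainable output, logging, doctor control) can be sketched as a tiny pipeline. Everything here is hypothetical: the record fields, the `stub_model` function standing in for a real model call, and the approval mechanism are assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Draft:
    text: str
    facts_used: list[str]  # keys of the facts the output was built from


def retrieve_minimum(record: dict[str, str], needed: list[str]) -> dict[str, str]:
    """Return only the requested fields; everything else stays behind."""
    return {key: record[key] for key in needed if key in record}


def reason(facts: dict[str, str], model: Callable[[dict[str, str]], str]) -> Draft:
    """Call the model on the bounded context and keep provenance."""
    return Draft(text=model(facts), facts_used=sorted(facts))


def commit(draft: Draft, chart: list[str], audit_log: list[str],
           approved_by: Optional[str]) -> bool:
    """Write to the chart only after a named doctor approves; log either way."""
    if approved_by is None:
        audit_log.append(f"held for review: facts={draft.facts_used}")
        return False
    chart.append(draft.text)
    audit_log.append(f"committed by {approved_by}: facts={draft.facts_used}")
    return True


def stub_model(facts: dict[str, str]) -> str:
    """Stand-in for a real model call."""
    return "Consider statin discussion; HbA1c " + facts["hba1c"]


record = {"hba1c": "8.1%", "statin": "none", "address": "redacted", "phone": "redacted"}
facts = retrieve_minimum(record, ["hba1c", "statin"])
draft = reason(facts, stub_model)

chart: list[str] = []
audit_log: list[str] = []
commit(draft, chart, audit_log, approved_by=None)          # held, chart unchanged
commit(draft, chart, audit_log, approved_by="Dr Example")  # written after approval
print(chart)
print(audit_log)
```

The model never sees the address or phone fields, and nothing reaches the chart without a named approver, while both the held and the committed attempts are logged.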

7. Specialty rank-order ages well, with one caveat

The brief’s ranking — primary care first, then radiology/pathology, then dermatology, oncology, psychiatry, and emergency — is consistent with where revenue, evidence, and adoption are accumulating in 2026. CONF 80%

Caveat: the brief slightly under-rates revenue-cycle and prior-authorisation use cases. By early 2026, ~70% of US health plans were prioritising agentic AI in utilisation management and prior authorisation, with Cohere Health processing over 12 million authorisation requests annually and end-to-end prior-auth sequences completing in under 10 minutes [22, 23]. ACC 90% In Australia the equivalent wedge is PBS authority forms, eligibility checks, and Medicare item-number selection, which is precisely the workflow that Medflow already addresses. CONF 80%

8. The time-scaled roadmap (6, 12, 24, 36 months)

The brief presented a year-by-year five-year roadmap. Compressing it into the four near-term horizons changes the picture in instructive ways: most of what the brief assigned to “Year 1” is already shipping in May 2026 and will be table stakes within six months. The harder transitions sit at the 12- and 24-month marks.

6 months — through November 2026 Already happening

What is real today

Ambient AI scribes have crossed the chasm into mainstream primary care. Kaiser Permanente has logged over 2.5 million ambient scribe encounters and reported 15,700–16,000 hours saved across 7,260 physicians; the AMA reports comparable findings in Permanente Medical Group deployments [2, 3, 24]. ACC 96% In Australia, Heidi claims 50% GP usage and Lyrebird is building an evidence layer (MedLuma) on top of Medcast’s knowledge base [19]. ACC 90%

What we expect by November 2026

12 months — through May 2027 Contextual assistant emerges

The shift

The market moves from “write my note” to “understand my patient.” The deciding factor is whether vendors can retrieve and reason over the patient’s longitudinal record without breaching the architectural and regulatory constraints set out above. CONF 80%

What we expect

24 months — through May 2028 Workflow orchestration

The shift

The OS starts doing multi-step work end-to-end, not just suggesting it. The brief’s “Year 3” examples — book repeat HbA1c in 3 months; draft endocrinology referral; message patient about statin discussion; add recall; check eligibility for care plan; prepare next visit agenda — become standard agentic capabilities by mid-2028 in primary care and high-volume specialty practice. CONF 70%
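A minimal sketch of how such multi-step work might be gated follows. It assumes a split between administrative actions that can run automatically and clinical actions that wait for the doctor; that split, the action kinds, and the `verify` checks are hypothetical policy choices made for illustration, not something the brief specifies.

```python
from dataclasses import dataclass

# Hypothetical policy: which action kinds count as admin versus clinical.
ADMIN_KINDS = {"add_recall", "check_eligibility", "prepare_agenda"}
CLINICAL_KINDS = {"order_test", "draft_referral", "message_patient"}


@dataclass
class Action:
    kind: str
    detail: str
    status: str = "proposed"


def verify(action: Action) -> list[str]:
    """Return a list of problems; an empty list means the action passed."""
    problems = []
    if action.kind not in ADMIN_KINDS | CLINICAL_KINDS:
        problems.append("unknown action kind")
    if not action.detail.strip():
        problems.append("empty detail")
    return problems


def triage(actions: list[Action]) -> list[Action]:
    """Reject failures, auto-run verified admin, hold clinical for the doctor."""
    for action in actions:
        if verify(action):
            action.status = "rejected"
        elif action.kind in ADMIN_KINDS:
            action.status = "auto_executed"
        else:
            action.status = "awaiting_doctor"
    return actions


plan = triage([
    Action("order_test", "repeat HbA1c in 3 months"),
    Action("draft_referral", "endocrinology"),
    Action("message_patient", "statin discussion"),
    Action("add_recall", "diabetes review"),
    Action("check_eligibility", "care plan"),
    Action("prepare_agenda", "next visit"),
    Action("prescribe", "statin"),  # not in the policy, so it is rejected
])
for action in plan:
    print(f"{action.kind:18} {action.status}")
```

Anything the verifier does not recognise is rejected rather than queued, so the failure mode is a missed automation, not an unreviewed clinical action.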

What we expect

36 months — through May 2029 Outcome intelligence becomes real

The shift

By 2029 the brief’s “Year 4 / Year 5” vision starts to be realised in practices that committed early to the architecture: clinical context, ambient documentation, decision support, workflow automation, billing, secure communication, local memory, outcome analytics, model governance, and consent/audit working as a single command centre. CONF 60% Even the leaders will not have all of it; the question is whether the platform stitches them together.

What we expect

9. Doctor-specific AI is now a real product gap

The brief argued that doctor-specific personalisation — preferred note style, risk tolerance, specialty interest, prescribing habits, referral networks, common patient population, billing patterns, follow-up discipline, inbox behaviour, “things this doctor often forgets,” “things this doctor cares about” — is underrated. That remains true. CONF 85%

The Kaiser Permanente deployment showed that adoption was highest in documentation-heavy specialties and that satisfaction was strongly mediated by how well the AI fit individual workflow [2]. ACC 90% Generic guideline recitation does not change behaviour; “you usually manage similar patients this way; here’s what differs today” does. The product frontier through 2027–2028 is to build that personal layer without it becoming an invisible behavioural manipulator. CONF 70%

10. Strategic risks at each horizon

Horizon | Dominant risk | Why it matters
6 months | Commoditisation of the ambient scribe | Margins compress fast once Heidi, Lyrebird, and Microsoft DAX all do “good enough” transcription in Australian English. CONF 80%
12 months | Hallucination into the record | Move from documentation to contextual reasoning increases the surface area for major hallucinations (Plan/Assessment sections most affected) [16, 17]. CONF 85%
24 months | Regulatory non-compliance | EU AI Act, FDA PCCP, and TGA enforcement converge. Vendors without lifecycle risk management, audit trail, and PCCP-style change control are exposed [7, 9, 14]. CONF 85%
36 months | Outcome-data overclaim | Doctor / practice benchmarking shipped without case-mix discipline will trigger backlash, regulator interest, and indemnity issues. CONF 80%

11. The Medflow-shaped opportunity, restated

The brief framed the opportunity as the AI operating layer for private medical practice. The four-horizon view sharpens the sequencing.

  1. 6 months: ship/strengthen ambient capture and structured consult write-back into the Medflow Clinic app. Stake out the integration story with Best Practice / MedicalDirector / Genie / Halaxy. CONF 80%
  2. 12 months: own the PBS authority workflow end-to-end, including eligibility checks and pre-filled forms. PBS form criteria are publicly available, change on a predictable monthly schedule, and represent a defensible Australian-specific moat that horizontal global scribes will not prioritise. ACC 100% CONF 80%
  3. 24 months: introduce recall safety and panel-level care-gap dashboards for chronic disease (T2DM, CKD, CVD, COPD, mental health). Quietly accumulate outcomes data with strict consent and de-identification. CONF 65%
  4. 36 months: the Medflow Metrics product matures into a de-identified practice-intelligence network that benchmarks at the indication level — not the doctor level. This is the real moat, and the brief is right that it is the bit that becomes hard to copy. CONF 55%
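The indication-level benchmarking in step 4 can be sketched as an aggregation that never takes a doctor or patient identifier as input and suppresses small groups. The minimum cell size of 10 is a hypothetical threshold chosen for illustration, and the function name and toy rows are assumptions; nothing here describes how Medflow Metrics is actually built.

```python
from collections import defaultdict

MIN_CELL_SIZE = 10  # hypothetical suppression threshold


def benchmark_by_indication(rows, min_cell=MIN_CELL_SIZE):
    """Aggregate (indication, outcome_met) pairs into per-indication rates.

    No doctor or patient identifier is accepted as input. Groups smaller
    than min_cell are reported as None to reduce re-identification risk.
    """
    totals = defaultdict(lambda: [0, 0])  # indication -> [count, met]
    for indication, outcome_met in rows:
        totals[indication][0] += 1
        totals[indication][1] += int(outcome_met)
    return {
        indication: (met / count if count >= min_cell else None)
        for indication, (count, met) in totals.items()
    }


rows = [("T2DM", True)] * 8 + [("T2DM", False)] * 4 + [("CKD", True)] * 3
result = benchmark_by_indication(rows)
print(result["T2DM"])  # 8 of 12 met the target, about 0.667
print(result["CKD"])   # None: only 3 records, below the threshold
```

Keeping the doctor out of the input schema is a structural guarantee that the output cannot become a per-doctor scorecard by accident.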

Private practice has the structural conditions the brief identified — fragmented systems, admin pain, billing complexity, poor outcome visibility, high doctor autonomy, willingness to pay if workflow pain drops, less enterprise procurement hell than hospitals — and those conditions are not going away inside the three-year horizon. CONF 85%

12. Conclusion

The original brief’s strongest line — the winner is not the best model; the winner is the product that owns the clinical feedback loop — remains the correct strategic frame. CONF 88% What the time-compressed view adds is a sense of when each piece of that loop becomes commercially viable, and what regulatory or interoperability dependency it sits on.

The next 6 months are about admin pain (already largely solved technically, still being adopted). The next 12 months are about contextual reasoning (technically real, regulatory burden rising). The next 24 months are about agentic orchestration (technically plausible, the verifier is the load-bearing component). The next 36 months are about outcome intelligence (architecturally hard, politically harder, and the real moat for whoever builds it humbly).

The final restatement from the brief still holds:

A secure clinical cockpit that knows the patient, knows the doctor, knows the workflow, tracks outcomes, automates safe admin, highlights risk, and leaves the final clinical judgment visibly with the doctor.

The thesis is not “AI will replace doctors.” It is: doctors using outcome-aware AI operating systems will outperform doctors using passive record systems. CONF 88%

References

  1. Epic vs. Oracle Cerner: Comparing New AI Tools in Healthcare IT (Oct 2025). mhhealthcare.com
  2. Ambient Artificial Intelligence Scribes: Learnings after 1 Year and over 2.5 Million Uses. NEJM Catalyst (2025). catalyst.nejm.org
  3. Ambient AI Scribes: Kaiser Permanente’s 7,000-Physician Study. Future Medicine AI (2025). fmai-hub.com
  4. Australian Digital Health Agency notes progress in delivery of national healthcare interoperability plan (HTN, 5 May 2026). htn.co.uk
  5. National Healthcare Interoperability Plan 2023–2028. Australian Digital Health Agency. digitalhealth.gov.au
  6. AI data security. Australian Cyber Security Centre. cyber.gov.au
  7. EU AI Act for Medical Devices: SaMD Compliance Deadlines & Requirements. MDx CRO. mdxcro.com
  8. AI Act & AI-Enabled Medical Devices: Regulatory Status 2026. DQS Global. dqsglobal.com
  9. Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence-Enabled Device Software Functions. US FDA. fda.gov
  10. Small Change: FDA’s Final Predetermined Change Control Plan (PCCP) Guidance. FDA Law Blog (Feb 2025). thefdalawblog.com
  11. AI in Radiology: 2025 Trends, FDA Approvals & Adoption. IntuitionLabs. intuitionlabs.ai
  12. Oracle Health debuts AI-powered EHR designed as a ‘voice-first’ solution embedded with agentic AI. Fierce Healthcare. fiercehealthcare.com
  13. Australian budget looks to advance interoperability and promote sharing of health data (HTN, 13 May 2026). htn.co.uk
  14. Artificial intelligence (AI) and medical device software regulation. Therapeutic Goods Administration (TGA). tga.gov.au
  15. Same rules, smarter tools: what the TGA’s AI guidance means for AI and medical devices. Allens (April 2026). allens.com.au
  16. A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation. npj Digital Medicine (2025). nature.com
  17. Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support. Communications Medicine (2025). nature.com
  18. How many GPs are actually using AI scribes? Medical Republic. medicalrepublic.com.au
  19. Heidi is coming for everyone, and why wouldn’t they? Medical Republic. medicalrepublic.com.au
  20. What Australian GPs should consider before adopting AI scribes. Healthcare IT News. healthcareitnews.com
  21. Artificial Intelligence in Clinical Medicine: Challenges Across Specialties. PMC review. pmc.ncbi.nlm.nih.gov
  22. Why Healthcare’s Next Leap Depends on Agentic Systems That Can Actually Do the Work. HIT Consultant (Feb 2026). hitconsultant.net
  23. Transform healthcare prior authorization with AI Agents. AWS Industries Blog. aws.amazon.com
  24. AI scribes save 15,000 hours — and restore the human side of medicine. American Medical Association. ama-assn.org
  25. RACGP fact-sheet: Artificial intelligence (AI) scribes. racgp.org.au