When AI Hallucinates: Deloitte’s $290K Refund After Faulty Report

In mid-2025, Deloitte delivered a government-commissioned assurance report in Australia that was later found to contain fabricated, AI-generated content. The report reviewed the Targeted Compliance Framework (TCF), a system that automatically enforces penalties in the country’s welfare program.
Soon after publication, researchers and academics began to spot alarming issues:
- The report cited nonexistent academic papers and sources that could not be verified.
- A quote attributed to a federal court judgment was found to be fabricated.
- Numerous footnotes and citations were incorrect or misattributed.
- In total, more than 20 errors were flagged by Dr. Christopher Rudge of the University of Sydney.
These defects raised serious doubts about the credibility of the report. The errors were not trivial; they undermined the report’s foundation of evidence and trust.
Root Cause
Investigations into the flawed report revealed an underlying cause: Deloitte’s use of generative AI in parts of the methodology.
In the revised version of the report, Deloitte publicly disclosed that it had used Azure OpenAI GPT-4o, licensed by the Department of Employment and Workplace Relations (DEWR) and hosted on DEWR’s Azure infrastructure.
- The disclosure came only after the errors were exposed; the original version did not clearly acknowledge the use of AI.
- While Deloitte did not explicitly attribute the errors to AI, the pattern of “hallucinations” (plausible but false content generated by large language models) corresponds with known risks in generative AI.
- Deloitte contends that human experts reviewed and validated the outputs before finalization—but the existence of such inaccuracies suggests that oversight failed or was insufficient.
- In short, AI was used as a drafting tool, but the safeguards and checks failed to catch serious factual and sourcing errors.
Deloitte’s Response
Once the errors became public, Deloitte and the Australian government moved to salvage the situation.
Revised Report
- In October 2025, Deloitte published a corrected version of the report. The updated version removed the fabricated judicial quote, eliminated bogus references, and corrected footnotes.
- The revised document also included, for the first time, an appendix disclosing the generative AI methodology used (Azure OpenAI GPT-4o).
- The government department (DEWR) maintained that despite the corrections, the core findings and recommendations of the report remained unchanged.
Partial Refund / Repayment
- Deloitte agreed to repay the final instalment of its contract, effectively issuing a partial refund on an engagement worth AU$440,000 (roughly US$290,000) in total.
- This repayment was positioned as a gesture of accountability for the faulty deliverables.
- The department confirmed the refund would be processed and that Deloitte’s corrections would be published.
Stance and Criticism
- Deloitte did not openly attribute the mistakes to AI. Instead, it insisted that the substantive conclusions of the report remained valid and that the errors were limited to references and footnotes.
- Political responses were strong. Senator Barbara Pocock called for a full refund, accusing Deloitte of misusing AI, fabricating references, and misquoting a judge.
- Senator Deborah O’Neill labelled the episode a "human intelligence problem" and urged greater scrutiny over the use of AI in consulting.
In sum, Deloitte acknowledged fault, made corrections, and returned part of the money—but also attempted to preserve its reputation by affirming that the report’s substantive value remained unaffected.
Conclusion
This case serves as a reminder that AI is just a tool, not a replacement for human judgment, especially where accuracy, trust, and responsibility truly matter.
The Deloitte case is expected to have a significant impact across the tech industry. Big-tech companies will begin to restrict the use of AI within their systems, relying more on private, customized AI models (closed, secure, and closely monitored) instead of public tools like ChatGPT, Grok, or Gemini.
Many firms will build their own in-house AI systems to avoid “hallucinations” and make every result traceable. These systems will rely on verified data, detailed usage logs, and human verification checks before any content goes live.
For software engineers, this shift means fewer open experiments and more rule-based AI workflows. Developers will still use AI, but under stricter controls that prioritize accuracy and responsibility over creativity or speed.
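To make that last point concrete, here is a minimal, hypothetical sketch (in Python) of what such a controlled workflow could look like: every model call is logged, the draft’s citations are checked against an allow-list of verified sources, and nothing is released without explicit human sign-off. The function names, the source list, and the stubbed model call are illustrative assumptions, not any specific vendor’s API.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_review_workflow")

# Hypothetical allow-list of sources that humans have already verified.
VERIFIED_SOURCES = {"DEWR TCF Design Spec v2", "Internal Audit Memo 2024-11"}

@dataclass
class Draft:
    text: str
    citations: list[str]

def generate_draft(prompt: str) -> Draft:
    """Placeholder for a call to a private, access-controlled model endpoint."""
    # Stubbed so the example is self-contained; a real pipeline would call
    # the organisation's internal model here.
    return Draft(text=f"Draft answering: {prompt}",
                 citations=["DEWR TCF Design Spec v2"])

def citations_are_verified(draft: Draft) -> bool:
    """Reject any draft that cites a source outside the verified allow-list."""
    unknown = [c for c in draft.citations if c not in VERIFIED_SOURCES]
    if unknown:
        log.warning("Unverified citations found: %s", unknown)
    return not unknown

def human_approves(draft: Draft) -> bool:
    """A human reviewer must explicitly sign off before publication."""
    answer = input(f"Approve this draft for release? (y/n)\n{draft.text}\n> ")
    return answer.strip().lower() == "y"

def produce_report_section(prompt: str) -> str | None:
    log.info("Model call requested: %r", prompt)        # detailed usage log
    draft = generate_draft(prompt)
    log.info("Draft produced with citations: %s", draft.citations)
    if not citations_are_verified(draft):               # automated source check
        return None
    if not human_approves(draft):                       # human verification gate
        return None
    return draft.text

if __name__ == "__main__":
    produce_report_section("Summarise the TCF penalty workflow")
```

The point of the sketch is not the specific checks but the ordering: generation, automated verification, and human approval are separate, logged steps, so a fabricated citation is caught before a document ever reaches a client.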