OpenAI’s new tool, Sora, marks a real turning point. Unlike earlier AI tools that mostly generated text or supported tasks, Sora can create full videos from a simple prompt. You can drop yourself, or anyone, into a video with a likeness, voice, and identity that feels real.
That’s where things get slippery. What looks real might not be real at all. When your name, your face, and your voice can be replicated in seconds, the stakes rise for both individuals and organizations.
From Tool to Media
Up until now, AI has been mostly about assistance. It helped with drafting, research, and analysis. Sora and tools like it change the equation. AI is no longer just a behind-the-scenes helper. It is creating the actual media we consume and shaping what people see and believe.
The Psychological Toll
More people are beginning to experience what experts call AI psychosis: the sense of blurred reality when synthetic content becomes believable enough to pass as real. When the mind can’t separate truth from fabrication, it fills in the gaps. That’s dangerous territory, especially in professions that rely on evidence, trust, and precision.
Executive Summary: In an AI-first future, best practices for billing guidelines demand a dynamic, AI-ready information architecture built on four key principles: 1) Clarity with explicit definitions and hard thresholds; 2) Ease of Reference with logical, scannable structures; 3) Logic with consistent decision frameworks; and 4) Decision Authority for AI to interpret and enforce billing guidelines automatically.
Outside Counsel Billing Guidelines (OCBGs) are the cornerstone of legal spend control, providing enforceable rules that govern how external law firms bill for services. Yet they are often the weakest link in the compliance chain. When OCBGs are missing, lack clarity, or scattered across documents, invoice review processes break down.
According to Bloomberg, 72% of in-house legal teams track legal spend, but only 18% measure compliance with their billing guidelines. Why are only about one in five organizations applying OCBGs effectively? This article explores the reasons and what Legal Ops can do to extract more value from OCBGs, including how AI can enforce more consistent control.
Three Gaps to Close
For OCBGs to work, they must be accessible, unambiguous, and enforceable. Most fall short on at least one of these fronts, creating a significant gap between OCBG intent and reality.
Accessibility
The most common challenge is that guidelines are hard to find. Rules are scattered across Master Service Agreements (MSAs), letters, emails, and other sources. Reviewers waste hours piecing together the applicable rules or relying on outdated systems.
MSAs: Basic rate structures and general terms, but rarely comprehensive billing rules.
Emails: Discussions and agreements are buried in threads and forgotten.
Engagement Letters and Alternate Fee Arrangements (AFAs): Matter-specific modifications that tend to contradict norms and override the OCBGs.
Side Agreements: Verbal understandings that become informal precedents, remembered and expected to be enforced, but rarely documented properly.
Rules Engines: The logic-based configurations and hard-coded rules used to automate reviews and power legacy e-billing systems. They are difficult to set up and maintain.
Each vendor can have different rules, and different matters may have conflicting standards, making consistent enforcement messy.
Ambiguity
Even when an OCBG is expertly prepared, vague language leads to interpretation rather than enforcement. Consider what seems like straightforward language:
Guideline: No block billing permitted.
Real-world example: Conference call regarding contract terms and follow-up research (3.5 hours).
Why it’s ambiguous: Does combining these tasks count as block billing?

Guideline: Partners should not bill for admin tasks.
Real-world example: Preparing sealed filing (AEO) (1.2 hours).
Why it’s ambiguous: Is preparing a filing administrative or substantive work?

Guideline: Only one timekeeper per meeting.
Real-world example: Senior associate and junior associate both attend an internal meeting.
Why it’s ambiguous: If one is note-taking, does that still violate the rule? What if a specialist joins briefly?
Without explicit definitions, guidelines quickly become judgment calls. Enforcement varies by reviewer, leading to inconsistency and disputes.
Enforcement
In Part 1 of this series, “The Hidden Costs of Manual Invoice Review”, we reviewed why enforcement falls short. Humans apply standards unevenly, while rules engines are too brittle and difficult to update. Both approaches lead to inconsistent outcomes or guidelines being ignored.
The solution lies in AI. Companies that shift from manual to AI reviews, with limited human oversight, can expect more consistent enforcement. Even when billing guidelines are poorly organized, AI can begin to enforce rules more consistently.
And the real power comes when AI draws from best practice, with OCBGs optimized for AI.
An OCBG Framework For AI
Organizations should treat their OCBGs not only as a document but as an information architecture that enables AI to exercise precise, consistent control.
1. Promote Clarity
AI excels when information is unambiguous and definitive. Unlike humans who can interpret context and fill in gaps, AI requires explicit clarity to make consistent decisions.
Examples of Best Practice:
Create a comprehensive “Definitions” section at the beginning of your OCBGs. Explicitly define terms like:
Block Billing: “Any time entry that describes more than one substantive task or combines activities that occurred at different times”.
Role Classifications: Clear definitions for “Senior Partner” and “Junior Associate,” with corresponding rate caps and authorization requirements.
Include a list of Explicit Prohibitions, a definitive “do not pay” list:
Billing for administrative tasks at attorney rates
Block billing as defined
Duplicate charges for identical services or expenses
Time entries from unauthorized timekeepers not pre-approved in writing
Charges exceeding approved rate caps without prior written authorization
Replace vague language with parameters and hard thresholds that enable AI to make firm decisions. For example:
Instead of: “Reasonable research time” (requires human judgment)
Use: “Legal research exceeding 4 hours for any single motion requires pre-approval per Section 3.2.1” (enables automated decision)
The goal is to replace ambiguity with a single source of truth, as the sketch below illustrates.
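To make the point concrete, here is a minimal sketch of how a hard threshold like the one above becomes a firm, automatable decision. The field names (task, hours, preapproved) and the 4-hour cap are illustrative assumptions, not a prescribed schema or any product’s API.

```python
# Illustrative only: an explicit threshold turned into a firm decision.
# Field names and the cap are assumptions for the example.

RESEARCH_HOURS_CAP = 4.0  # "research exceeding 4 hours ... requires pre-approval per Section 3.2.1"

def review_research_entry(entry: dict) -> str:
    """Return a decision for one line item under the hard-threshold rule."""
    if entry["task"] != "legal_research":
        return "pass"  # rule does not apply to this entry
    if entry["hours"] <= RESEARCH_HOURS_CAP:
        return "pass"  # within the explicit limit
    if entry.get("preapproved"):
        return "pass"  # pre-approval satisfies the rule
    return "reject: research exceeds 4-hour cap without pre-approval (Section 3.2.1)"

print(review_research_entry({"task": "legal_research", "hours": 5.5, "preapproved": False}))
```

Because every branch is explicit, two reviewers (or two AI runs) reach the same outcome; “reasonable research time” offers no such guarantee.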
2. Create Ease of Reference
Although AI can navigate poorly structured documents, a well-structured data set enables AI to work more efficiently with humans. A simple numbering system allows AI to generate citations, supporting explainability and traceability. A citation such as “Violation of Section 3.1.2 – vague description standards” conveys the issue far better than a general reference. This reduces mistakes and smooths the handoff when invoices need human review.
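As a hypothetical illustration, a numbered OCBG can be held as structured data so that every flag carries its citation. The section numbers and rule text below are invented for the example:

```python
# Hypothetical: a numbered OCBG as structured data, so every violation
# message cites the exact section it enforces. Contents are invented.
GUIDELINES = {
    "3.1.2": "Time entries must describe the work performed in specific terms.",
    "3.2.1": "Legal research exceeding 4 hours for any single motion requires pre-approval.",
}

def cite_violation(section: str, summary: str) -> str:
    """Build a traceable violation message anchored to a numbered section."""
    return f"Violation of Section {section} - {summary} (rule: {GUIDELINES[section]})"

print(cite_violation("3.1.2", "vague description standards"))
```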
3. Enable Logical Thinking
AI works best when information follows consistent logical patterns. This ensures a more consistent application of an analytical framework across thousands of invoices.
OCBGs should demonstrate explicit and logical guidance, as in the submission rules below (see the sketch after this list):
Invoice Submission Requirements:
All invoices must be submitted within 60 days of service completion.
Late submissions (61-90 days) are subject to a 10% reduction.
Submissions beyond 90 days will not be paid.
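Because these rules use hard day thresholds, they translate directly into a deterministic check that an automated reviewer can apply. A minimal sketch, assuming the invoice’s age in days is already known (the function name is ours, not a product API):

```python
# Illustrative enforcement of the submission-window rules above.
def submission_adjustment(days_since_service: int) -> tuple[bool, float]:
    """Return (payable, reduction) for an invoice submitted N days after service completion."""
    if days_since_service <= 60:
        return True, 0.0    # on time: pay in full
    if days_since_service <= 90:
        return True, 0.10   # late (61-90 days): 10% reduction
    return False, 1.0       # beyond 90 days: not paid

print(submission_adjustment(75))  # (True, 0.1)
```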
Required Invoice Fields:
Matter name & number tie charges to the correct case or project; these must match the matter information in your system.
Timekeeper name & role identify who did the work.
Hours worked & hourly rate show the effort and cost per person.
Task/activity code & description explain what was done, eliminate block billing, and enable spend reports.
Itemized expenses break out costs like travel or filing fees separately.
Travel and Attendance Billable Only in Defined Cases:
Depositions where attorney presence is mandated by court order.
Pre-approved client meetings in international jurisdictions.
Clarity on Fee Structures:
Capped fees: Standard hourly rates with firm limits on total matter costs
Fixed fees: Flat fees for defined projects or work portfolios
Success fees: Lower base fees combined with performance bonuses tied to specific outcomes
Your OCBG is a procedural guide for the entire vendor relationship. The more logical it is, the more predictable invoice reviews become for both sides.
4. Enable Decision Authority
The real transformation comes when AI can enforce OCBGs independently, escalating only edge cases for human review.
Enable AI decisions:
Principle: Replace uncertain language
What it means: Use absolutes instead of conditionals so AI can enforce without interpretation.
Example: Instead of “firms should generally avoid…,” use “The following activities are prohibited and will result in automatic charge rejection.”

Principle: Include or reference the detail
What it means: Provide full references to policies and standards that guide enforcement.

Principle: Give clear criteria so AI can apply judgment consistently
What it means: AI can independently reject minor infractions while escalating material ones.
Enable paths for cases that require human judgment:
Principle: Define escalation triggers
What it means: Specify the situations that require human review.
Example: Novel scenarios, new expense types, or disputed charges.

Principle: Establish hierarchy
What it means: Clarify who can approve exceptions and under what circumstances.
Example: Senior counsel vs. Legal Ops vs. Finance.

Principle: Create learning feedback
What it means: Feed human decisions back into the system to improve AI consistency.
Example: Update rules and exceptions based on past escalations.
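Taken together, the two sets of principles describe a simple routing policy: auto-decide what the guidelines settle, escalate what they do not. A minimal sketch of that shape, with trigger names, amounts, and thresholds invented for illustration:

```python
# Illustrative routing: automatic decisions where the OCBG is explicit,
# human escalation where it defines a trigger. All names and amounts
# below are assumptions, not a real system's configuration.
ESCALATION_TRIGGERS = {"novel_scenario", "new_expense_type", "disputed_charge"}
MATERIALITY_THRESHOLD = 500.0  # assumed cutoff between minor and material

def route(flag: dict) -> str:
    """Decide whether a flagged line item is auto-handled or escalated."""
    if flag["trigger"] in ESCALATION_TRIGGERS:
        return "escalate: OCBG defines this as requiring human review"
    if flag["amount"] < MATERIALITY_THRESHOLD:
        return f"auto-reject with citation (Section {flag['section']})"
    return "escalate: material amount, human confirms before rejection"

print(route({"trigger": "rule_violation", "amount": 120.0, "section": "3.2.1"}))
```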
AI Transformation
Integrating OCBGs with AI enables consistent, automated enforcement. There is no speculation, only enforcement. If AI encounters a line item that challenges its interpretation of the OCBGs, it can escalate the item for human review. Over time, as the guidelines become clearer, AI can reduce the need for human reviews.
The path towards AI transformation provides several advantages:
No Backlogs: AI processes invoices immediately upon receipt.
Consistent Communication: Precise, actionable feedback on every violation.
Reduced Friction: AI removes personal dynamics.
Pattern Detection: AI identifies systemic compliance issues.
Freed Human Resources: Lawyers and Legal Ops professionals can shift focus to more complex work such as vendor management, benchmarking, and other core functions.
The Path Forward
Organizations don’t need perfect OCBGs to realize benefits from AI reviews. However, those willing to invest in optimizing their billing guidelines can achieve substantially better results.
The optimal path forward involves three phases:
Phase 1: Implement AI review for immediate financial gains and improved visibility. Begin with existing guidelines and start capturing the violations manual reviews consistently miss.
Phase 2: Use AI to identify which guidelines need clarification, which rules are consistently violated, and how to optimize the information for AI.
Phase 3: Develop more comprehensive OCBGs to power future AI invoice reviews.
This approach benefits from a feedback loop that enables quick wins as well as long-term improvements. Rather than guessing which rules might work, organizations can observe which guidelines drive compliance and iterate from there.
The Gist of It
Most Outside Counsel Billing Guidelines (OCBGs) fail because they are inaccessible (scattered across emails and systems), ambiguous (written with vague, interpretive language), and unenforceable by legacy rules engines or inconsistent human review. To succeed in an AI-first world, OCBGs must:
Promote clarity with explicit definitions
Provide structured references
Enable logical thinking with clear decision frameworks
Grant the AI decision authority to enforce rules
Organizations can start by applying AI to their existing guidelines to capture immediate benefits, then iterate toward a fully optimized, AI-ready information architecture.
In Part 3: We explore forecasting spend and the potential for more accurate reporting, demonstrating how AI transforms legal spend data from historical accounting into predictive intelligence.
Here’s a quick look at the biggest AI news from the past week. We’ve pulled together the headlines shaping technology, business, and policy.
OpenAI inks chip supply deal with AMD
In a multi-year agreement, AMD will supply its MI450 chips to OpenAI, and OpenAI holds an option to acquire up to 10% of AMD. This is a significant move, signaling that OpenAI is diversifying its hardware dependency beyond NVIDIA. [Reuters]

Wall Street rallies on AI optimism
U.S. equity markets opened higher, with investor sentiment buoyed by AI infrastructure developments, especially the OpenAI/AMD deal, as AI continues to dominate narratives in tech investing. [Reuters]

California sets new standard with AI transparency law
California passed Senate Bill 53, requiring large AI developers to disclose safety and security protocols and report critical incidents within 15 days, along with whistleblower protections. It’s a notable regulatory step toward more public accountability. [Le Monde.fr]

EU advances plan to reduce AI dependence on U.S. and China
Draft proposals are circulating for an “Apply AI strategy” within the EU to boost domestic AI development and reduce strategic reliance on U.S. and Chinese tech. The plan includes funding local AI initiatives and increasing regulatory safeguards. [Financial Times]

Meta will use AI chatbot conversations to refine ad targeting
Starting December 2025, Meta will begin using users’ chats with its AI chatbot to personalize ads and content across Facebook and Instagram, with no opt-out for U.S. users. [The Wall Street Journal]

Report warns of AI job displacement at scale
A Senate Democrat’s report estimates that up to 100 million U.S. jobs could be affected by AI and automation in the coming decade, stoking intensifying debates on policy, reskilling, and social safety nets. [Axios]

AI governance takes global spotlight at U.N.
At the 2025 U.N. General Assembly, voices from governments, academia, and civil society amplified the urgency of cross-border AI regulation. The U.N. launched a new “Global Dialogue on AI Governance” to help shape norms and compliance frameworks. [TIME]
Last year the U.S. Federal Trade Commission (FTC) launched Operation AI Comply, a law enforcement sweep targeting companies that misuse or overhype AI in ways that deceive consumers. The aim is to make it clear that the same consumer protection rules still apply in the AI era. Companies cannot make false or misleading claims, engage in unfair practices, or obscure how their technologies actually work. In other words, there is no “AI exemption” because long-standing advertising, privacy, and consumer protection standards remain fully in force.
Here’s a breakdown of what’s going on, why it matters, and what legal teams should do in response.
What the FTC is Cracking Down On
In Operation AI Comply, the FTC has brought actions against several companies that made misleading claims about AI capabilities. Key issues include:
AI tools marketed as a substitute for professional services, such as “AI lawyers” that promise to replace attorneys.
AI-powered systems claiming to help consumers make passive income via online storefronts, but failing to deliver.
Services enabling fake reviews or deceptive content through AI, which mislead consumers.
Some of the named companies in the FTC’s actions are DoNotPay, Ascend Ecom, Rytr, Ecommerce Empire Builders, and FBA Machine. Several of the companies promoted bold claims of earnings and success from their AI tools, but the FTC found those promises fell short in practice.
Why Legal Professionals Should Pay Attention
The implications of Operation AI Comply go well beyond marketing departments. For legal teams and practitioners, the risks are real and immediate:
Misleading claims. Using or endorsing AI tools that overpromise can land a business in hot water with regulators. Regulators may view inflated claims as misrepresentation even if they originate with third-party vendors.
Client expectations and duty of candor. Clients who hear that AI guarantees faster results or perfect accuracy may assume that the company stands behind those claims. If results fall short, questions of competence, misrepresentation, or failure to provide appropriate advice may arise.
Due diligence. Lawyers evaluating AI tools must move beyond marketing decks and promotional copy. Critical questions include: What data trained the system? How was it tested? What limitations were identified? How are errors addressed? Documenting this inquiry helps establish a record of diligence.
Transparency needed. Clients and courts need clarity about AI use. Effective practice means explaining both capabilities and limitations, setting realistic expectations, and ensuring outputs are verified before being presented as fact.
What to Do Now
Here are some practical actions legal departments should consider to respond to these kinds of regulatory pressures:
Action: Audit AI tool claims
What it looks like in practice: Review vendor contracts, product literature, and marketing materials. Do the claims match what the tool actually does?

Action: Set guardrails
What it looks like in practice: Create internal policies about how AI outputs are reviewed, fact-checked, and used in client or court materials.

Action: Train staff
What it looks like in practice: Make sure lawyers and support staff understand AI’s limitations, such as hallucinations and weak citations, so they don’t blindly trust outputs.

Action: Require vendor transparency
What it looks like in practice: Ask AI vendors for evidence of reliability and accuracy. What data did they train on? How often do they update models?

Action: Monitor regulatory trends
What it looks like in practice: Keep an eye on FTC guidance and complaints, state bar ethics opinions, and industry reports. The legal environment around AI is changing fast.
The Gist of It
Operation AI Comply is a strong signal from regulators that hype and marketing about AI won’t excuse inaccurate or misleading claims. For legal teams, the lesson is that any AI tools adopted or promoted must be described accurately, deployed responsibly, and supported with real results. AI tools can indeed deliver efficiencies and insights but only when paired with professional judgment, thorough verification, and attention to ethical, legal, and regulatory risk.
OpenAI recently introduced GDPval, a benchmark that evaluates how well AI performs on real-world, economically valuable tasks across 44 occupations, including legal. GDPval examines the work professionals do every day, from drafting legal briefs to analyzing compliance documents. Nick Whitehouse, Chief Artificial Intelligence Officer at Onit, explains the results and what they mean for legal operations.
How GDPval Works
OpenAI designed GDPval by asking seasoned professionals to create 1,320 specialized, real-world tasks. Each task is based on real work that goes beyond a simple text prompt. These tasks come with reference files and context, and the expected deliverables come in the form of documents, slides, diagrams, and spreadsheets. The tasks were then completed by experts in the field and AI models. To evaluate performance, expert graders (a group of experienced professionals from the same occupation represented in the dataset) reviewed both outputs in blind comparisons. The results were scored on quality, accuracy, and usefulness, making GDPval a more realistic test than academic benchmarks because it reflects the actual deliverables professionals produce every day.
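For intuition, the blind-comparison scoring reduces to a simple tally: a grader sees two unlabeled deliverables, picks the better one or calls a tie, and a model’s score is its share of wins. A toy sketch of that arithmetic (our illustration, not OpenAI’s actual grading code):

```python
# Toy illustration of blind pairwise grading; not OpenAI's implementation.
# Each judgment is "model", "expert", or "tie", recorded by a grader who
# did not know which deliverable came from the AI model.
judgments = ["model", "expert", "tie", "model", "expert", "expert", "model", "tie"]

wins = judgments.count("model") + 0.5 * judgments.count("tie")  # ties split evenly
print(f"model win rate: {wins / len(judgments):.0%}")  # 50% for this toy sample
```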
What the Results Said
Leading models are already producing work that approaches expert quality in certain tasks. In some cases, AI completed structured tasks up to 100 times faster than human experts. For the legal field, this means AI is proving its ability to draft, summarize, and analyze with surprising efficiency. But GDPval also underscores the limits. Real-world legal work requires judgment, iteration, and client nuance, qualities AI cannot replicate.
How to Utilize the Results
For legal teams, GDPval is a call to use AI as a force multiplier, not a replacement. Practical applications include:
Accelerating contract review and summarization
Enforcing billing guidelines more consistently
Reducing hours spent on initial research
By shifting repetitive or structured work to AI, lawyers and legal ops professionals can focus on higher-value contributions like risk assessment, negotiation, and client strategy. The opportunity lies in designing workflows where human judgment and AI speed complement one another.
The Gist of It
AI is getting closer to expert-level performance on real-world legal tasks, but it is not a substitute for legal expertise. The advantage for legal and legal ops teams is clear: let AI handle the repetitive, structured work so people can spend more time on strategy, judgment, and client trust. The challenge is not whether AI is capable, but how quickly teams can adapt to use it wisely.
In September’s edition of Third Thursday, Erin Sussman and Jeffrey Solomon joined us to share their key insights on developing strong billing guidelines:
1. Start with industry standards. Leverage trusted industry resources as a baseline to understand what’s considered “market standard.” Partners and advisors can also provide guidance on best practices for communication, rollout, and firm acceptance.
2. Conduct a historical analysis. Look back at past invoices to spot trends and problem areas where guidelines could help contain costs. Common examples include:
Excessive hours billed for legal research
High or poorly documented travel expenses
Overstaffing or disproportionate partner time
3. Engage key in-house legal team members. Engage key members of your legal department when drafting guidelines. Their firsthand insight into your matters and law firms’ billing behaviors will help ensure your policies are both relevant and enforceable. Just as important, their involvement fosters ownership and buy-in, which makes enforcement smoother.
How to take advantage of AI in the bill review process:
4. Combine AI with human insight. AI is powerful at spotting patterns and exceptions, but pairing it with human judgment creates even more flexibility in controlling nuanced or complex legal spend. This frees your team to spend less time policing invoices and more time practicing law.
5. Monitor which rules the AI flags most frequently. Are certain system-applied flags popping up again and again? That trend data can help you refine your guidelines, improve communication with firms, and sharpen your cost-control strategies (see the sketch below).
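A hypothetical sketch of that trend tracking: tally how often each system-applied flag fires across reviewed invoices. The flag names are invented for the example.

```python
# Hypothetical: count which guideline flags the AI raises most often,
# to spot rules that need clarification or firmer communication.
from collections import Counter

flags_raised = [
    "block_billing", "vague_description", "block_billing",
    "rate_cap_exceeded", "block_billing", "vague_description",
]

for flag, count in Counter(flags_raised).most_common():
    print(f"{flag}: {count}")  # most frequent flags first
```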
With the right mix of industry benchmarks, internal input, and AI-driven review, billing guidelines become more than rules; they serve as a strategic lever to reduce costs, drive consistency, and strengthen firm relationships.
Presenters
Erin Sussman, General Counsel, Director of Business Development and Client Service, Sterling Analytics
MIT grabbed headlines recently by reporting that 95 percent of Generative AI pilots never drive measurable results, with only 5 percent of projects achieving real scale. Every stalled pilot drains budget, slows transformation, builds opposition, and piles pressure on legal ops leaders to prove ROI. Below, we examine what lessons from the study can be applied to your legal operations AI strategy:
As Nick Whitehouse, Chief AI Officer at Onit, explained in a recent response to the study: “Are 95% of generative AI projects failing? Nope, not really. The way we are measuring them and the way we are running them, yes, but the technology itself, no.”
Why Generative AI Pilots Fail in Legal Departments
AI that is disconnected from workflow
The MIT study found that the leading reason generative AI pilots fail is weak integration with workflows, not weak technology. If AI doesn’t live inside e-Billing, matter management, or contract review, it lives on the sidelines. Integration reduces the barriers to adoption, and adoption is impact. This is exactly why modern legal ops software built to connect workflows matters.
Chasing sizzle over substance
The same research showed that companies spend heavily on high-profile pilots in sales and marketing, while the biggest ROI sits in back-office automation. In legal, it can be tempting to tackle impressive use cases like advice-giving and negotiation strategy. However, AI can deliver more value, faster, in invoice review, spend management, and contract analytics than in “showcase” projects.
As Whitehouse noted, “The MIT study was actually helpful: it found the biggest wins often come from focusing on back-office tasks. These are well documented, process-driven, and usually have the data you need to measure results. That’s where success tends to show up first.”
The build versus buy dilemma
Whether you build or buy, the deciding factor is focus. Analysts noted that vendor tools succeed about twice as often as internal builds. Internal builds can work, but only if they are targeted and resourced properly. General-purpose AI models still require significant tuning, configuration, custom UX, and, most importantly, scientific validation to maximize their value in a particular domain. That demands domain-specific focus and attention.
Tackling too much at once
Spreading resources too thin is another culprit. Startups focusing on one use case often scale quickly, reaching $20 million in revenue in under a year. Larger enterprises tend to pilot too many initiatives at once and struggle to show meaningful progress. Legal ops leaders who pick one workflow, deliver results, and scale from there build momentum and credibility.
As Onit’s Chief AI Officer pointed out, “About 5% of businesses in the study are doing an exceptionally good job using AI to solve real pain points. You see this especially in the startup ecosystem. In fact, the study showed that businesses working with specialized companies or products focused on these problems have twice the success rate compared to those going it alone.”
How to Make Sure Your Legal Ops Project is Among the 5% of Success Stories
Meanwhile, adoption pressure is only growing. A recent survey found that 96 percent of legal professionals already use AI in their daily workflows, and nearly half describe it as essential. Legal cannot afford to treat AI as optional. However, moving from experimentation to execution requires a structured approach.
Here’s how to set generative AI pilots up for success in legal ops:
Pinpoint one high-impact process
Look for workflows where time and cost converge, like invoice review or contract approvals.
Embed AI into the systems your team already uses
If it isn’t integrated, it won’t be adopted. Seamless connection to your core legal tools is non-negotiable.
Measure what matters
Cycle-time reductions, outside counsel savings, compliance improvements. These are the metrics leadership cares about, not abstract productivity gains.
Fund what works
Double down on pilots that deliver measurable ROI and resist the urge to scatter budget across experiments.
Lead the adoption curve
Change management is not an afterthought. Equip your team with training, communicate wins, and build trust in the tools.
From Pilot Failure to Legal Ops Success
Loosely run generative AI pilots are failing at high rates, but that doesn’t mean legal ops is destined for the same fate. Infosys research shows that 95 percent of executives using AI have experienced a mishap, and only 2 percent of organizations meet responsible AI standards. Like the MIT study, those findings are an explicit argument for structure, governance, and discipline.
“The right way to look at this is as a maturity journey: moving from ad hoc experiments to fully integrated AI in your core processes. That’s when you’ll start seeing massive success.” – Nick Whitehouse
Legal ops leaders who focus on integration, measure what matters, and scale intentionally will be the ones who prove AI’s real value.
We turned the key lessons from MIT’s AI pilot study into a quick summary slide.