
Legal AI buyer's guide: 10 questions before you sign.

A fundamentals-first framework for partners, IT leads, and procurement teams. Any vendor who cannot answer every one of these is not ready for your firm.

Samuel Anderson
CEO & Founder, Aewita · April 22, 2026 · 9 min read

Every legal AI vendor pitch I have sat through on the buyer's side looks roughly the same. The demo opens with a flashy drafting scene. An associate types a question, the screen fills with a polished memo, and citations cascade down the sidebar. The committee nods. The pitch never quite gets to the questions that matter. Six months later, the partners are asking why the pilot stalled.

This is a legal AI buyer’s guide written from the other direction. Ten questions. Each one picked because the answer separates a production-ready platform from a pitch deck. Use them in order. If a vendor skips one, the skip is the answer.

1. Who owns the inference infrastructure?

The single most load-bearing question in a legal AI evaluation. When your associate types a privileged prompt — a discussion of a live matter, a draft motion, a clause from a client’s MSA — where does that prompt physically go? Who runs the hardware that processes it? Whose hands does it pass through?

A good answer is specific: the vendor owns and operates its own inference stack, and can name the boundary your data crosses and the boundary it does not. A red flag is any answer that routes inference through an external AI provider, whether that provider is named on the security page or buried in a subprocessor list. Aewita’s answer: we built it, we host it, the data path terminates inside Aewita. No calls to OpenAI, Anthropic, or Google.

2. What is your published hallucination rate, and how did you measure it?

For any product that generates legal content, the hallucination rate is the gating number for Rule 1.1 competence. A vendor should be able to tell you three things without consulting marketing: the measured rate, the sample size, and the confidence interval method. If the answer is “less than 5%” with no denominator, you do not have a measurement. You have a talking point.

Aewita’s number: in internal testing, Aewita observed zero hallucinated outputs across 800 consecutive queries. Statistically, zero failures in 800 trials puts a rough upper bound of about 0.4% on the true rate at 95% confidence. The methodology is published, including how we define a hallucination (invented citations, misattributed propositions, and materially misstated holdings all count).
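The arithmetic behind that bound is worth checking for any vendor's zero-failure claim. With zero observed failures in n trials, the exact 95% upper bound on the true error rate is 1 − 0.05^(1/n), which the "rule of three" approximates as 3/n. A minimal sketch:

```python
def zero_failure_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact (Clopper-Pearson) upper bound on the true error rate
    when 0 failures are observed in n independent trials."""
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)

exact = zero_failure_upper_bound(800)   # ~0.0037, i.e. ~0.37%
rule_of_three = 3 / 800                 # quick approximation: 0.375%
print(f"exact: {exact:.2%}, rule of three: {rule_of_three:.2%}")
```

Run it with a vendor's own sample size: if they quote a rate their denominator cannot statistically support, you have your answer.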

3. Do you train on my client content?

Three answers matter here, and you want to hear all three spelled out. Does the vendor train its foundation models on your firm’s data? Does the vendor train any models, including retrieval or ranking models, on your data without an opt-in? And do any third parties in the data path (see question nine) retain or train on your prompts?

The only acceptable answer to the first question for a law firm is “no,” without footnotes. The second and third should be explicit and defaulted to no. Aewita does not train on client content, and no third-party model provider is in the data path to train on anything.

4. What jurisdictions and years of case law does your system cover?

Coverage is how you catch the vendor who is great at common-law diversity fact patterns and silent on Hawaii trust law. A specific answer has three parts: jurisdictions (federal, all 50 states, D.C., territories as applicable), year range (and the honest earliest year for every court), and document types (cases, statutes, regulations, court rules, administrative decisions).

Aewita covers every U.S. court opinion from 1700 to today, federal statutes, and statutes from all 50 states plus D.C. The system handles 792 document types across 22 practice areas. Ask every vendor for the equivalent numbers.

5. How does citation verification work?

You do not need a diagram. You need an outcome-level answer. A good answer is short: every citation in every output is independently verified against the source document before it reaches the user, and outputs that fail verification are flagged or stripped. Anything vaguer than that, and you are looking at a model that is self-grading its own work — which is exactly how citations that do not exist end up in briefs.

Citation verification is the second most important capability in the product after inference quality. It is also the hardest to fake in a demo — ask the vendor to run your own query and show you a case where verification caught something.

6. What is the pricing model, and what are the cancellation terms?

Three sub-questions under this one, and you want explicit answers to all three. Is there a seat minimum? Is there an annual commit? What are the cancellation terms, including mid-term exit and renewal?

A simple structure is a tell. Aewita is $99 per month or $720 per year, no seat minimum, no annual commit required, 14-day free trial, cancel anytime. An enterprise legal AI product that requires a 25-seat minimum and a three-year commitment may still be the right fit for a 400-attorney firm, but you should know that going in — not discover it in month four of a sales cycle.

7. What DMS integrations do you ship?

For firms that run on iManage, NetDocuments, or SharePoint, the DMS integration is a go/no-go item. For solos and small firms without a DMS, it is a future-proofing question. The right question is not just “do you integrate” but “how” — and specifically, whether the integration is a proprietary connector the vendor has to rebuild every quarter, or a standards-based integration that scales.

Aewita integrates over the Model Context Protocol (MCP), which is an open standard for wiring AI to external systems. That means a firm’s DMS, internal knowledge base, or practice-management tool can connect through a common interface instead of a one-off build.

8. What is the audit-log export path?

Every regulated industry evaluation eventually runs into this question, and the firms that cannot answer it lose deals. Who can export an audit log of every prompt, every output, every user action? On what cadence? Into what format? And what is the retention policy on the logs themselves?

A good answer is prose, not slideware. Named roles (who can pull logs). A named export mechanism (a secure API, a scheduled archive). A stated retention window that is long enough for your insurance carrier and short enough that you know when data is truly gone. If a vendor is vague here, you will not enjoy your first internal audit.
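To make the question concrete, here is a hypothetical sketch of what a scripted export might look like from the buyer's side. The endpoint, parameter names, and format are my own illustrative assumptions, not any vendor's real API; the point is that a good answer lets your IT team write something this boring:

```python
from urllib.parse import urlencode

def audit_export_url(base_url: str, start: str, end: str,
                     fmt: str = "jsonl") -> str:
    """Build a request for every logged prompt, output, and user
    action in [start, end]. All names here are hypothetical."""
    query = urlencode({"from": start, "to": end, "format": fmt})
    return f"{base_url}/v1/audit/export?{query}"

url = audit_export_url("https://api.vendor.example",
                       "2026-01-01", "2026-01-31")
```

If the vendor cannot point to a documented equivalent of that one function call, the "audit log" is a screenshot feature, not an export path.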

9. What subprocessors are in the data path?

The most important question in this entire list for privilege. A subprocessor is any third party that processes your data on behalf of your vendor: cloud infrastructure providers, observability tools, customer-support platforms, and — critically — any external AI model providers. The vendor should publish a current subprocessor list and commit to notifying you before adding new ones.

For legal AI specifically, the single most important sub-question is: are any external LLM providers in the data path? A “yes” is not disqualifying, but it changes the conversation with every client who has outside-counsel guidelines. Many large financial and regulated-enterprise clients now require firms to disclose third-party AI providers before privileged work is run through them. Aewita’s architecture has no third-party LLM provider in the data path, which simplifies that conversation to a sentence.

10. What happens to my data if I cancel?

The last question, and it is the one most firms forget to ask until they are trying to leave. On cancellation: how long do you have to export your data? In what format? When is active data deleted? When are backups deleted? Who certifies deletion, and how?

A serious answer is a defined export window (usually 30–90 days), a standard export format (not a proprietary dump), a deletion timeline from active systems (days, not months), and a separate timeline for backups (typically measured in weeks, sometimes months for regulated retention). If any of those are unspecified in the contract, mark it up before you sign.

If your vendor won’t answer a question, that’s the answer.

How to run a two-week legal AI evaluation

The worst legal AI evaluations are six-month pilots that never reach a decision. The best are two weeks, structured, and over. Here is the framework I give firms that ask.

Week one: three query categories, same prompts, side by side. Pick three representative queries. One pure research (find the rule, find the controlling case). One drafting (write a specific brief section or a specific clause). One jurisdiction or practice-area edge case — something you already know the right answer to, so you can grade ground truth. Run all three through every finalist with identical prompts. Save the outputs.

Week two: citation fidelity and workflow fit. Open every citation in every output. Check that the cited case exists, that the cited passage says what the output claims it says, and that the holding is correctly characterized. This is where measured hallucination rates stop being theoretical. Then have two attorneys actually use each product for a full day of work. Note friction.
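If you want the week-two citation check to produce a number rather than a feeling, a minimal tally sketch helps. The field names below are my own, not from any product; record one entry per citation and the same three checks described above:

```python
def citation_error_rate(checks: list[dict]) -> float:
    """checks: one dict per citation, with booleans for 'exists',
    'supports_claim', and 'holding_correct'. A citation counts as
    an error if any of the three checks fails."""
    if not checks:
        return 0.0
    bad = sum(1 for c in checks
              if not (c["exists"] and c["supports_claim"]
                      and c["holding_correct"]))
    return bad / len(checks)

sample = [
    {"exists": True, "supports_claim": True,  "holding_correct": True},
    {"exists": True, "supports_claim": False, "holding_correct": True},
]
rate = citation_error_rate(sample)  # 0.5
```

Run the same tally for every finalist and you have a measured, apples-to-apples error rate to set against whatever number the vendor published.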

At the end of two weeks, the choice is usually clear. If it is not, the products are genuinely close and you should pick on pricing and subprocessor posture. That is a good problem to have.

The questions vendors most commonly dodge

After sitting through dozens of these evaluations from both sides of the table, a pattern emerges. Three questions draw the most creative evasions. Watch for them.

The subprocessor question (number nine) is the most-dodged in the list. The common move is to pivot from the architecture to the contract — “we have an enterprise agreement with our AI provider that prohibits training and restricts retention.” That is a real thing, and it is not nothing, but it is not an answer to the question. The question was who is in the data path. Privilege is about who touches the data, not what contract is signed on the way. If you wanted a contractual answer, you would have asked about DPAs.

The hallucination-rate question (number two) gets answered with customer testimonials. A partner at a reference firm talks about how much time the product saves and how many citations checked out. That is a useful data point about adoption. It is not a measured error rate. If the vendor pivots to testimonials when you ask for a number, the number does not exist.

The cancellation-terms question (number six) gets answered in the affirmative, with the fine print left unspoken. “Of course you can cancel — our standard contract has a termination-for-convenience clause.” Then the clause turns out to require 180 days’ notice, pro-rated fees through the remainder of the term, and a separate discussion of data export. A cancellation term is not useful if exercising it is harder than just paying out the rest of the contract.

None of this is sinister. It is how enterprise software has always been sold. It is also the reason so many legal AI pilots stall: the questions that matter most are the ones the sales motion is optimized to defer.

Why the ten questions matter now

The American Bar Association’s Model Rules of Professional Conduct remain the baseline — Rules 1.1 (competence), 1.6 (confidentiality), and 5.3 (nonlawyer assistance) all apply to AI tools a firm adopts. Aewita’s compliance with those three rules is designed into the architecture, not bolted on after the fact. But compliance by architecture is only a meaningful claim if you, the buyer, ask the questions that expose whether the claim is real. Outside-counsel guidelines from large clients are increasingly specific about AI subprocessors, and the firms that can answer those guidelines cleanly are the ones that keep the work.

Every vendor in this category — including ours — should be able to sit down across from a partner, an IT lead, and a procurement team and answer the ten questions above without a delay, a callback, or a hedge. The ones who can are the ones worth piloting. The ones who cannot are the ones whose output will eventually turn up a fabricated citation in someone’s brief, and that someone will not be a vendor employee.

If you want Aewita’s written answers to all ten, they live on the security page. If you want to compare the answers side by side with our main competitors, we publish that too. And if you want to skip the paperwork and just run your hardest query, the 14-day trial is a real trial, with real product access. Bring the questions with you.

See Aewita's answers to all ten.

Read the security architecture, or book a 30-minute demo with our team.