We keep asking questions we should already know the answer to. And we usually ask them when a decision depends on it.
Sometimes we guess. Sometimes we do a one-off investigation. Sometimes we shrug and move on.
What if you could get the value of a high-quality audit all the time?
The idea
Instead of running audits occasionally, automate the audit itself.
An audit is just a set of questions.
If you make those questions explicit, and make them answerable repeatedly, the audit stops being a one-off activity and becomes something you can run continuously.
The method
Start from pain points you already feel, or decisions you struggle to make.
From each pain point, write the questions that would help you address it.
Those questions imply dimensions (what you want coverage over) and, over time, a set of entities that describe your world.
Don’t try to be complete. Just describe enough of your world to support the questions you care about.
For example, “we don’t know if we can reproduce the data of our projects” is a pain point that may prompt the questions:
- Which projects have code packages?
- Which code packages have a README with reproducibility steps?
- Which README file instructions actually work?
The first two should be relatively trivial to check (in this GenAI world), and you can decide how much value the third one deserves. The important point is that you’ve made explicit what you know and surfaced what you can and can’t currently answer, in case it’s worth pursuing later.
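As a rough sketch of how the first two questions could be checked automatically; the directory layout, file names, and keywords below are assumptions for illustration, not a prescribed structure:

```python
from pathlib import Path

# Hypothetical layout: each project is a directory under PROJECTS_ROOT,
# and a "code package" is anything containing a pyproject.toml or setup.py.
PROJECTS_ROOT = Path("/data/projects")
REPRO_KEYWORDS = ("reproduce", "reproducibility", "how to run")

def has_code_package(project: Path) -> bool:
    """Which projects have code packages?"""
    return any(project.glob("**/pyproject.toml")) or any(project.glob("**/setup.py"))

def readme_has_repro_steps(project: Path) -> bool:
    """Which code packages have a README with reproducibility steps?"""
    for readme in project.glob("**/README*"):
        text = readme.read_text(errors="ignore").lower()
        if any(keyword in text for keyword in REPRO_KEYWORDS):
            return True
    return False

if __name__ == "__main__":
    for project in sorted(p for p in PROJECTS_ROOT.iterdir() if p.is_dir()):
        print(f"{project.name}: package={has_code_package(project)}, "
              f"repro_readme={readme_has_repro_steps(project)}")
```

A signal this crude is fine as a starting point; the definition of “has reproducibility steps” can tighten later.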
The dimensions in our example are Code Packages and Projects.
These are actionable and describe different states of knowledge and reproducibility readiness.
Other questions will likely arise from these, such as “Which projects are ongoing?” and “Which code packages belong to which projects?”
The entities in your world
After writing a few questions, you can jump to a whiteboard and brainstorm the entities in your world: what you own, what you interact with, and how things cluster together. The entities are the targets of the questions, and the dimensions let you define coverage. Example: “We have a total of 12 projects: 5 have READMEs with reproducibility steps, 2 don’t have READMEs, 5 have no packages.”
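As an illustration of turning entities and dimensions into a coverage statement, here’s a minimal sketch; the project records and flag names are hypothetical:

```python
from collections import Counter

# Hypothetical audit facts: one record per project (the entity),
# with the dimensions we care about as boolean flags.
projects = [
    {"name": "alpha", "has_package": True,  "readme_with_repro": True},
    {"name": "beta",  "has_package": True,  "readme_with_repro": False},
    {"name": "gamma", "has_package": False, "readme_with_repro": False},
]

coverage = Counter()
for p in projects:
    coverage["total"] += 1
    coverage["with_package"] += p["has_package"]
    coverage["readme_with_repro"] += p["readme_with_repro"]

print(f"{coverage['total']} projects: "
      f"{coverage['readme_with_repro']} have READMEs with reproducibility steps, "
      f"{coverage['total'] - coverage['with_package']} have no packages.")
```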
It’s up to you whether you want to immediately define actions to take or treat it as a helpful data point for others to decide. The point is you stop spending time re-answering ad hoc questions.
The forcing function
Once you have questions, you write evals: checks that verify you can answer them. (See promptfoo’s documentation for one way to implement them.)
An eval is simply:
If I ask this question, I expect to get this kind of answer.
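As a tool-agnostic sketch, an eval can be as small as a question paired with an assertion about the answer; answer_question and the expected project names below are hypothetical placeholders:

```python
# A minimal, explicit eval: a question paired with an assertion about the answer.
# answer_question() is a hypothetical hook into however you answer questions today
# (a script, a database query, an LLM with tool access, ...).

def answer_question(question: str) -> str:
    raise NotImplementedError("plug in your own answering mechanism")

def test_projects_missing_readmes():
    """If I ask this question, I expect to get this kind of answer."""
    answer = answer_question("Which projects have code packages but no README?")
    # Explicit baseline: these (hypothetical) projects are the known offenders today.
    assert "project-gamma" in answer
    assert "project-delta" in answer
```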
Writing the eval is where the value appears.
The moment you write it, you’re forced to confront whether you can actually answer the question at all.
- If you can, the eval is straightforward.
- If you can’t, you’ve discovered something you thought you knew but didn’t.
- If the answer is subjective, that gap becomes explicit.
You don’t need perfect answers. You need to know which evals pass and which ones reveal that you never had a real answer.
That alone is valuable.
Writing evals in a form that doesn’t break as things change takes some practice, but that’s not your primary concern when starting. It’s fine to start with explicit, baseline checks.
What you get
Once questions have evals, answering them becomes an implementation detail. With MCPs (Model Context Protocol servers), Playwright, and similar tools, programmatic access is easier than ever. The hard part isn’t answering questions, it’s knowing which ones to ask and systematizing what “good” looks like.
Over time, you start to see where your questions apply, which are too vague, which are easier than expected, and where your description of the world is still thin.
You don’t need to go all in. Start with a few questions and a small report that grows over time.
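A small report can be as simple as running every eval and tallying which questions pass, fail, or can’t be answered yet. A minimal sketch, with hypothetical questions and placeholder checks:

```python
# Run every registered eval and tally the outcomes.
# A check of None marks a question we haven't made answerable yet.
EVALS = {
    "Which projects have code packages?": lambda: True,   # placeholder check
    "Which README instructions actually work?": None,     # not answerable yet
}

def run_audit(evals):
    report = {"pass": [], "fail": [], "unknown": []}
    for question, check in evals.items():
        if check is None:
            report["unknown"].append(question)
        elif check():
            report["pass"].append(question)
        else:
            report["fail"].append(question)
    return report

if __name__ == "__main__":
    for status, questions in run_audit(EVALS).items():
        print(f"{status} ({len(questions)}):")
        for question in questions:
            print(f"  - {question}")
```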
At that point, you’re no longer redoing one-off audits.
You’re running a continuous audit whose scope is defined by the questions you care about.
Why this is low-risk
- You don’t need to know everything upfront.
- You don’t need to define “good” everywhere.
- You don’t need a complete model of your world.
You just need to acknowledge that you may not know the answer to questions you’ve never explicitly tried to answer (much like the pain of writing unit tests when you’re not sure what you actually want).
Even unanswered questions are useful. They tell you what’s unclear, subjective, or not worth investing in.
Closing
You don’t need to audit everything.
You just need to stop rediscovering the same answers, and the same unknowns, over and over again.
Start with a few questions. Let the audit grow.
If it doesn’t help, stop.
The cost is small. The insight compounds.