williammayes.com

§5 — Writing

Notes & Essays

On AI evaluation, causal inference, and doing rigorous data science in settings where it actually matters.

What good LLM evaluation actually looks like

Lessons from building evaluation pipelines in a regulated healthcare setting — what vibes-based testing misses and why it matters.

Causal inference for people who've done psychology experiments

A bridge for researchers moving into data science: what transfers, what doesn't, and where the real differences lie.

The asymmetric cost of false negatives in safeguarding ML

Why standard accuracy metrics are the wrong frame for high-stakes classification, and how to frame the problem instead.