A Field Guide to Agent Evaluations
Jun 18, 2026
Practical taxonomy of agent evals: unit tests, integration tests, online evals, and benchmarks.
While AI is often associated with digital applications, its role in hardware and physical environments is just as exciting! To explore AI beyond software, I built and trained a robotic arm that learns and replicates tasks using a machine learning model called an Action Chunking Transformer (ACT). I 3D printed and wired both a "leader" and a "follower" arm, which let me teleoperate the follower and record demonstrations such as placing blocks into different buckets. I then trained the ACT model on this dataset, enabling the arm to perform the task autonomously and even generalize to correcting its own mistakes.
LangChain
Member of LangChain's Applied AI team, working on production agents and AI-related features.
Cisco
Technical SME and hands-on developer for GenAI solutions on the MarTech Portfolio & Innovation Team, integrating AI marketing technology across the enterprise stack.
The DTH Media Corp.
Led and trained a 15-rep advertising sales team, designing the commission structure, training program, and new ad products while managing local and national client relationships.
Degree in Economics, Statistics & Information Systems
Practical taxonomy of agent evals: unit tests, integration tests, online evals, and benchmarks.
How the claw agent framework combines messaging, filesystems, and memory to create evolving digital assistants.
Context engineering, context rot, and how LLMs navigate content far exceeding their context window.
The new standard for packaging reusable workflow capabilities for filesystem-based agent harnesses.
Defining, building, and applying LLM evaluations to improve AI products.
Understanding RLVR and creating RL environments for large language models.
LLMs should be given the same tools as humans to interact with the digital world we share.