Applied AI lab notes

BeCapable Research

Benchmarks and field systems for AI that has to make decisions in messy, practical environments.


Benchmarks

Field data

Decision logs

Replay proof

eval loop, agent traces, public demos

Current Tracks

DukaanBench

A 30-day kirana store benchmark where AI agents manage stock, cash, trust, khata, marketing, and customer service.

Evaluation Interfaces

Tools for making model outputs inspectable: replay screens, evidence logs, scoreboards, and provider response audits.
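As a rough illustration of what an inspectable decision log with replay could look like, here is a minimal sketch. It is not BeCapable's actual tooling: the `Decision` record, the `apply` rules, the fixed prices, and the `fingerprint` helper are all hypothetical names and assumptions made up for this example.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Decision:
    """One logged agent decision (hypothetical schema)."""
    day: int      # simulated day, e.g. 1..30
    action: str   # assumed action names: "restock" or "sell"
    amount: int   # units bought (restock) or requested (sell)


def apply(state: dict, d: Decision) -> dict:
    """Return the next store state under toy rules (assumed prices: buy 8, sell 10)."""
    stock, cash = state["stock"], state["cash"]
    if d.action == "restock":
        return {"stock": stock + d.amount, "cash": cash - 8 * d.amount}
    if d.action == "sell":
        sold = min(d.amount, stock)  # cannot sell more than is on the shelf
        return {"stock": stock - sold, "cash": cash + 10 * sold}
    raise ValueError(f"unknown action: {d.action}")


def replay(initial: dict, log: list[Decision]) -> dict:
    """Re-run every logged decision from the initial state."""
    state = dict(initial)
    for d in log:
        state = apply(state, d)
    return state


def fingerprint(log: list[Decision]) -> str:
    """Stable SHA-256 hash of the log, so a published run can be checked for edits."""
    payload = json.dumps([asdict(d) for d in log], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


log = [Decision(1, "restock", 10), Decision(2, "sell", 4), Decision(3, "sell", 10)]
final_state = replay({"stock": 0, "cash": 100}, log)
```

Because `apply` is deterministic, anyone holding the log and the initial state can recompute `final_state` and compare it with a reported score, and the fingerprint gives a cheap way to confirm that two parties are looking at the same log.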

Field Systems

Applied experiments for education, commerce, operations, and public interest workflows where AI has to survive real constraints.