A 30-day kirana store benchmark where AI agents manage stock, cash, trust, khata, marketing, and customer service.
Applied AI lab notes
BeCapable Research
Benchmarks and field systems for AI that has to make decisions in messy, practical environments.
Benchmarks
Field data
Decision logs
Replay proof
eval loop
agent traces
public demos
Current Tracks
Tools for making model outputs inspectable: replay screens, evidence logs, scoreboards, and provider response audits.
Applied experiments for education, commerce, operations, and public-interest workflows where AI has to survive real constraints.