Snorkel AI
AI data development platform for enterprise model fine-tuning, evaluation, and curation.
Snorkel's feed consists of research and thought-leadership content; product releases don't appear here.
◆Recent moves
- 3d ago
Agents’ Last Exam: AI Benchmarking for Real Work
Reading-group recap of Agents' Last Exam, a benchmark of long-horizon agent tasks drawn from real work, built with Berkeley RDI. Research content underscoring Snorkel's evaluation focus, not a product change.
- 3d ago
Continual learning and evaluating how AI agents learn across sequences of tasks
Explainer on continual learning and evaluating agents across task sequences. Thought-leadership content extending the agent-evaluation theme, not a release.
- 8d ago
Benchtalks #3: We taught AI everything except how to learn
Benchtalks interview on continual learning with a Berkeley PhD researcher. Podcast-style content, not a product change.
- 10d ago
Agentic AI evaluation: Closing the gap with better benchmarks and data
Recap of CEO Alex Ratner's talk on the agent evaluation gap. Conference content reinforcing positioning, not a release.
- 15d ago
JudgmentBench: Comparing Rubric and Preference Evaluation for Quality Assessment
Reading-group recap of JudgmentBench, comparing rubric-based vs. preference-based evaluation for assessing legal AI output quality. Research content, not a product change.
- 17d ago
The Art and Science of Building AI Benchmarks That Shape the Field
Recap of a talk on building field-shaping AI benchmarks. Thought-leadership content, not a release.