
Microsoft has launched ASSERT, an open-source framework designed to simplify AI behavior testing. It lets developers create comprehensive, application-specific evaluations from natural language descriptions, so they can check that AI systems act as intended for particular products and services. The tool translates high-level goals into structured tests, generates scenarios, scores results, and logs execution paths.

An intense internal power struggle within the Trump administration has stalled US federal AI regulation, leaving a policy vacuum after Anthropic's Mythos model revealed critical cybersecurity risks. Factions within the Commerce Department, intelligence agencies, and pro-industry groups are locked in a "knife fight" over who gets to evaluate and oversee advanced AI systems. This paralysis follows the abrupt cancellation of a landmark executive order and the unexplained withdrawal of AI testing announcements.

As autonomous AI systems become prevalent, intent-based chaos testing emerges as a critical method to prevent catastrophic failures caused by AI agents acting confidently but incorrectly. This approach addresses the limitations of traditional testing, which fails to account for AI's probabilistic nature and complex interactions. By measuring deviation from an agent's intended behavioral boundaries, this testing methodology helps ensure AI systems operate safely in unpredictable production environments.

Google is testing its new Gemini 3.1 Pro AI model against Gemini 3 Pro, focusing on how each performs with creative prompts. The evaluation aims to show how the enhancements in Gemini 3.1 Pro affect the quality of its creative output, and the comparison may indicate a strategic design choice that prioritizes intelligence over raw speed. The results are expected to inform the evolution of Google's advanced AI capabilities in complex generative tasks.