Last Tuesday, Microsoft issued a patch for a serious vulnerability in its M365 Copilot AI platform, which it classified as critically severe. This vulnerability, uncovered by researchers prior to the patch, enabled potential attackers to access two-factor authentication (2FA) codes and other sensitive information from emails that the Copilot could access.

The core issue lies in the AI's inability to effectively distinguish between genuine instructions from users and harmful requests hidden within third-party content. This flaw presents a significant challenge for Microsoft and similar large language model (LLM) providers, as they struggle to create safeguards against these types of malicious exploits. With no effective method to secure the critical boundary between user input and external content, companies are forced to implement complex and often makeshift protections to mitigate the risks associated with this vulnerability.

Among the various safeguards, Copilot and many other LLMs have restrictions that prevent them from executing web forms or sending emails, actions that could be used to extract user data. However, hackers have found ways to bypass these guardrails by using markup language, which allows for the addition of formatting elements to text without needing traditional HTML tags. Another tactic involves embedding sensitive information within HTML tags like <img> or <form>. As a result, when a web request containing this data is made, it is directed to the attacker’s server, where the information can be logged and captured. This ongoing issue highlights the challenges of securing AI systems against exploitation, emphasizing the need for robust protective measures in AI technology.