This curriculum provides a comprehensive exploration of prompt hacking, covering both offensive and defensive perspectives on Large Language Models (LLMs), Vision-Language Models (VLMs), Multimodal Large Language Models (MLLMs), and LLM-integrated robotics.
We delve into the theoretical foundations behind prompt hacking, breaking down how and why LLMs are vulnerable to adversarial inputs. Through mathematical modeling, we formalize key attack strategies, providing insights into prompt injection, prompt leaking, data poisoning, and backdoor attacks.
For practical application, we include code implementations that demonstrate attack techniques and defensive strategies in real-world scenarios. We analyze case studies from recent research papers, offering a data-driven perspective on emerging threats and their mitigation.
Finally, this curriculum presents robust defense mechanisms, equipping AI developers, researchers, and security professionals with the tools needed to fortify LLMs against adversarial manipulation. From reinforcement learning-based safeguards to multi-modal security frameworks, we cover state-of-the-art techniques for ensuring AI safety, reliability, and ethical deployment.
The object of this course is simple “Together Foster Making Safe and Ethical AI”
Prompt hacking refers to techniques used to manipulate or exploit language models (LLMs), vision-language models (LVLMs), multimodal large language models (MLLMs), and LLM-integrated robotics by influencing their text, vision, speech, or action-based processing through crafted inputs. These techniques can:
Unlike traditional adversarial AI attacks that rely on model perturbations, prompt hacking works at the input level. It does not modify model weights directly but exploits the way models process text, images, and instructions.