Overview

This curriculum provides a comprehensive exploration of prompt hacking, covering both offensive and defensive perspectives on Large Language Models (LLMs), Vision-Language Models (VLMs), Multimodal Large Language Models (MLLMs), and LLM-integrated robotics.

We delve into the theoretical foundations behind prompt hacking, breaking down how and why LLMs are vulnerable to adversarial inputs. Through mathematical modeling, we formalize key attack strategies, providing insights into prompt injection, prompt leaking, data poisoning, and backdoor attacks.

For practical application, we include code implementations that demonstrate attack techniques and defensive strategies in real-world scenarios. We analyze case studies from recent research papers, offering a data-driven perspective on emerging threats and their mitigation.

Finally, this curriculum presents robust defense mechanisms, equipping AI developers, researchers, and security professionals with the tools needed to fortify LLMs against adversarial manipulation. From reinforcement learning-based safeguards to multi-modal security frameworks, we cover state-of-the-art techniques for ensuring AI safety, reliability, and ethical deployment.

The object of this course is simple “Together Foster Making Safe and Ethical AI”

1. Introduction to Prompt Hacking

1.1 What is Prompt Hacking?

Definition

Prompt hacking refers to techniques used to manipulate or exploit language models (LLMs), vision-language models (LVLMs), multimodal large language models (MLLMs), and LLM-integrated robotics by influencing their text, vision, speech, or action-based processing through crafted inputs. These techniques can:

Bypass safety restrictions to generate harmful or unauthorized responses (jailbreaking)
Override system instructions through adversarial input sequences (prompt injection)
Extract confidential system data by tricking the model into revealing hidden instructions (prompt leaking)
Modify AI behavior during training to introduce hidden vulnerabilities (data poisoning & backdoor attacks)

Scope of Prompt Hacking

Unlike traditional adversarial AI attacks that rely on model perturbations, prompt hacking works at the input level. It does not modify model weights directly but exploits the way models process text, images, and instructions.

Overview

1. Introduction to Prompt Hacking

1.1 What is Prompt Hacking?

Definition

Scope of Prompt Hacking

1.2 Why is Prompt Hacking a Security Threat?