What are the core concepts and different techniques used in prompt engineering?
Core Concepts of Prompt Engineering
At its heart, a prompt is the textual input, sometimes accompanied by other modalities like images, that a user provides to a generative AI model to guide its output. In the context of LLMs, a prompt can range from simple questions to detailed descriptions or complex problem statements. Modern definitions consider the entire string passed to the LLM as the prompt, in contrast to earlier definitions that might separate the "task description" from the input data.
Prompt engineering is the iterative process of designing high-quality prompts that guide LLMs to produce accurate, relevant outputs. It involves tinkering to find the best prompt, optimizing prompt length, and evaluating the prompt's writing style and structure in relation to the task. Effective prompt engineering requires not only crafting the prompt itself but also setting the LLM's configuration appropriately for the task. Key configurations include output length, which can be controlled by setting a maximum token limit, and sampling controls such as temperature, top-K, and top-P, which influence the randomness and diversity of the output. Temperature in particular governs how creative or deterministic the output is: lower values favor the most likely tokens, while higher values allow more varied, creative responses.
The field of prompt engineering is still rapidly emerging, and while anyone can write a prompt, crafting the most effective ones can be complicated. It requires understanding the model's capabilities and limitations, domain knowledge, and a methodical, iterative, and exploratory approach, similar to traditional software engineering practices like version control and regression testing. The iterative process involves crafting, testing, analyzing, documenting results, and refining prompts based on the model's performance. Prompt engineering is needed because LLMs, despite being tuned to follow instructions, aren't perfect, and clearer prompts lead to better results; inadequate prompts can produce ambiguous or inaccurate responses.
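As a minimal, illustrative sketch (not tied to any specific provider's API), these configuration values are typically passed alongside the prompt when calling a model. The parameter names and values below are assumptions for illustration only:

```python
# Hypothetical request settings for an LLM call; names are illustrative,
# not a specific vendor's API. Swap in your own client library.
generation_config = {
    "max_output_tokens": 256,  # caps response length (and cost/latency)
    "temperature": 0.2,        # lower = more deterministic, higher = more creative
    "top_k": 40,               # sample only from the 40 most likely next tokens
    "top_p": 0.95,             # sample from the smallest token set whose
                               # cumulative probability reaches 0.95
}

prompt = "Summarize the following support ticket in one sentence:\n<ticket text>"
# response = call_llm(prompt, **generation_config)  # placeholder client call
```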
Different Prompting Techniques
The sources detail numerous techniques for structuring prompts to achieve better results. These techniques can be broadly categorized:
Basic/Fundamental Techniques:
- General Prompting / Zero-Shot Prompting: The simplest type of prompt, providing only a description of the task and some starting text, with no examples at all.
- One-Shot & Few-Shot Prompting: These techniques provide examples (exemplars) to the model. A one-shot prompt provides a single example, while a few-shot prompt provides multiple examples. Providing examples is highlighted as a highly effective and important best practice, acting as a teaching tool that helps the model understand the desired output structure or pattern. For few-shot prompting, a general rule of thumb is to use at least three to five examples, depending on task complexity, example quality, model capabilities, and input length limitations. Examples should be relevant, diverse, high-quality, and well-written, and can include edge cases (a small template is sketched after this list).
- System, Contextual, and Role Prompting: These techniques guide LLMs by focusing on different aspects:
  - System Prompting: Sets the overall context and purpose, defining the 'big picture' of the task.
  - Contextual Prompting: Provides specific details or background information relevant to the current conversation or task, helping the model tailor its response; it is highly specific to the current, dynamic task.
  - Role Prompting: Assigns the model a role or persona, setting the tone, style, and focused expertise desired in the output.
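To make these basic techniques concrete, here is a minimal sketch combining a system/role instruction with a few-shot pattern for sentiment classification. The example reviews and the commented-out `call_llm` helper are hypothetical placeholders, not part of any specific API:

```python
def build_few_shot_prompt(review: str) -> str:
    """Combine a role/system instruction with few-shot exemplars (illustrative only)."""
    system = ("You are a movie review classifier. "
              "Respond with exactly one word: POSITIVE, NEUTRAL, or NEGATIVE.")
    examples = [
        ("A masterpiece from start to finish.", "POSITIVE"),
        ("It was fine, nothing memorable.", "NEUTRAL"),
        ("Two hours of my life I will never get back.", "NEGATIVE"),
    ]
    shots = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
    return f"{system}\n\n{shots}\n\nReview: {review}\nSentiment:"

prompt = build_few_shot_prompt("The plot dragged, but the acting was superb.")
# answer = call_llm(prompt)  # placeholder for your model client
print(prompt)
```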
Reasoning and Thought Generation Techniques:
- Chain of Thought (CoT): This technique prompts the LLM to generate a series of intermediate reasoning steps before giving the final answer, mimicking human problem-solving. This can elicit reasoning capabilities in LLMs. It can be used in a zero-shot setting (Zero-Shot CoT). For tasks where there is likely one single correct answer, setting the model temperature to 0 is recommended for CoT.
- Step-Back Prompting: A modification of CoT, it prompts the LLM to first consider a general question related to the task and then uses the answer to that general question in a subsequent prompt for the specific task. This helps the model activate relevant background knowledge.
- Self-Consistency: This technique samples multiple diverse reasoning paths (typically at a higher temperature) and uses majority voting to select the most consistent answer. It is used together with CoT and can improve the accuracy and coherence of responses (see the sketch after this list).
- Tree of Thoughts (ToT): Drawing inspiration from human cognitive processes, ToT involves a multi-faceted exploration of problem-solving pathways, creating a tree-like search space where the model generates and evaluates multiple steps or "thoughts". It's effective for tasks requiring search and planning.
- Other related techniques include Recursion-of-Thought, Plan-and-Solve Prompting, Skeleton-of-Thought, Metacognitive Prompting, Mixture of Reasoning Experts (MoRE), and Cumulative Reasoning.
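Below is a minimal sketch of self-consistency, assuming a hypothetical `call_llm(prompt, temperature)` client and a deliberately naive answer extractor; real implementations need more robust answer parsing:

```python
from collections import Counter

def call_llm(prompt: str, temperature: float) -> str:
    """Placeholder for a real model client; hypothetical, raises until wired up."""
    raise NotImplementedError

def extract_final_answer(response: str) -> str:
    """Naive extractor: assume the answer follows the last 'Answer:' marker."""
    return response.rsplit("Answer:", 1)[-1].strip()

def self_consistency(question: str, n_samples: int = 5) -> str:
    """Sample several chain-of-thought paths, then majority-vote on the answers."""
    cot_prompt = f"{question}\nLet's think step by step, then finish with 'Answer: <value>'."
    answers = []
    for _ in range(n_samples):
        # Temperature > 0 so each sample follows a different reasoning path.
        response = call_llm(cot_prompt, temperature=0.7)
        answers.append(extract_final_answer(response))
    most_common, _count = Counter(answers).most_common(1)[0]
    return most_common
```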
Decomposition Techniques:
- These techniques focus on breaking down complex problems into simpler sub-questions, a strategy effective for both humans and GenAI.
- Least-to-Most Prompting: First prompts the LLM to break a problem into sub-problems without solving them, then has it solve the sub-problems sequentially, appending each answer to the prompt before tackling the next (a rough sketch follows this list).
- Decomposed Prompting (DECOMP): Uses few-shot examples to show the model how to use specific functions (like string splitting or searching) and then prompts the model to break down its problem into sub-problems for these functions.
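A rough sketch of least-to-most prompting under the same assumptions (the `call_llm` helper is a hypothetical placeholder, and the decomposition prompt wording is illustrative):

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real model client; hypothetical."""
    raise NotImplementedError

def least_to_most(problem: str) -> str:
    """Stage 1: ask for sub-problems only. Stage 2: solve them in order,
    appending each answer to the context before the next sub-problem."""
    decomposition = call_llm(
        "Break the following problem into a numbered list of simpler sub-problems. "
        f"Do not solve them yet.\n\nProblem: {problem}"
    )
    sub_problems = [line.strip() for line in decomposition.splitlines() if line.strip()]

    context = f"Problem: {problem}\n"
    answer = ""
    for sub in sub_problems:
        answer = call_llm(f"{context}\nNext sub-problem: {sub}\nAnswer:")
        context += f"\nSub-problem: {sub}\nAnswer: {answer}"  # feed earlier answers forward
    return answer  # the answer to the final sub-problem is the overall answer
```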
Advanced and Integrated Techniques:
- ReAct (Reason & Act): A paradigm that enables LLMs to solve complex tasks by combining natural language reasoning with the use of external tools (like search or a code interpreter), allowing the model to perform actions. It mimics how humans reason and act in the real world.
- Automatic Prompt Engineering (APE): Automates the process of prompt creation by using an LLM to generate, evaluate, and refine prompts itself. This method can enhance model performance and alleviates the need for human input in prompt writing. It involves iterative steps of prompt generation, scoring, and refinement.
- Retrieval Augmented Generation (RAG): This is a system where prompts formulate queries to fetch pertinent information from external sources (like search engines or knowledge graphs), and this retrieved content is integrated into the LLM's workflow to generate more informed responses. RAG-aware prompting techniques exist to leverage these capabilities.
- Techniques for Agents: These involve integrating LLMs into frameworks that use external tools. Specific techniques include Reasoning without Observation (ReWOO), ReAct, and Dialog-Enabled Resolving Agents (DERA).
- Expert Prompting: This technique involves prompting the LLM to embody the persona of experts in relevant fields to simulate expert-level responses. A multi-expert strategy can be used to consider and integrate insights from various expert perspectives.
- Prompt Chains: Consist of two or more prompt templates used in succession, where the output of one template is used to parameterize the next (see the short chain sketched after this list).
- Affordances: Functions defined directly in the prompt that the model is explicitly instructed to use when responding.
- Rails: A strategic approach using structured rules or templates (Canonical Forms) to guide LLM outputs within predefined boundaries, ensuring relevance, safety, and factual integrity.
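As a stripped-down illustration of ReAct, the loop below interleaves model "thoughts", tool calls, and observations. The `call_llm` helper, the `search_tool` stand-in, and the line-format conventions are all assumptions for this sketch; real agent frameworks add structured parsing, error handling, and richer tool sets:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real model client; hypothetical."""
    raise NotImplementedError

def search_tool(query: str) -> str:
    """Toy stand-in for an external tool such as a search engine."""
    return f"(no results found for {query!r})"

REACT_INSTRUCTIONS = (
    "Answer the question by interleaving lines of the form:\n"
    "Thought: <your reasoning>\n"
    "Action: search[<query>]  (when you need outside information)\n"
    "Final Answer: <answer>   (when you are done)\n"
)

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"{REACT_INSTRUCTIONS}\nQuestion: {question}\n"
    for _ in range(max_steps):
        step = call_llm(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action: search[" in step:
            query = step.split("Action: search[", 1)[1].split("]", 1)[0]
            observation = search_tool(query)               # act on the outside world
            transcript += f"Observation: {observation}\n"  # feed the result back for reasoning
    return "No answer within the step budget."
```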
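And a two-step prompt chain, where the first template's output parameterizes the second; the templates and the `call_llm` helper are illustrative placeholders:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real model client; hypothetical."""
    raise NotImplementedError

EXTRACT_TEMPLATE = "List the key facts in the following article as short bullet points:\n\n{article}"
SUMMARY_TEMPLATE = "Using only these facts, write a one-paragraph summary for a general audience:\n\n{facts}"

def summarize_via_chain(article: str) -> str:
    facts = call_llm(EXTRACT_TEMPLATE.format(article=article))  # prompt 1: extract facts
    return call_llm(SUMMARY_TEMPLATE.format(facts=facts))       # prompt 2: reuse prompt 1's output
```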
Application-Specific Prompting:
- Code Prompting: Specific prompt types are used for tasks like writing code, explaining code, translating code between languages, and debugging or reviewing code.
- Multimodal Prompting: Refers to techniques that use multiple input formats, such as combinations of text, images, audio, or code, to guide the LLM, rather than just text.
- Multilingual Prompting: Techniques designed to improve model performance when prompting in languages other than English, especially low-resource languages. Examples include Translate First Prompting (illustrated after this list), Cross-Lingual Chain-of-Thought (XLT, CLSP), and Multilingual In-Context Learning techniques (X-InSTA, PARC).
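As one small illustration, Translate First Prompting can be sketched as a two-call chain: translate the input into English, then run the main task prompt on the translation. The `call_llm` helper is a hypothetical placeholder, and real pipelines often use a dedicated machine-translation system for the first step:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real model client; hypothetical."""
    raise NotImplementedError

def translate_first(question_in_source_language: str) -> str:
    """Translate a non-English input into English before running the main task prompt."""
    english_question = call_llm(
        "Translate the following text into English, preserving its meaning exactly:\n\n"
        f"{question_in_source_language}"
    )
    return call_llm(f"Answer the following question:\n\n{english_question}")
```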
Additionally, related concepts like Answer Engineering are part of the overall process: the iterative development or selection of algorithms that extract precise answers from potentially variable LLM output formats. Best practices such as designing with simplicity, being specific about the desired output, preferring instructions over constraints, controlling output length, using variables in prompts, experimenting with input and output formats, adapting to model updates, documenting attempts, and collaborating with others all support the effective application of these techniques. Working with domain experts is also crucial to ensure the prompt elicits behavior consistent with the desired outcomes. The process is complex and sensitive to small details, reinforcing the need for experimentation and iterative refinement.