Outlines is a Python library for constrained language generation. It provides a unified interface to various language models and allows for structured generation using techniques like regex matching, type constraints, JSON schemas, and context-free grammars.
Outlines supports multiple backends, including:
  • Hugging Face Transformers
  • llama.cpp
  • vLLM
  • MLX
This integration allows you to use Outlines models with LangChain, providing both LLM and chat model interfaces.

Installation and Setup

To use Outlines with LangChain, you'll need to install the Outlines library:
pip install outlines
Depending on the backend you choose, you may need to install additional dependencies:
  • For Transformers: pip install transformers torch datasets
  • For llama.cpp: pip install llama-cpp-python
  • For vLLM: pip install vllm
  • For MLX: pip install mlx-lm

LLM

To use Outlines as an LLM in LangChain, you can use the Outlines class:
from langchain_community.llms import Outlines

Chat Models

To use Outlines as a chat model in LangChain, you can use the ChatOutlines class:
from langchain_community.chat_models import ChatOutlines

Model Configuration

Both the Outlines and ChatOutlines classes share similar configuration options:
model = Outlines(
    model="meta-llama/Llama-2-7b-chat-hf",  # Model identifier
    backend="transformers",  # Backend to use (transformers, llamacpp, vllm, or mlxlm)
    max_tokens=256,  # Maximum number of tokens to generate
    stop=["\n"],  # Optional list of stop strings
    streaming=True,  # Whether to stream the output
    # Additional parameters for structured generation:
    regex=None,  # Regex pattern the output must match
    type_constraints=None,  # Python type the output must parse as (e.g., int)
    json_schema=None,  # Pydantic model or JSON schema the output must satisfy
    grammar=None,  # Context-free grammar the output must follow
    # Additional model parameters:
    model_kwargs={"temperature": 0.7}
)
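
The same keyword arguments apply to the chat interface. A minimal sketch, assuming a local Transformers model:
from langchain_community.chat_models import ChatOutlines

chat = ChatOutlines(
    model="meta-llama/Llama-2-7b-chat-hf",  # Same identifier format as Outlines
    backend="transformers",
    max_tokens=256,
)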

Model Identifier

The model parameter can be:
  • A Hugging Face model name (e.g., "meta-llama/Llama-2-7b-chat-hf")
  • A local path to a model
  • For GGUF models, the format is "repo_id/file_name" (e.g., "TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf")
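
For example, the GGUF identifier above pairs with the llama.cpp backend. A minimal sketch, assuming llama-cpp-python is installed:
from langchain_community.llms import Outlines

# "repo_id/file_name": the Hugging Face repo followed by the GGUF file within it
llm = Outlines(
    model="TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf",
    backend="llamacpp",
)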

Backend Options

The backend parameter specifies which backend to use:
  • "transformers": For Hugging Face Transformers models (default)
  • "llamacpp": For GGUF models using llama.cpp
  • "transformers_vision": For vision-language models (e.g., LLaVA)
  • "vllm": For models using the vLLM library
  • "mlxlm": For models using the MLX framework

Structured Generation

Outlines provides several methods for structured generation:
  1. Regex Matching:
    model = Outlines(
        model="meta-llama/Llama-2-7b-chat-hf",
        regex=r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
    )
    
    This ensures the generated text matches the specified regex pattern (in this case, a valid IP address); a runnable sketch follows this list.
  2. Type Constraints:
    model = Outlines(
        model="meta-llama/Llama-2-7b-chat-hf",
        type_constraints=int
    )
    
    This restricts the output to the specified Python type; supported types are int, float, bool, datetime.date, datetime.time, and datetime.datetime.
  3. JSON Schema:
    from pydantic import BaseModel
    
    class Person(BaseModel):
        name: str
        age: int
    
    model = Outlines(
        model="meta-llama/Llama-2-7b-chat-hf",
        json_schema=Person
    )
    
    This ensures the generated output adheres to the specified JSON schema or Pydantic model.
  4. Context-Free Grammar:
    model = Outlines(
        model="meta-llama/Llama-2-7b-chat-hf",
        grammar="""
            ?start: expression
            ?expression: term (("+" | "-") term)*
            ?term: factor (("*" | "/") factor)*
            ?factor: NUMBER | "-" factor | "(" expression ")"
            %import common.NUMBER
        """
    )
    
    This generates text that adheres to the specified context-free grammar, written in Lark's EBNF syntax.
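
A constrained model is invoked like any other LLM. A sketch of the regex example from item 1, assuming a local Transformers model (the prompt and output shown are illustrative):
from langchain_community.llms import Outlines

ip_llm = Outlines(
    model="meta-llama/Llama-2-7b-chat-hf",
    regex=r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
)
# Decoding can only emit tokens that keep the output inside the pattern,
# so the result is always a syntactically valid IP address, e.g. "127.0.0.1"
print(ip_llm.invoke("What is the IP address of the loopback interface? "))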

Usage Examples

LLM Example

from langchain_community.llms import Outlines

llm = Outlines(model="meta-llama/Llama-2-7b-chat-hf", max_tokens=100)
result = llm.invoke("Tell me a short story about a robot.")
print(result)

Chat Model Example

from langchain_community.chat_models import ChatOutlines
from langchain_core.messages import HumanMessage, SystemMessage

chat = ChatOutlines(model="meta-llama/Llama-2-7b-chat-hf", max_tokens=100)
messages = [
    SystemMessage(content="You are a helpful AI assistant."),
    HumanMessage(content="What's the capital of France?")
]
result = chat.invoke(messages)
print(result.content)
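
Structured generation works with the chat interface as well, since ChatOutlines accepts the same constraint parameters as Outlines. A sketch constraining a reply with json_schema (the Answer model is hypothetical):
from langchain_community.chat_models import ChatOutlines
from pydantic import BaseModel

class Answer(BaseModel):
    city: str

chat = ChatOutlines(model="meta-llama/Llama-2-7b-chat-hf", json_schema=Answer)
result = chat.invoke("What's the capital of France?")
# result.content is a JSON string conforming to the Answer schema
print(result.content)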

Streaming Example

from langchain_community.chat_models import ChatOutlines
from langchain_core.messages import HumanMessage

chat = ChatOutlines(model="meta-llama/Llama-2-7b-chat-hf", streaming=True)
for chunk in chat.stream("Tell me a joke about programming."):
    print(chunk.content, end="", flush=True)
print()

Structured Output Example

from langchain_community.llms import Outlines
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: int
    summary: str

llm = Outlines(
    model="meta-llama/Llama-2-7b-chat-hf",
    json_schema=MovieReview
)
result = llm.invoke("Write a short review for the movie 'Inception'.")
print(result)
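
Because the output is guaranteed to conform to the schema, it can be parsed straight back into the Pydantic model. A sketch using the Pydantic v2 API (on v1, MovieReview.parse_raw(result) serves the same purpose):
review = MovieReview.model_validate_json(result)  # raises if the JSON were invalid
print(review.title, review.rating)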

Additional Features

Tokenizer Access

You can access the underlying tokenizer for the model:
tokenizer = llm.tokenizer
encoded = tokenizer.encode("Hello, world!")
decoded = tokenizer.decode(encoded)
