Structured Output
In LangChain, structured output refers to having a language model return its response in a well-defined data format (for example, JSON) rather than as free-form text. This makes the model's output easier to parse and work with programmatically.
Example:
[prompt]: Can you create a plan for a trip to Paris including places to visit, food to try, and duration of stay in days?
[LLM Unstructured Output]:
Sure! Here's a sample plan for one day in Paris:
Morning: Visit the Louvre Museum and explore its vast art collection.
Afternoon: Visit the Eiffel Tower and enjoy the view from the top.
Evening: Try some delicious French cuisine at a local bistro.
[JSON Enforced Structured Output]:
```json
[
  {
    "time_of_day": "Morning",
    "activity": "Visit the Louvre Museum and explore its vast art collection."
  },
  {
    "time_of_day": "Afternoon",
    "activity": "Visit the Eiffel Tower and enjoy the view from the top."
  },
  {
    "time_of_day": "Evening",
    "activity": "Try some delicious French cuisine at a local bistro."
  }
]
```
Need for Structured Output
- Data Extraction: Structured output makes it easier to extract specific pieces of information from the model's response (see the sketch after this list).
- API Building: When building APIs that interact with LLMs, structured output allows for consistent and predictable responses.
- Agents: When building agents that perform actions based on LLM outputs, structured data supports the decision-making process.
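For instance, here is a minimal sketch of the data-extraction case: once the output is valid JSON, it can be loaded and queried directly (the `llm_output` string below is a hand-written stand-in for the model response shown above):
```python
import json

# Hypothetical stand-in for the JSON-enforced model response shown earlier
llm_output = '[{"time_of_day": "Morning", "activity": "Visit the Louvre Museum"}]'

plan = json.loads(llm_output)  # fails loudly if the model strays from valid JSON
for item in plan:
    print(item["time_of_day"], "->", item["activity"])
```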
Ways to achieve Structured Output
There are two types of models:
- Models that can directly output structured data
- Models that cannot directly output structured data
Models that can directly output structured data
This can be done with the `with_structured_output` method, but first we need to define the output schema using Pydantic models.
Basics of Pydantic models:
```python
from pydantic import BaseModel, Field
from typing import Optional, Literal
from datetime import date

# Basic Pydantic model
class TripPlanItem(BaseModel):
    time_of_day: str
    activity: str

# Optional fields: if duration_in_hours is not provided, it defaults to None
class TripPlanItemOptional(BaseModel):
    time_of_day: str
    activity: str
    duration_in_hours: Optional[int] = None

# Available data types
class TripPlanItemDataTypes(BaseModel):
    time_of_day: str
    activity: str
    duration_in_hours: Optional[int] = None
    is_outdoor: Optional[bool] = None
    cost_estimate: Optional[float] = None

# Date field
class TripPlanItemWithDate(BaseModel):
    time_of_day: str
    activity: str
    visit_date: date

# List of strings
class TripPlanItemWithTags(BaseModel):
    time_of_day: str
    activity: str
    tags: list[str]

# Nested model
class Location(BaseModel):
    name: str
    address: str

class TripPlanItemNested(BaseModel):
    time_of_day: str
    activity: str
    location: Location

# Nested list model
class TripPlanItemNestedList(BaseModel):
    time_of_day: str
    activity: str
    locations: list[Location]

# Adding choices to a field
class TripPlanItemWithChoices(BaseModel):
    time_of_day: str
    activity: str
    transportation_mode: Literal['car', 'bike', 'walk', 'public_transport']

# Using Field for defaults, constraints, and descriptions
class TripPlanItemWithField(BaseModel):
    time_of_day: str = Field(default="Morning", description="Time of day for the activity")
    activity: str = Field(default="Visit the museum", min_length=5, max_length=100, description="Description of the activity")
    attendees: int = Field(default=1, gt=0, description="Number of attendees; must be greater than 0")
```
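To see this validation in action, here is a quick sketch using the `TripPlanItemWithField` model above (assuming Pydantic v2, where `model_dump()` replaces the older `dict()`):
```python
from pydantic import ValidationError

# Missing fields fall back to their declared defaults
item = TripPlanItemWithField()
print(item.model_dump())
# {'time_of_day': 'Morning', 'activity': 'Visit the museum', 'attendees': 1}

# Constraint violations raise ValidationError
try:
    TripPlanItemWithField(attendees=0)  # violates gt=0
except ValidationError as e:
    print(e)
```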
Using with_structured_output
```python
from pydantic import BaseModel, Field
from langchain_ollama.chat_models import ChatOllama

class Trips(BaseModel):
    time_of_day: str = Field(description="Time of day for the activity")
    activity: str = Field(min_length=5, max_length=100, description="Description of the activity")

class TripPlanItem(BaseModel):
    trips: list[Trips] = Field(description="List of trip plan items")

chat_model = ChatOllama(model="llama3")
structured_model = chat_model.with_structured_output(TripPlanItem)
response = structured_model.invoke("I want a 1 day plan to spend in Paris")
print(response)
```
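Because the schema is bound to the model, `response` comes back as a `TripPlanItem` instance rather than raw text, so its fields can be accessed directly:
```python
# response is a TripPlanItem instance, not a string
for trip in response.trips:
    print(trip.time_of_day, "->", trip.activity)
```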
Models that cannot directly output structured data
This can be achieved by using output parsers in LangChain. There are several types of output parsers available, but the most commonly used are:
- StrOutputParser: This parser simply returns the output as a string without any additional processing. It is useful when you want to capture the raw text output from the model.
- PydanticOutputParser: This parser uses Pydantic models to validate and parse the output from the model. It ensures that the output conforms to the defined schema and raises errors if the output does not match the expected format.
Note: Output parsers work with both kinds of models, those that can directly output structured data and those that cannot.
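As a quick illustration of the simplest case, `StrOutputParser` just extracts the text content from a model message (the `AIMessage` below is hand-built for demonstration rather than coming from a real model call):
```python
from langchain_core.messages import AIMessage
from langchain_core.output_parsers import StrOutputParser

msg = AIMessage(content="Kathmandu is the capital of Nepal.")
print(StrOutputParser().invoke(msg))  # -> Kathmandu is the capital of Nepal.
```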
StrOutputParser
Suppose we need to generate a summary; the workflow looks like this:
topic -> LLM -> Detailed Explanation -> LLM -> Summary
Before using StrOutputParser, each step has to be invoked manually:
```python
from langchain_ollama.chat_models import ChatOllama
from langchain_core.prompts import PromptTemplate

llm = ChatOllama(model="llama3")

template1 = PromptTemplate(
    template="""
You are a helpful assistant that answers questions about the world.
Question: {question}
Make sure to answer in as much detail as possible.
""",
    input_variables=["question"],
)
prompt1 = template1.invoke({"question": "Explain about the capital of Nepal?"})
response1 = llm.invoke(prompt1)

template2 = PromptTemplate(
    template="""
Summarize this within 5 sentences.
{text}
""",
    input_variables=["text"],
)
prompt2 = template2.invoke({"text": response1.content})
response2 = llm.invoke(prompt2)
print(response2.content)
```
After using StrOutputParser:
```python
from langchain_ollama.chat_models import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOllama(model="llama3")

template1 = PromptTemplate(
    template="""
You are a helpful assistant that answers questions about the world.
Question: {question}
Make sure to answer in as much detail as possible.
""",
    input_variables=["question"],
)
template2 = PromptTemplate(
    template="""
Summarize this within 5 sentences.
{text}
""",
    input_variables=["text"],
)

parser = StrOutputParser()
# Compose the whole workflow into a single chain
chain = template1 | llm | parser | template2 | llm | parser
res = chain.invoke({"question": "Explain about the capital of Nepal?"})
print(res)
```
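The `|` operator here is LangChain Expression Language (LCEL) composition: it pipes the prompt into the model and the model's message into the parser, so the manual invoke-and-extract bookkeeping from the previous version disappears.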
PydanticOutputParser
```python
from langchain_ollama.chat_models import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel

class Explanation(BaseModel):
    answer: str
    sources: list[str]
    is_popular: bool

parser = PydanticOutputParser(pydantic_object=Explanation)
chat_model = ChatOllama(model="llama3")

template = PromptTemplate(
    template="""Tell me about the country {question}\n{format_instructions}""",
    input_variables=["question"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

prompt = template.invoke({"question": "Nepal"})
response = chat_model.invoke(prompt)
final_result = parser.parse(response.content)
print(final_result)
```
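Here `get_format_instructions()` injects formatting guidance derived from the `Explanation` schema into the prompt, and `parser.parse()` validates the raw response text against that schema, raising an error if the output does not conform.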
Using a chain
```python
from langchain_ollama.chat_models import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel

class Explanation(BaseModel):
    answer: str
    sources: list[str]
    is_popular: bool

parser = PydanticOutputParser(pydantic_object=Explanation)
chat_model = ChatOllama(model="llama3")

template = PromptTemplate(
    template="""Tell me about the country {question}\n{format_instructions}""",
    input_variables=["question"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = template | chat_model | parser
print(chain.invoke({"question": "Nepal"}))
```
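As with the manual version, the chain's output is an `Explanation` instance, so `answer`, `sources`, and `is_popular` are available as attributes instead of having to be re-parsed from text.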