Structured Output
In LangChain, structured output refers to having a language model return its response in a well-defined data format (for example, JSON) rather than as free-form text. This makes the model's output easier to parse and work with programmatically.
Example:
[prompt]: Can you create a plan for a trip to Paris including places to visit, food to try, and duration of stay in days?
[LLM Unstructured Output]:
Sure! Here's a sample plan for one day in Paris:
Morning: Visit the Louvre Museum and explore its vast art collection.
Afternoon: Visit the Eiffel Tower and enjoy the view from the top.
Evening: Try some delicious French cuisine at a local bistro.
[JSON Enforced Structured Output]:
```json
[
  {
    "time_of_day": "Morning",
    "activity": "Visit the Louvre Museum and explore its vast art collection."
  },
  {
    "time_of_day": "Afternoon",
    "activity": "Visit the Eiffel Tower and enjoy the view from the top."
  },
  {
    "time_of_day": "Evening",
    "activity": "Try some delicious French cuisine at a local bistro."
  }
]
```
Need for Structured Output
- Data Extraction: Structured output makes it easier to extract specific pieces of information from the model's response (see the sketch after this list).
- API Building: When building APIs that interact with LLMs, structured output allows for consistent and predictable responses.
- Agents: When building agents that perform actions based on LLM outputs, structured data supports the decision-making process.
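For instance, here is a minimal sketch of the data-extraction case: once the output is valid JSON, it can be loaded and queried directly (the `llm_output` string below is a hand-written stand-in for the model response shown above):
```python
import json

# Hypothetical stand-in for the JSON-enforced model response shown earlier
llm_output = '[{"time_of_day": "Morning", "activity": "Visit the Louvre Museum"}]'

plan = json.loads(llm_output)  # fails loudly if the model strays from valid JSON
for item in plan:
    print(item["time_of_day"], "->", item["activity"])
```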
Ways to achieve Structured Output
There are two types of models:
- Models that can directly output structured data
- Models that cannot directly output structured data
Models that can directly output structured data
This can be done with the `with_structured_output` method, but first we need to define the output schema using Pydantic models.
Basics of Pydantic models:
```python
from pydantic import BaseModel, Field
from typing import Optional, Literal
from datetime import date

# Basic Pydantic model
class TripPlanItem(BaseModel):
    time_of_day: str
    activity: str

# Optional fields: if duration_in_hours is not provided, it defaults to None
class TripPlanItemOptional(BaseModel):
    time_of_day: str
    activity: str
    duration_in_hours: Optional[int] = None

# Available data types
class TripPlanItemDataTypes(BaseModel):
    time_of_day: str
    activity: str
    duration_in_hours: Optional[int] = None
    is_outdoor: Optional[bool] = None
    cost_estimate: Optional[float] = None

# Date field
class TripPlanItemWithDate(BaseModel):
    time_of_day: str
    activity: str
    visit_date: date

# List of strings
class TripPlanItemWithTags(BaseModel):
    time_of_day: str
    activity: str
    tags: list[str]

# Nested model
class Location(BaseModel):
    name: str
    address: str

class TripPlanItemNested(BaseModel):
    time_of_day: str
    activity: str
    location: Location

# Nested list model
class TripPlanItemNestedList(BaseModel):
    time_of_day: str
    activity: str
    locations: list[Location]

# Adding choices to a field
class TripPlanItemWithChoices(BaseModel):
    time_of_day: str
    activity: str
    transportation_mode: Literal['car', 'bike', 'walk', 'public_transport']

# Using Field for defaults, constraints, and descriptions
class TripPlanItemWithField(BaseModel):
    time_of_day: str = Field(default="Morning", description="Time of day for the activity")
    activity: str = Field(default="Visit the museum", min_length=5, max_length=100, description="Description of the activity")
    attendees: int = Field(default=1, gt=0, description="Number of attendees; must be greater than 0")
```
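To see this validation in action, here is a quick sketch using the `TripPlanItemWithField` model above (assuming Pydantic v2, where `model_dump()` replaces the older `dict()`):
```python
from pydantic import ValidationError

# Missing fields fall back to their declared defaults
item = TripPlanItemWithField()
print(item.model_dump())
# {'time_of_day': 'Morning', 'activity': 'Visit the museum', 'attendees': 1}

# Constraint violations raise ValidationError
try:
    TripPlanItemWithField(attendees=0)  # violates gt=0
except ValidationError as e:
    print(e)
```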
Using with_structured_output
```python
from pydantic import BaseModel, Field
from langchain_ollama.chat_models import ChatOllama

class Trips(BaseModel):
    time_of_day: str = Field(description="Time of day for the activity")
    activity: str = Field(min_length=5, max_length=100, description="Description of the activity")

class TripPlanItem(BaseModel):
    trips: list[Trips] = Field(description="List of trip plan items")

chat_model = ChatOllama(model="llama3")
structured_model = chat_model.with_structured_output(TripPlanItem)
response = structured_model.invoke("I want a 1 day plan to spend in Paris")
print(response)
```
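Because the schema is bound to the model, `response` comes back as a `TripPlanItem` instance rather than raw text, so its fields can be accessed directly:
```python
# response is a TripPlanItem instance, not a string
for trip in response.trips:
    print(trip.time_of_day, "->", trip.activity)
```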
Models that cannot directly output structured data
This can be achieved by using output parsers in LangChain. There are several types of output parsers available, but the most commonly used are:
- StrOutputParser: This parser simply returns the output as a string without any additional processing. It is useful when you want to capture the raw text output from the model.
- PydanticOutputParser: This parser uses Pydantic models to validate and parse the output from the model. It ensures that the output conforms to the defined schema and raises errors if the output does not match the expected format.
Note: Output parsers work with both kinds of models, those that can directly output structured data and those that cannot.
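As a quick illustration of the simplest case, `StrOutputParser` just extracts the text content from a model message (the `AIMessage` below is hand-built for demonstration rather than coming from a real model call):
```python
from langchain_core.messages import AIMessage
from langchain_core.output_parsers import StrOutputParser

msg = AIMessage(content="Kathmandu is the capital of Nepal.")
print(StrOutputParser().invoke(msg))  # -> Kathmandu is the capital of Nepal.
```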
StrOutputParser
Suppose we need to generate a summary; the workflow looks like this:
topic -> LLM -> Detailed Explanation -> LLM -> Summary
Before using StrOutputParser, each step has to be invoked manually:
```python
from langchain_ollama.chat_models import ChatOllama
from langchain_core.prompts import PromptTemplate

llm = ChatOllama(model="llama3")

template1 = PromptTemplate(
    template="""
You are a helpful assistant that answers questions about the world.
Question: {question}
Make sure to answer in as much detail as possible.
""",
    input_variables=["question"],
)
prompt1 = template1.invoke({"question": "Explain about the capital of Nepal?"})
response1 = llm.invoke(prompt1)

template2 = PromptTemplate(
    template="""
Summarize this within 5 sentences.
{text}
""",
    input_variables=["text"],
)
prompt2 = template2.invoke({"text": response1.content})
response2 = llm.invoke(prompt2)
print(response2.content)
```
After using StrOutputParser:
```python
from langchain_ollama.chat_models import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOllama(model="llama3")

template1 = PromptTemplate(
    template="""
You are a helpful assistant that answers questions about the world.
Question: {question}
Make sure to answer in as much detail as possible.
""",
    input_variables=["question"],
)
template2 = PromptTemplate(
    template="""
Summarize this within 5 sentences.
{text}
""",
    input_variables=["text"],
)

parser = StrOutputParser()
# Compose the whole workflow into a single chain
chain = template1 | llm | parser | template2 | llm | parser
res = chain.invoke({"question": "Explain about the capital of Nepal?"})
print(res)
```
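The `|` operator here is LangChain Expression Language (LCEL) composition: it pipes the prompt into the model and the model's message into the parser, so the manual invoke-and-extract bookkeeping from the previous version disappears.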
PydanticOutputParser
```python
from langchain_ollama.chat_models import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel

class Explanation(BaseModel):
    answer: str
    sources: list[str]
    is_popular: bool

parser = PydanticOutputParser(pydantic_object=Explanation)
chat_model = ChatOllama(model="llama3")

template = PromptTemplate(
    template="""Tell me about the country {question}\n{format_instructions}""",
    input_variables=["question"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

prompt = template.invoke({"question": "Nepal"})
response = chat_model.invoke(prompt)
final_result = parser.parse(response.content)
print(final_result)
```
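Here `get_format_instructions()` injects formatting guidance derived from the `Explanation` schema into the prompt, and `parser.parse()` validates the raw response text against that schema, raising an error if the output does not conform.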
Using a chain
```python
from langchain_ollama.chat_models import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel

class Explanation(BaseModel):
    answer: str
    sources: list[str]
    is_popular: bool

parser = PydanticOutputParser(pydantic_object=Explanation)
chat_model = ChatOllama(model="llama3")

template = PromptTemplate(
    template="""Tell me about the country {question}\n{format_instructions}""",
    input_variables=["question"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = template | chat_model | parser
print(chain.invoke({"question": "Nepal"}))
```
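As with the manual version, the chain's output is an `Explanation` instance, so `answer`, `sources`, and `is_popular` are available as attributes instead of having to be re-parsed from text.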