ดึงข้อมูลจาก AI ให้เป็นระเบียบด้วย Structured Output และ Pydantic

โดยปกติแล้ว AI จะส่งผลลัพธ์กลับมาเป็น free text — ข้อความธรรมดาที่มนุษย์อ่านเข้าใจ แต่โปรแกรมจัดการต่อได้ยาก

แต่ในงานจริง เราต้องการมากกว่านั้น เช่น บันทึกข้อมูลลง database, ส่งต่อให้ระบบอื่น, หรือ process ผลลัพธ์แบบอัตโนมัติ — สิ่งเหล่านี้ต้องการ Structured Data ที่มี format ชัดเจน ไม่ใช่ข้อความลอยๆ ลองนึกภาพ use case พวกนี้ดู:

📧

Email → Calendar

อ่าน email แล้วดึงนัดหมาย → บันทึกลง calendar อัตโนมัติ

🧾

Slip → รายจ่าย

อ่าน slip โอนเงิน → จำนวน, วันที่, ผู้รับ → บันทึกบัญชี

🛒

Review → Product

ดึงข้อมูลสินค้าจาก review → ชื่อ, ราคา, rating, ข้อดี/ข้อเสีย

📋

Bug Report → Ticket

ดึงข้อมูลจาก bug report → severity, module, steps to reproduce

📊

Meeting → Action Items

แปลง meeting notes → action items + owner + deadline

💼

Resume → คัดกรอง

วิเคราะห์ resume → skills, ประสบการณ์ → คัดกรองอัตโนมัติ

📰

ข่าว → Database

ดึงข้อมูลข่าว → หัวข้อ, วันที่, หมวดหมู่, sentiment

🔍

Log → Alert

อ่าน log file → error code, timestamp, affected service

🏥

ใบสั่งยา → แจ้งเตือน

อ่านใบสั่งยา → ชื่อยา, dosage, ความถี่ → ระบบแจ้งเตือน

บทความนี้จะอธิบายว่าเราบังคับให้ AI ส่งผลลัพธ์เป็น structured data ได้อย่างไร ผ่านสองเครื่องมือหลัก — Structured Output และ Pydantic — โดยใช้ตัวอย่างจริงจากการดึงข้อมูลประกาศหางาน

ปัญหาที่เกิดขึ้นจริงเมื่อ AI ตอบกลับแบบ free-form

สมมุติเราต้องการดึงข้อมูลจากประกาศหางานนี้ แล้วเก็บลง database:

ตัวอย่าง: ประกาศหางาน (raw text)
“””
บริษัท TechHive จำกัด (มหาชน) รับสมัคร Python Developer ด่วน!
ประสบการณ์ 2-4 ปี เงินเดือน 45,000 – 65,000 บาท
สถานที่ทำงาน: กรุงเทพฯ (ไฮบริด)
Skills ที่ต้องการ: Python, FastAPI, PostgreSQL, Docker
ติดต่อ: hr@techhive.co.th ภายใน 30 เมษายน 2025
“””

วิธีที่คนส่วนใหญ่เริ่มทำคือใส่คำสั่งตรงๆ ลงใน prompt เลย แบบนี้:

Naive Prompt (วิธีที่ดูเหมือนจะได้ผล…)
job_text = “””บริษัท TechHive จำกัด (มหาชน) รับสมัคร Python Developer…”””prompt = f”””
ดึงข้อมูลจากประกาศหางานต่อไปนี้ แล้วตอบกลับเป็น JSON เท่านั้น
ห้ามมีข้อความอื่น ห้ามมี markdownประกาศหางาน:
{job_text}
“””response = client.messages.create(
model=“claude-haiku-4-5”,
max_tokens=512,
messages=[{“role”: “user”, “content”: prompt}]
# ← ไม่มี tools, ไม่มี schema — แค่ขอให้ตอบเป็น JSON
)raw = response.content[0].text  # เอา text มาตรงๆ
data = json.loads(raw)          # หวังว่ามันจะ parse ได้…

ผลที่ได้บางครั้งก็ดูดีนะ — แต่ปัญหาคือมันไม่ คงที่ ลองรันหลายๆ ครั้ง หรือเปลี่ยน input นิดหน่อย ผลอาจออกมาแตกต่างกันได้:

LLM output (ที่อาจพัง)
# บางครั้ง LLM ตอบกลับพร้อม markdown fence 👇
“`json
{
“job_title”: “Python Developer”,
“salary”: “45,000 – 65,000”,   ← string หรือ number?
“skills”: “Python, FastAPI, PostgreSQL, Docker”,  ← ไม่ใช่ list!
“experience”: “2-4 ปี”         ← min/max หายไปเลย
}
“`

⚠️ ปัญหาหลัก: LLM ตอบถูกบ้าง ผิดบ้าง — ขึ้นอยู่กับ prompt และ context ในแต่ละครั้ง โค้ดเราเลยไม่สามารถเชื่อถือ structure ของ output ได้ 100%

Structured Output คืออะไร?

Structured Output คือเทคนิคที่บังคับให้ LLM ส่งผลลัพธ์ออกมาในรูปแบบที่กำหนดไว้ล่วงหน้า (เช่น JSON ที่มี fields ตามที่เราต้องการ) แทนที่จะเป็นข้อความอิสระ

🎯 เป้าหมาย

ได้ output ที่โปรแกรมอ่านได้โดยตรง ไม่ต้อง parse เองอีกต่อไป

⚙️ วิธีการ

บอก LLM ว่า “เราต้องการ schema แบบนี้” ผ่าน tool_use หรือ JSON prompt

🧩 Use Case

ดึงข้อมูลจากเอกสาร, จำแนกประเภท, สร้าง pipeline อัตโนมัติ

🔗 เชื่อมกับ Agents

ทุก tool call ใน AI Agent คือ structured output รูปแบบหนึ่ง

Pydantic คืออะไร และทำไมถึงต้องใช้?

Pydantic คือ Python library ที่ช่วย validate และ parse ข้อมูล โดยใช้ type hints เป็นตัวกำหนด schema ลองดูตัวอย่างที่ไม่ใช้ Pydantic vs ใช้ Pydantic:

❌ ไม่มี Pydantic (อันตราย)
# เราไม่รู้เลยว่า data ที่ได้มานั้น valid หรือเปล่า
data = json.loads(llm_response)# salary เป็น string? int? ไม่รู้ จนกว่าจะ crash
total = data[“salary_min”] + data[“salary_max”]  ← TypeError!# skills เป็น list หรือ string? ไม่รู้เลยfor skill in data[“skills”]:  ← อาจ loop ตัวอักษรแทน
save_skill(skill)

✅ มี Pydantic (ปลอดภัย)
from pydantic importBaseModel, Fieldfrom typing import List, OptionalclassJobPosting(BaseModel):
job_title:   str
company:     str
salary_min:  int = Field(…, ge=0, description=“เงินเดือนขั้นต่ำ (บาท)”)
salary_max:  int = Field(…, ge=0, description=“เงินเดือนสูงสุด (บาท)”)
location:    str
work_mode:   str# “onsite” | “hybrid” | “remote”
skills:      List[str]  # ← ต้องเป็น list เสมอ
exp_min:     int
exp_max:     int
deadline:    Optional[str] = None# Pydantic จะ validate ให้เลย ถ้า type ผิด → raise ValidationError ทันที
job = JobPosting.model_validate_json(llm_response)# ใช้ได้เลย ไม่ต้องกลัว type error
budget_range = job.salary_max – job.salary_min  ← int – int = int ✓for skill in job.skills:   ← list[str] เสมอ ✓
save_skill(skill)

ตัวอย่างเต็ม: Job Posting Extractor

มาดูการทำงานจริงแบบ step-by-step ตั้งแต่ต้นจนจบ ใช้เทคนิค tool_use ซึ่งเป็นวิธีที่ดีที่สุดในการบังคับให้ LLM ส่ง structured output:

Step 1 — กำหนด Schema ด้วย Pydantic

job_extractor.py — Part 1: Schema
from anthropic importAnthropicfrom pydantic importBaseModel, Fieldfrom typing import List, Optional
import jsonclassJobPosting(BaseModel):
“””Schema สำหรับข้อมูลประกาศหางาน”””
job_title:   str        = Field(…, description=“ชื่อตำแหน่งงาน”)
company:     str        = Field(…, description=“ชื่อบริษัท”)
salary_min:  int        = Field(…, ge=0, description=“เงินเดือนขั้นต่ำ (บาท)”)
salary_max:  int        = Field(…, ge=0, description=“เงินเดือนสูงสุด (บาท)”)
location:    str        = Field(…, description=“จังหวัดหรือพื้นที่”)
work_mode:   str        = Field(…, description=“onsite / hybrid / remote”)
skills:      List[str]   = Field(…, description=“list ของ skills ที่ต้องการ”)
exp_min:     int        = Field(…, ge=0, description=“ประสบการณ์ขั้นต่ำ (ปี)”)
exp_max:     int        = Field(…, ge=0, description=“ประสบการณ์สูงสุด (ปี)”)
deadline:    Optional[str] = Field(None, description=“วันปิดรับสมัคร”)

Step 2 — แปลง Pydantic เป็น Tool Schema สำหรับ Anthropic API

job_extractor.py — Part 2: Tool definition
# ดึง JSON Schema จาก Pydantic class โดยตรง → ส่งให้ Anthropic APIextract_job_tool = {
“name”: “extract_job_posting”,
“description”: “ดึงข้อมูลโครงสร้างจากประกาศหางาน”,
“input_schema”: JobPosting.model_json_schema()
#                ↑ Pydantic สร้าง JSON Schema ให้อัตโนมัติ 🎉
}

💡 เคล็ดลับ: model_json_schema() ของ Pydantic จะสร้าง JSON Schema ที่ถูกต้องให้เลย ไม่ต้องเขียน schema เองด้วยมืออีกต่อไป — รวมถึง descriptions, types, และ constraints ทั้งหมด

Step 3 — เรียก API และ Validate ผลลัพธ์

job_extractor.py — Part 3: Extract & Validate

client = Anthropic()raw_job_posting = “””
บริษัท TechHive จำกัด (มหาชน) รับสมัคร Python Developer ด่วน!
ประสบการณ์ 2-4 ปี เงินเดือน 45,000 – 65,000 บาท
สถานที่ทำงาน: กรุงเทพฯ (ไฮบริด)
Skills ที่ต้องการ: Python, FastAPI, PostgreSQL, Docker
ติดต่อ: hr@techhive.co.th ภายใน 30 เมษายน 2025
“””response = client.messages.create(
model=“claude-haiku-4-5”,
max_tokens=1024,
tools=[extract_job_tool],
tool_choice={“type”: “tool”, “name”: “extract_job_posting”},
#  ↑ บังคับให้ LLM ต้องเรียก tool นี้เสมอ (ไม่มีทางหนี)
messages=[{
“role”: “user”,
“content”: raw_job_posting
}]
)# ดึง block ที่เป็น tool_use
tool_block = next(b for b in response.content if b.type == “tool_use”)# Validate ด้วย Pydantic — ถ้า type ผิดจะ raise ValidationError ทันทีtry:
job = JobPosting(**tool_block.input)
print(f”✅ ตำแหน่ง  : {job.job_title}”)
print(f”✅ บริษัท   : {job.company}”)
print(f”✅ เงินเดือน: {job.salary_min:,} – {job.salary_max:,} บาท”)
print(f”✅ Skills   : {‘, ‘.join(job.skills)}”)
print(f”✅ ประสบการณ์: {job.exp_min}-{job.exp_max} ปี”)
exceptValidationErroras e:
print(f”❌ Data ไม่ valid: {e}”)

ผลลัพธ์ที่ได้

Output

✅ ตำแหน่ง  : Python Developer
✅ บริษัท   : TechHive จำกัด (มหาชน)
✅ เงินเดือน: 45,000 – 65,000 บาท
✅ Skills   : Python, FastAPI, PostgreSQL, Docker
✅ ประสบการณ์: 2-4 ปี

ภาพรวม: ข้อมูลไหลยังไง?

ประกาศงาน (raw text) → Anthropic API + tool_use

↓

tool_block.input (dict) → JobPosting(**input)

↓

✅ Validated Python Object → บันทึก DB / ส่ง API ต่อ

เปรียบเทียบ 3 วิธี

วิธี	ตัวอย่าง	ข้อดี	ข้อเสีย
Naive JSON prompt	“ตอบเป็น JSON เท่านั้น”	เขียนง่าย	ไม่มี validation, อาจมี markdown fence
Pydantic + json.loads	parse แล้ว validate ด้วย model	มี type safety	LLM ยังอาจส่ง format ผิดบ้าง
tool_use + Pydantic	บังคับผ่าน API	Guaranteed structure + validation	verbose กว่าเล็กน้อย

Pydantic v2 Tricks ที่ควรรู้

Pydantic v2 Quick Reference
# 1. สร้าง JSON Schema → ส่งให้ LLM เป็น hintJobPosting.model_json_schema()# 2. Parse + Validate จาก JSON string
job = JobPosting.model_validate_json(json_string)# 3. Serialize กลับเป็น JSON
job.model_dump_json(indent=2)# 4. แปลงเป็น dict (ก่อนบันทึก DB)
job.model_dump()# 5. Constraint ต่างๆ
salary: int = Field(…, ge=0, le=500_000)   # 0 ≤ salary ≤ 500,000
email:  str = Field(…, pattern=r”^[\w.]+@[\w.]+$”)  # regex validation
tags:   List[str] = Field(default_factory=list)    # safe default

🎯 สรุปแก่น

Structured Output = บังคับให้ LLM ส่งข้อมูลใน format ที่เรากำหนด ไม่ใช่ free-form text
Pydantic = เป็น “สัญญา” ระหว่าง LLM output กับ Python code — รับรองว่า type ถูก, field ครบ
tool_use = วิธีที่ดีที่สุด เพราะ LLM ถูก train มาให้ส่ง valid JSON เมื่อเรียก tool
model_json_schema() = ให้ Pydantic สร้าง schema ให้เอง ไม่ต้องเขียนมือ
Pattern นี้คือ รากฐานของ AI Agent — ทุก tool call คือ structured output ที่ validated แล้ว

ปัญหาที่เกิดขึ้นจริงเมื่อ AI ตอบกลับแบบ free-form

Structured Output คืออะไร?

🎯 เป้าหมาย

⚙️ วิธีการ

🧩 Use Case

🔗 เชื่อมกับ Agents

Pydantic คืออะไร และทำไมถึงต้องใช้?

ตัวอย่างเต็ม: Job Posting Extractor

Step 1 — กำหนด Schema ด้วย Pydantic

Step 2 — แปลง Pydantic เป็น Tool Schema สำหรับ Anthropic API

Step 3 — เรียก API และ Validate ผลลัพธ์

ผลลัพธ์ที่ได้

ภาพรวม: ข้อมูลไหลยังไง?

เปรียบเทียบ 3 วิธี

Pydantic v2 Tricks ที่ควรรู้

🎯 สรุปแก่น

Comments

Leave a Reply Cancel reply