4/17 (Thu.) 14:45 - 15:15 1F 1B

The Evolution of LLM Evaluation: Implementing and Challenges of LLM-as-a-Judge

With the rapid iteration of Large Language Models (LLM) reasoning models and AI Agents, LLMs have been becoming critical technology components driving efficiency and innovation across industries. However, the complexity of the use cases and AI risks pose significant challenges for organizations adopting LLM technologies.

This sharing will explore the challenges of LLM risk evaluation and introduce the LLM-as-a-Judge framework—an innovative approach that leverages LLMs to evaluate, identify, and further mitigate risks of LLM systems. The speaker will provide an in-depth analysis of LLM-as-a-Judge’s architecture and key success factors, offering insights into how organizations can enhance AI system's security and trustworthiness through advanced LLM evaluation methodologies. This session aims to establish a solid foundation for organizations in AI risk management, ensuring safe, reliable and trustworthy AI system deployments.

Stanley Chou
SPEAKER
Coupang
Director of Security Engineering

TOPIC / TRACK
AI Security & Safety Forum
Live Translation Session

LOCATION
Taipei Nangang Exhibition Center, Hall 2
1F 1B

LEVEL
Intermediate Intermediate sessions focus on cybersecurity architecture, tools, and practical applications, ideal for professionals with a basic understanding of cybersecurity.

SESSION TYPE
Breakout Session

LANGUAGE
Chinese
Real-Time Chinese & English Translation

SUBTOPIC
LLM
AI Safety
Responsible AI