Currently a data scientist at CyCraft, specializing in computational mathematics, algorithms, and NLP. A graduate of the Department of Applied Mathematics at NYCU, and a past speaker at CYBERSEC and CraftCon.
Large Language Models (LLMs) have shown great potential in cybersecurity applications. However, to fully realize this potential, the biases and stability issues inherent in LLM-driven security assessments must be addressed. This talk focuses on these challenges and presents our latest research on improving evaluation frameworks.
Our study analyzes how the order in which options are presented can sway an LLM's judgments during assessment, a position bias that undermines scoring reliability. We propose ranking strategies and probabilistic weighting techniques that significantly improve scoring accuracy and consistency. Key topics include our experimental design and observations of LLM biases, probability-based weighting adjustments, and methods for aggregating results across multiple ranking permutations. Notably, validation on the G-EVAL dataset demonstrates measurable improvements in model evaluation performance.
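To make the two techniques concrete, here is a minimal Python sketch, assuming a judge model that exposes token-level probabilities for its candidate score tokens. The `judge` callable, the `toy_judge` stand-in, and all names below are illustrative assumptions for exposition, not the speakers' actual implementation.

```python
import itertools
import statistics
from typing import Callable, Dict, List, Sequence

def weighted_score(token_probs: Dict[int, float]) -> float:
    """Probability-weighted scoring: instead of taking the single most
    likely score token, weight each candidate score (e.g. 1-5) by the
    probability the model assigns to it and return the expectation."""
    total = sum(token_probs.values())
    return sum(score * p for score, p in token_probs.items()) / total

def debiased_score(options: Sequence[str],
                   judge: Callable[[List[str]], List[Dict[int, float]]],
                   target: int) -> float:
    """Average the weighted score of one target option over every
    permutation of the presented order, so position bias cancels out."""
    scores = []
    for perm in itertools.permutations(range(len(options))):
        ordered = [options[i] for i in perm]
        probs_per_slot = judge(ordered)   # one score distribution per displayed slot
        slot = perm.index(target)         # where the target landed in this ordering
        scores.append(weighted_score(probs_per_slot[slot]))
    return statistics.mean(scores)

if __name__ == "__main__":
    # Toy judge with a deliberate position bias: the first slot skews high.
    def toy_judge(ordered: List[str]) -> List[Dict[int, float]]:
        out = []
        for slot, text in enumerate(ordered):
            center = 4 if "good" in text else 2
            if slot == 0:
                out.append({center + 1: 0.6, center: 0.4})
            else:
                out.append({center: 0.6, center - 1: 0.4})
        return out

    options = ["good answer", "weak answer"]
    for i, opt in enumerate(options):
        print(opt, round(debiased_score(options, toy_judge, i), 3))
```

In this toy setup the first-slot bump inflates whichever option is shown first, but averaging over both orderings preserves the gap between the good and weak answers. Since the number of permutations grows factorially, a real deployment would presumably sample a subset of orderings rather than enumerate them all.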
Whether you are conducting research on language models or working in cybersecurity technology and decision-making, this talk will provide valuable technical insights and practical takeaways.