Organizations often fail to detect in real time that their data has been leaked and is being sold online. Our goal is to shorten the gap between the moment data is exposed on the internet and the moment that exposure is detected, thereby minimizing how long sensitive corporate data remains exposed. The dark web is a primary marketplace for trading personal information and is reached through anonymizing browsers such as Tor Browser. This paper focuses on crawling dark web sites. Using data collected from these sites, we trained a BERT classification model to categorize transaction posts into five types of data breach, which allows rapid identification of the kind of leak each post concerns. Finally, we employ a Retrieval-Augmented Generation (RAG) approach to draw insights from the collected dark web content.
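To make the retrieval half of the RAG step concrete, here is a minimal sketch, not the speakers' implementation. It assumes crawled posts have already been labeled by a classifier, and it substitutes a toy bag-of-words cosine similarity for whatever retriever the talk actually uses. The sample posts, the label names, and the functions `retrieve` and `build_prompt` are all hypothetical; the abstract does not name the five breach categories.

```python
import math
import re
from collections import Counter

# Hypothetical posts and labels, invented for illustration only.
POSTS = [
    {"id": 1, "label": "credentials",
     "text": "Selling employee email and password combo list from retail company"},
    {"id": 2, "label": "financial",
     "text": "Fresh credit card dumps with CVV, bank accounts included"},
    {"id": 3, "label": "database",
     "text": "Full customer database dump, SQL file, two million rows"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text):
    """Represent text as a bag-of-words term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, posts, k=2):
    """Return up to k posts most similar to the query, dropping zero-score posts."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(p["text"])), p) for p in posts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_prompt(query, posts, k=2):
    """Assemble the retrieved posts and the question into a prompt for a language model."""
    hits = retrieve(query, posts, k)
    context = "\n".join(f"[{p['id']}] ({p['label']}) {p['text']}" for p in hits)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above."
    )


if __name__ == "__main__":
    print(build_prompt("credit card dumps", POSTS))
```

In a real pipeline the bag-of-words vectors would typically be replaced by dense embeddings, and the assembled prompt would be passed to a language model; both choices are outside what the abstract specifies.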
TOPIC / TRACK
AI Security & Safety Forum
LOCATION
Taipei Nangang Exhibition Center, Hall 2
7F 703
LEVEL
Intermediate
Intermediate sessions focus on cybersecurity architecture, tools, and practical applications, ideal for professionals with a basic understanding of cybersecurity.
SESSION TYPE
Breakout Session
LANGUAGE
Chinese
SUBTOPIC
Data Leak
Incident Response
AI