By David Omurwa in ChatGPT — 23 Feb 2025

Using AI in Vulnerability Analysis - Part 1 (Testing the Limits of ChatGPT Plus)

I decided to create a custom ChatGPT that would enrich vulnerability scan data with information on which vulnerability is OS or Non-OS related

Introduction

Every environment handles IT maintenance differently, so vulnerability analysts often need to extract additional insights from their already detailed vulnerability scan reports. In this article, I explore the potential of AI in vulnerability analysis, starting with a simple task: splitting OS-related and non-OS-related vulnerabilities.

Prompt: Generate an image of a security team in a SOC and do it in a comic style. Make it monochromatic.

Problem

More often than not, the process of remediating OS vulnerabilities is separate from the process of remediating non-OS vulnerabilities. Take this example: you have a machine running Windows Server that hosts a custom, business-critical, Java-based application. If a vulnerability scan reveals that the server is affected by EternalBlue (CVE-2017-0144) and Log4Shell (CVE-2021-44228), both the team responsible for the Java application and the IT maintenance team have to be notified. Hence the need to separate OS-related vulnerabilities from non-OS-related ones.
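One deterministic way to picture this split is through the CPE names that NVD attaches to CVE records: the third field of a CPE 2.3 string is `o` for operating systems, `a` for applications and `h` for hardware. The sketch below is not the approach tested in this article. The function names are hypothetical, the two CPE strings are illustrative rather than copied from NVD, and the "Needs review" bucket is an assumption for CVEs that list both OS and non-OS products.

```python
def cpe_part(cpe: str) -> str:
    """Return the 'part' field of a CPE 2.3 string: 'o', 'a' or 'h'."""
    fields = cpe.split(":")
    if len(fields) < 3 or fields[0] != "cpe" or fields[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 string: {cpe!r}")
    return fields[2]


def classify(cpes: list[str]) -> str:
    """Classify a CVE from the CPE names of its affected products."""
    parts = {cpe_part(c) for c in cpes}
    if parts == {"o"}:
        return "OS-related"
    if parts and "o" not in parts:
        return "Non-OS related"
    # No CPE data, or a mix of OS and non-OS products (assumed bucket).
    return "Needs review"


# Illustrative CPE strings for the two CVEs from the example above.
examples = {
    "CVE-2017-0144": ["cpe:2.3:o:microsoft:windows_server_2008:r2:sp1:*:*:*:*:x64:*"],
    "CVE-2021-44228": ["cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*"],
}

for cve_id, cpes in examples.items():
    print(cve_id, "->", classify(cpes))
```

In practice a single CVE can list many affected products, which is why the sketch refuses to guess when the parts are mixed.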

Solution

I decided to create a custom GPT in ChatGPT that would enrich vulnerability scan data with a label stating whether each vulnerability is OS-related or non-OS-related. The instructional prompt is provided at the end of the article. Coming up with a prompt that would give consistent results was a real pain, and it took me a few hours of testing and fine-tuning to arrive at the final version. That's a story for another day, though. Let's see what ChatGPT could do.

Generating Test Data

I wanted to emulate the vulnerability scan results of a small enterprise, so I used the following prompt to generate test data:

I would like you to generate a mock vulnerability report. This report should have 50 servers 20 of them running Windows and 30 of them running Linux. These servers host databases, web servers and other services. This report should have at least 500 vulnerabilities randomly divided amongst the 50 servers. The vulnerabilities must be existing and must align with the server's purpose. The output should be an excel file.

The output was a report with 500 real vulnerabilities distributed over several devices serving different purposes, as requested in the prompt. The vulnerabilities were not very diverse and did not really align with the purpose of each server, but I considered this a minor drawback. I was also curious to see how it would affect the final analysis.
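A generated report like this can be checked against the numbers in the prompt before it is used. The following is a minimal sketch, assuming the report has been loaded as a list of row dictionaries (for example via `csv.DictReader`) with one row per finding. The column names `Server` and `OS` and the function name are assumptions, and the demo rows are synthetic.

```python
from collections import Counter


def check_report(rows, servers=50, windows=20, linux=30, min_vulns=500,
                 server_key="Server", os_key="OS"):
    """Return a list of problems; an empty list means every check passed."""
    # One OS per server, derived from the finding rows.
    os_by_server = {row[server_key]: row[os_key] for row in rows}
    os_counts = Counter(os_by_server.values())

    problems = []
    if len(os_by_server) != servers:
        problems.append(f"expected {servers} servers, found {len(os_by_server)}")
    if os_counts["Windows"] != windows:
        problems.append(f"expected {windows} Windows servers, found {os_counts['Windows']}")
    if os_counts["Linux"] != linux:
        problems.append(f"expected {linux} Linux servers, found {os_counts['Linux']}")
    if len(rows) < min_vulns:
        problems.append(f"expected at least {min_vulns} findings, found {len(rows)}")
    return problems


# Synthetic demo data: 50 servers with 10 findings each.
rows = [
    {"Server": f"srv-{s:02d}", "OS": "Windows" if s < 20 else "Linux",
     "CVE-ID": "CVE-2021-44228"}
    for s in range(50)
    for _ in range(10)
]
print(check_report(rows))        # no problems for the full data set
print(check_report(rows[:120]))  # a truncated report fails several checks
```

This only checks counts. Whether each vulnerability fits the purpose of its server still needs a manual look.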

Included vulnerabilities

Feeding the Data to ChatGPT

Now this is where the fun begins. While running the custom GPT, I often ran into hallucinations and could not run the workflow consistently without additional intervention. More importantly, I could not get the model to provide accurate information.

The first issue I ran into was that ChatGPT would incorrectly count the vulnerabilities, leading to incomplete results where only the first 5 vulnerabilities in the list were enriched. Though I had explicitly asked ChatGPT to count and enrich all the vulnerabilities, this issue came back over and over again.
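Incomplete enrichment of this kind is easy to detect outside the model by comparing the CVE-IDs that went in with the ones that came back labelled. The sketch below assumes the enriched output is available as a list of row dictionaries. The column names follow the prompt at the end of the article, while the function name is hypothetical.

```python
ALLOWED_TYPES = {"OS-related", "Non-OS related"}


def find_unenriched(input_cves, enriched_rows,
                    cve_key="CVE-ID", type_key="Vulnerability Type"):
    """Return input CVE-IDs that have no valid classification in the output."""
    verdicts = {row[cve_key]: row.get(type_key, "") for row in enriched_rows}
    unique_inputs = dict.fromkeys(input_cves)  # de-duplicate, keep input order
    return [cve for cve in unique_inputs if verdicts.get(cve) not in ALLOWED_TYPES]


input_cves = ["CVE-2017-0144", "CVE-2021-44228", "CVE-2023-21535"]

# Simulated truncated output: only the first vulnerability was enriched.
enriched_rows = [
    {"CVE-ID": "CVE-2017-0144", "Vulnerability Type": "OS-related"},
]

missing = find_unenriched(input_cves, enriched_rows)
print(f"{len(missing)} of {len(input_cves)} vulnerabilities were not enriched: {missing}")
```

A non-empty result means the run was truncated or partly left unclassified and should be repeated or flagged.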

ChatGPT was also too lazy to carry out the full analysis and returned a dummy analysis instead. This happened several times, even when I provided a list of only 50 vulnerabilities, a subset of the original 500.

In this case a dummy analysis was provided

ChatGPT being lazy

I decided to move away from custom GPTs, which run on GPT-4o, and tried running the prompt with the o1 model. For some reason, it could not see the file that I uploaded.

No matter how many times I tried, ChatGPT's o1 model could not see the file

Interim Conclusion

I suspect that the task does not run reliably on the custom GPT because it is too intensive. With the o1 model, the issue appears to be a bug, which I have reported to OpenAI through the feedback function in the chat. Given these results, I've decided to get my hands on other AI models and see how they handle this task.

In the meantime, I would love to hear your feedback on this test. Feel free to contact me via LinkedIn or drop me an email at david.omurwa@cleonlabs.com.

Main Instructional Prompt

Role Definition:
I want you to act as my Assistant Security Analyst, with your main responsibility being to help me analyze vulnerability scan results. Your tasks will involve categorizing vulnerabilities into OS-related and Non-OS-related vulnerabilities by leveraging trusted security sources. For example if the vulnerability affects Microsoft Windows it will be an OS-Related Vulnerability, when it affects a Java Application installed on a Windows Server, it will be an Non-OS related vulnerability*

Instructions on How to Respond to Specific Conversation Starters:

Conversation Starter: "I would like to sort vulnerabilities into OS and non-OS related vulnerabilities."

*Once you receive this request, you will follow the structured workflow outlined below while keeping track of each step. You must follow the exact sequence and clearly label each step in your responses.
Workflow for Sorting Vulnerabilities:

Step 1: Request the List of Vulnerabilities
Prompt: "Step 1: Please provide the list of vulnerabilities. This may be in various formats, such as CSV, Excel, or plain text. If the vulnerabilities are in a structured format, ensure they include CVE-IDs for accurate analysis."

Step 2: Load and Count the Vulnerabilities
Once I provide the list, you must process it while ensuring you only focus on CVE-IDs.
If the vulnerabilities are in CSV or Excel, use pandas to extract the relevant column containing CVE-IDs.
Do not assume or infer the number of vulnerabilities from a sample.
Use len(df) to get the exact count of rows in the dataset.
Display only the count, no sample rows.
If the file is large, process it in chunks but always return the full count.
Respond with:
"Step 2: The number of vulnerabilities is [Count]."

Step 3: Lookup Vulnerabilities on Trusted Security Sources
Manually checking is NOT acceptable.
You will automatically retrieve information about each CVE-ID using NIST NVD (https://nvd.nist.gov).
If the CVE is not found on NIST NVD, check alternative reputable sources, such as:
a. Vendor security advisories (Microsoft, Apple, RedHat, etc.)
b. Cybersecurity firms (Rapid7, Tenable, etc.)
c. Government security databases (US-CERT, CISA, etc.)
DO NOT truncate or leave any vulnerabilities unclassified.
You must not present a table yet in this step.
Response format:
"Step 3: Researching vulnerabilities and determining OS or Non-OS classification. Please wait..."

Step 4: Ask if the Results Should be Appended to an Existing File
Before providing the final results, ask:
"Step 4: Would you like to append the results to the existing table? (yes/no)"
If "yes":
Append the “Vulnerability Type” and “Verdict Source” columns to the provided file based on their CVE-IDs.
If the original file was CSV or Excel, use pandas to modify it.
Ensure ALL vulnerabilities are classified and saved correctly.
Provide the updated file for download.
If "no", proceed to Step 5.
Do not proceed to Step 5 unless the user explicitly responds to Step 4. If the user does not respond to Step 4, repeat the question until an answer is given.

Step 5: Provide Final Categorization in Table Format
If the user chose not to append results to a file, present the categorized vulnerabilities in a structured table format.
Table should have three columns:
CVE-ID - The vulnerability identifier (e.g., CVE-2023-21535).
Vulnerability Type - Either "OS-related" or "Non-OS related".
Verdict Source - The source of information in a clickable format [Source](Source URL).
Respond with:
"Step 5: [Output] Below is the categorized list of vulnerabilities:"
Provide the table without truncation.*
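For comparison, Steps 2, 4 and 5 of the prompt above are deterministic and can be reproduced outside the model. The following is a minimal sketch, assuming the scan rows and the verdicts from Step 3 are already available as plain Python data. The function names, the `Server` column and the sample rows are hypothetical. The column names and the `[Source](Source URL)` format come from the prompt.

```python
def append_verdicts(rows, verdicts):
    """Step 4: add the two verdict columns to every row, matched on CVE-ID."""
    missing = [row["CVE-ID"] for row in rows if row["CVE-ID"] not in verdicts]
    if missing:
        # Mirrors the prompt's rule that no vulnerability may stay unclassified.
        raise ValueError(f"unclassified vulnerabilities: {missing}")
    return [
        {**row,
         "Vulnerability Type": verdicts[row["CVE-ID"]][0],
         "Verdict Source": verdicts[row["CVE-ID"]][1]}
        for row in rows
    ]


def to_markdown_table(rows):
    """Step 5: render the three-column result table without truncation."""
    lines = ["| CVE-ID | Vulnerability Type | Verdict Source |",
             "| --- | --- | --- |"]
    for row in rows:
        lines.append(f"| {row['CVE-ID']} | {row['Vulnerability Type']} "
                     f"| [Source]({row['Verdict Source']}) |")
    return "\n".join(lines)


# Hypothetical scan rows and Step 3 verdicts for the two CVEs from the article.
scan_rows = [
    {"Server": "srv-01", "CVE-ID": "CVE-2017-0144"},
    {"Server": "srv-01", "CVE-ID": "CVE-2021-44228"},
]
verdicts = {
    "CVE-2017-0144": ("OS-related", "https://nvd.nist.gov/vuln/detail/CVE-2017-0144"),
    "CVE-2021-44228": ("Non-OS related", "https://nvd.nist.gov/vuln/detail/CVE-2021-44228"),
}

print(f"Step 2: The number of vulnerabilities is {len(scan_rows)}.")
enriched = append_verdicts(scan_rows, verdicts)
print(to_markdown_table(enriched))
```

Only the Step 3 lookup then depends on an external source or a model, which limits how much of the result a miscount or truncation can affect.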