Although ChatGPT is not yet accurate enough to be integrated, its potential has kept Coinbase interested.

In a blog post published on Monday (March 20), Tom Ryan, a blockchain security engineer at Coinbase, described the company's experiment using OpenAI's ChatGPT to conduct automated token security reviews. Although ChatGPT did not meet the accuracy required for integration into Coinbase's asset review process, the experiment showed enough potential to warrant further investigation and possible future use.

Coinbase's Blockchain Security team reviews token contracts and decides whether they can be listed on the centralized exchange. It also ensures that tokens meet the necessary security criteria, working with project teams to mitigate any risks it identifies. To make this review process more efficient, the team tested ChatGPT, which has shown promise in optimizing code, identifying vulnerabilities, and completing other tasks based on user prompts.

The experiment evaluated how accurately ChatGPT produced token security risk scores by comparing its results with a standard review performed by a blockchain security engineer. Using prompt engineering, the team gave ChatGPT the company's ERC20 security review framework along with a smart contract to analyze.

Across 20 smart contracts, ChatGPT's risk score matched the manual security review 12 times, a 60% agreement rate. However, it incorrectly labeled five high-risk assets as low-risk. This is a serious concern, because underestimating a risk score is more harmful than overestimating it.
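A plain agreement rate treats every mismatch the same, so it hides the asymmetry described above. The Python sketch below shows one hypothetical way to score such a comparison: it reports agreement, counts underestimates and overestimates separately, singles out high-risk contracts labeled low-risk, and applies a weighted cost in which an underestimate is penalized more heavily. The three-level `Risk` scale, the `compare_scores` function, and the 5:1 weights are illustrative assumptions, not Coinbase's actual scoring scheme, and the example pairs are made up rather than taken from the experiment.

```python
from enum import IntEnum


class Risk(IntEnum):
    # Hypothetical three-level ordinal scale for illustration only;
    # the blog post does not specify Coinbase's actual scoring scale.
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def compare_scores(pairs, under_weight=5, over_weight=1):
    """Compare (manual, model) risk-score pairs.

    under_weight and over_weight are hypothetical costs that encode the
    idea that an underestimate is worse than an overestimate.
    """
    total = len(pairs)
    matches = sum(1 for manual, model in pairs if model == manual)
    under = sum(1 for manual, model in pairs if model < manual)
    over = sum(1 for manual, model in pairs if model > manual)
    # High-risk contracts the model labeled low-risk: the worst case.
    high_as_low = sum(
        1 for manual, model in pairs if manual == Risk.HIGH and model == Risk.LOW
    )
    return {
        "agreement": matches / total if total else 0.0,
        "underestimates": under,
        "overestimates": over,
        "high_as_low": high_as_low,
        "weighted_cost": under_weight * under + over_weight * over,
    }


# Made-up example pairs, not data from the Coinbase experiment.
example = [
    (Risk.HIGH, Risk.LOW),
    (Risk.LOW, Risk.LOW),
    (Risk.MEDIUM, Risk.HIGH),
    (Risk.HIGH, Risk.HIGH),
]
print(compare_scores(example))
# {'agreement': 0.5, 'underestimates': 1, 'overestimates': 1, 'high_as_low': 1, 'weighted_cost': 6}
```

In this toy example the single underestimate costs five times as much as the single overestimate, even though both count equally against the 50% agreement rate.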

According to Ryan, ChatGPT's limitations include an inability to recognize when it lacks the context needed for a robust security analysis, which leads to coverage gaps. It can also be inconsistent, sometimes giving different answers to the same question, and it may be influenced by comments in the code. Finally, because OpenAI keeps releasing new iterations of ChatGPT, its output is unstable over time, so prompts must be maintained and output quality monitored to keep results consistent and avoid operational failures.

ChatGPT's efficiency is impressive, but its accuracy must improve before it can be integrated into Coinbase's security review process. Further prompt engineering could improve that accuracy. If it does, ChatGPT could serve as a secondary quality assurance check, adding another layer of control to catch risks that reviewers overlook.
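As a rough illustration of what a secondary check could look like, the Python sketch below keeps the engineer's score as the score of record and uses the model only to flag tokens it rates as riskier. Because the model can never lower a rating, an underestimate like the five high-risk misses in the experiment would not weaken the review. The `Risk` scale, the `secondary_check` function, and the token names are hypothetical; the blog post does not describe how such a check would be implemented.

```python
from enum import IntEnum


class Risk(IntEnum):
    # Hypothetical three-level scale for illustration; not Coinbase's actual scale.
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def secondary_check(reviews):
    """Flag tokens for re-review when the model scores them riskier.

    reviews: iterable of (token_name, manual_score, model_score).
    The engineer's score is kept as the score of record; the model can
    only raise a flag, never lower a risk rating.
    """
    flagged = []
    for token, manual, model in reviews:
        if model > manual:
            flagged.append(token)
    return flagged


# Hypothetical token names and scores for illustration.
reviews = [
    ("TOKEN_A", Risk.LOW, Risk.HIGH),
    ("TOKEN_B", Risk.HIGH, Risk.LOW),
    ("TOKEN_C", Risk.MEDIUM, Risk.MEDIUM),
]
print(secondary_check(reviews))  # ['TOKEN_A']
```

Here `TOKEN_B` is not flagged: the model's low score is simply ignored, and the engineer's high rating stands. That one-directional design reflects the point that underestimating risk is the costlier error.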

