On August 2, London-based blockchain analytics provider Elliptic published what it claims is the “world’s largest set of labeled transaction data publicly available on any cryptocurrency,” including approximately 4,075 payment flows identified as illicit.

The company made the data set publicly available to “motivate and enable the development of new techniques for detection of illicit cryptocurrency transactions.”

2% of Transactions Found to be Illicit

The Elliptic Data Set comprises 203,769 BTC payment flows with an estimated combined value of approximately $6 billion. Of these transactions, 2% were found to be illicit, 21% were determined to be legal, and the remaining 77% were labeled as unknown.
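As a quick consistency check on the figures above, the short Python sketch below converts the reported percentages into approximate transaction counts. The function name and label keys are illustrative only, and because the article's percentages are rounded, the resulting counts are estimates rather than figures taken from the data set itself.

```python
# Figures as reported in the article; the percentages are rounded.
TOTAL_TRANSACTIONS = 203_769
REPORTED_SHARES = {"illicit": 0.02, "legal": 0.21, "unknown": 0.77}


def approximate_counts(total, shares):
    """Turn rounded percentage shares into approximate transaction counts."""
    return {label: round(total * share) for label, share in shares.items()}


counts = approximate_counts(TOTAL_TRANSACTIONS, REPORTED_SHARES)
for label, count in counts.items():
    print(f"{label}: about {count:,} transactions")
```

The illicit estimate lines up with the roughly 4,075 illicit payment flows cited at the top of the article.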

The data set includes labels identifying the transactions that were made by criminal actors, with the firm hoping that making the information public will facilitate “the development and testing of new predictive techniques.”

In a press release, Tom Robinson, the chief scientist and co-founder of Elliptic, stated that the firm used “a range of advanced techniques, including machine learning, to facilitate financial crime detection in cryptocurrencies.”

Elliptic Co-Authors Paper With MIT-IBM Watson AI Lab

The release of the data set coincides with the publication of a paper co-authored by Elliptic alongside researchers from the Massachusetts Institute of Technology-IBM Watson AI Lab.

The paper, ‘Anti-Money Laundering in Bitcoin: Experiments with Graph Convolutional Networks for Financial Forensics’, is set to be presented by IBM research staff at the 23rd annual Knowledge Discovery and Data Mining Conference (KDD) on August 5 in Anchorage, Alaska.

Mr. Robinson stated that the firm’s work alongside the MIT-IBM Watson AI Lab “ensure[s] that our clients have access to the most accurate and effective insights available, reducing their compliance costs and ensuring that their services are not exploited by criminals.”

Mark Weber, a co-author of the paper, stated:

“Graph convolutional networks (GCN) are still a young class of methods, and we’re in the early days in these experiments, but we do believe GCN’s power to capture the relational information in these large, complex transaction networks could prove valuable for anti-money laundering.”
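To give a rough sense of what capturing “relational information” can mean in practice, the Python sketch below applies one graph-convolution step in the widely used form ReLU(D^-1/2 (A + I) D^-1/2 · H · W). It is a toy illustration, not the model from the paper: the three-transaction graph, the single feature, the weight matrix, and all function names are assumptions made up for this example, and payment flows are treated as undirected for simplicity.

```python
import math


def normalized_adjacency(edges, num_nodes):
    """Build D^-1/2 (A + I) D^-1/2 as a dense list of lists."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0  # self-loop so each node keeps its own features
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0  # assumption: treat each flow as undirected
    degree = [sum(row) for row in adj]
    return [
        [adj[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(x, y):
    """Plain dense matrix product of two lists of lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat, features, weights):
    """One graph-convolution step: ReLU(a_hat @ features @ weights)."""
    mixed = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, value) for value in row] for row in mixed]


# Hypothetical chain of three transactions: 0 - 1 - 2.
edges = [(0, 1), (1, 2)]
features = [[1.0], [0.0], [0.0]]  # only transaction 0 carries a signal
weights = [[1.0]]                 # identity weight, chosen for readability

a_hat = normalized_adjacency(edges, num_nodes=3)
hidden = gcn_layer(a_hat, features, weights)
print(hidden)
```

After one step, transaction 1 has picked up part of the signal from its neighbor, transaction 0, while transaction 2, which is two hops away, is still at zero. Stacking further layers would spread information across longer paths in the transaction graph.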