Open Data at CIRCL

CIRCL advocates data sharing and knows that sharing Open Data can lead to new research, analyses, software or services that could improve security in the long term. We also hope that the shared data can be used for any purpose, including commercial and non-commercial security services, within Luxembourg and abroad.

Open Data Definition

“Open data and content can be freely used, modified, and shared by anyone for any purpose” as defined by opendefinition.org.

Open Data Available at CIRCL

CVE daily JSON dump

A daily JSON dump of all CVEs (Common Vulnerabilities and Exposures) is published with the expanded values as seen on https://cve.circl.lu/. The file is a gzip-compressed JSON file (>190MB).
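Because the dump is large, it is usually more practical to stream it than to load it in one go. The following is a minimal Python sketch, not an official client. It assumes the dump has already been downloaded to a local path and that it contains one JSON object per line; both the function name `iter_cve_records` and the line-delimited layout are assumptions, so check the actual file before relying on it.

```python
import gzip
import json
from typing import Iterator


def iter_cve_records(path: str) -> Iterator[dict]:
    """Yield records one at a time from a gzip-compressed JSON dump.

    Assumption: the file holds one JSON object per line. If the real
    dump is a single JSON array instead, use json.load() on the handle.
    """
    # gzip.open in text mode decompresses on the fly, so the whole
    # file never has to fit in memory at once.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Counting the records in a downloaded dump then comes down to `sum(1 for _ in iter_cve_records(path))`, where `path` is wherever the file was saved.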

BGP Ranking data dump per AS number

BGP Ranking is a public service developed and operated by CIRCL since 2012. It ranks the malicious activities seen per BGP AS number (e.g. ISPs, hosting companies). The historical data can be queried using the following format:

curl -X POST -d '{"asn": "5577", "date": "2019-11-11"}' https://bgpranking-ng.circl.lu/json/asn

Note: it is possible to query historical information for the last ~12 months.

And if you want to see the data on the web interface: https://bgpranking-ng.circl.lu/asn?asn=5577.
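The curl query above can also be issued from a script. The sketch below uses only the Python standard library and mirrors the curl call: same endpoint, same two fields in the body. The helper names `build_asn_query` and `query_asn` are hypothetical, and the assumption that the service answers with a JSON document is inferred from the `/json/` path rather than documented here.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
BGP_RANKING_URL = "https://bgpranking-ng.circl.lu/json/asn"


def build_asn_query(asn: str, date: str) -> urllib.request.Request:
    """Build the POST request for one AS number and one day (YYYY-MM-DD)."""
    # json.dumps quotes the date, so the body is always valid JSON.
    payload = json.dumps({"asn": asn, "date": date}).encode("utf-8")
    # No Content-Type header is set explicitly: urllib then falls back to
    # the same form-encoded default that `curl -d` uses.
    return urllib.request.Request(BGP_RANKING_URL, data=payload, method="POST")


def query_asn(asn: str, date: str, timeout: float = 10.0) -> dict:
    """Send the query and decode the reply (assumed to be JSON)."""
    request = build_asn_query(asn, date)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `query_asn("5577", "2019-11-11")` would send the same request as the curl command. Building the request in a separate function makes it possible to inspect the payload without touching the network.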

Allaple malware infection statistics (raw data)

The Allaple worm is a malware family that still infects multiple systems on the Internet. The statistics collected from our honeypot for the year 2015 are available at the following location:

Classification datasets

More information about each dataset is available on their dedicated page.

circl-phishing-dataset-01 is a dataset of 400+ screenshots of verified or potential phishing websites.

circl-ail-dataset-01 is a dataset of screenshots of dark-web websites. The final dataset will contain 37000+ images; it is enlarged as classification progresses, in parts of 4000 pictures each.

Papers about Image Matching

We have published several preprints on Image Matching: one on the datasets, one on their classification tool, one on a framework to evaluate algorithms from the literature, and one on the library we developed, which bundles the relevant algorithms.

CIRCL operational statistics

The operational statistics cover CIRCL's incident response activities, especially with regard to reporting (e.g. incident reports, requests for analysis or support during a computer security incident) and notifications (e.g. take-down notifications, vulnerability notifications) from/to third parties.

Academic researchers who used our Open Data

  • Ralph Holz, Johanna Amann, Olivier Mehani, Matthias Wachs, Mohamed Ali Kaafar. “TLS in the wild: an Internet-wide analysis of TLS-based protocols for electronic communication”.
  • Konte, Maria, Roberto Perdisci, and Nick Feamster. “ASwatch: An AS Reputation System to Expose Bulletproof Hosting ASes.” Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication. ACM, 2015.
  • Wagner, Christoph, et al. “ASMATRA: Ranking ASs providing transit service to malware hosters.” Integrated Network Management (IM 2013), 2013 IFIP/IEEE International Symposium on. IEEE, 2013.
  • Ghafir, Ibrahim, and Václav Přenosil. “Advanced Persistent Threat and Spear Phishing Emails.” Distance Learning, Simulation and Communication 2015 (2015): 34.
  • Ghafir, Ibrahim, and Václav Přenosil. “Malicious File Hash Detection and Drive-by Download Attacks.” Proceedings of the Second International Conference on Computer and Communication Technologies. Springer India, 2016.
  • Vykopal, Jan. Flow-based Brute-force Attack Detection in Large and High-speed Networks. PhD thesis, Masaryk University, Brno, Czech Republic, 2013.

If you used our data in your research and would like to be listed, feel free to contact us.

Classification of this document

TLP:WHITE information may be distributed without restrictions. The document and the Open Data mentioned are licensed under the CC-BY 4.0 International license.

Revision

  • Version 1.3 October 12th, 2017 Operational Statistics of CIRCL added
  • Version 1.2 March 15th, 2016 New academic papers added TLP:WHITE.
  • Version 1.1 December 10th, 2015 CVE JSON dump added TLP:WHITE.
  • Version 1.0 October 27th, 2015 Initial version TLP:WHITE.