As a result of our projects and research, we often generate or collect datasets that can be useful for research purposes. We always try to release such datasets with an open license and publish them with a data descriptor, so that other researchers can re-use these data. Here you can find some of the datasets generated by the team.
Recent Dataset Products
Botbusters - Analysis of the 2019 Spanish General Election
This dataset contains anonymized tweets and user data collected from Twitter during the observation period (October 4th, 2019 to November 11th, 2019). It was used to analyze the presence and behavior of political social bots on Twitter in the context of the November 2019 Spanish general election. The users involved were classified as social bots or humans after examining their interactions from both a quantitative perspective (amount of traffic generated and existing relations) and a qualitative one (users’ political affinity and sentiment towards the most important parties).
Release Date: March 2020
Spotting Political Social Bots in Twitter: A Use Case of the 2019 Spanish General Election
Javier Pastor-Galindo, Mattia Zago, Pantaleone Nespoli, Sergio López Bernal, Alberto Huertas Celdrán, Manuel Gil Pérez, José A. Ruipérez-Valiente, Gregorio Martínez Pérez, Félix Gómez Mármol.
BEHACOM - A Dataset Modelling Users' Behaviour in Computers
This dataset showcases the behaviour of twelve users interacting with their computers for fifty-five consecutive days, without pre-established indications or restrictions. For each user, the BEHACOM dataset contains a set of features that model, in one-minute time windows, the usage of computer resources such as CPU or memory, as well as the activities registered by applications. The data were collected following a privacy-preserving approach.
Release Date: April 2020
Pedro M. Sánchez Sánchez, José María Jorquera Valero, Mattia Zago, Alberto Huertas Celdrán, Lorenzo Fernández Maimó, Eduardo López Bernal, Sergio López Bernal, Javier Martínez Valverde, Pantaleone Nespoli, Javier Pastor-Galindo, Ángel Luis Perales Gómez, Manuel Gil Pérez, Gregorio Martínez Pérez
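To give an idea of what modelling behaviour in one-minute time windows can look like, the sketch below averages raw resource samples per minute. It is only an illustration: the function name `aggregate_by_minute`, the `(timestamp, value)` sample format, and the use of a plain mean are assumptions, not the procedure actually used to build BEHACOM.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_minute(samples):
    """Group (timestamp_seconds, value) samples into one-minute windows.

    Hypothetical helper: returns a dict mapping the start of each window
    (in seconds) to the mean of the values observed inside that window.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Floor the timestamp to the start of its one-minute window.
        window_start = int(timestamp // 60) * 60
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Made-up CPU usage samples (seconds since start, percent).
cpu_samples = [(0, 10.0), (30, 20.0), (61, 40.0)]
windows = aggregate_by_minute(cpu_samples)
```

In this made-up example the first two samples fall into the window starting at second 0 and the third into the window starting at second 60.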
UMUDGA - University of Murcia Domain Generation Algorithm Dataset
This dataset is a collection of over 30 million manually labeled, algorithmically generated domain names, accompanied by a feature set ready to use for machine learning (ML) analysis. It covers 50 selected malware families. Each family is available as a list of domains, generated by executing the malware's domain generation algorithm (DGA) in a controlled environment with fixed parameters, together with a set of features extracted by combining statistical and natural language processing metrics.
Release Date: February 2020
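As a rough illustration of the kind of statistical metrics that can be extracted from a domain name, the sketch below computes a few simple lexical features. The feature names and the helper `domain_features` are assumptions chosen for the example; they are not necessarily the features shipped with UMUDGA.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the character distribution of text."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_features(domain: str) -> dict:
    """Hypothetical lexical features of the left-most label of a domain."""
    label = domain.lower().split(".")[0]
    length = len(label)
    digits = sum(ch.isdigit() for ch in label)
    vowels = sum(ch in "aeiou" for ch in label)
    return {
        "length": length,
        "entropy": shannon_entropy(label),
        "digit_ratio": digits / length if length else 0.0,
        "vowel_ratio": vowels / length if length else 0.0,
    }


# Made-up domain name, used only to exercise the function.
features = domain_features("abc123.com")
```

Algorithmically generated names often differ from human-chosen ones on metrics like these, which is why such features are commonly fed to ML classifiers.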
ReCAN - Dataset for Reverse Engineering of Controller Area Networks
This dataset contains data obtained from the Controller Area Network (CAN) buses of two personal vehicles and three commercial trucks, for a total of 36 million data frames. It is composed of two complementary parts: the raw data extracted from the vehicles and the decoded data obtained from the actual sensors’ data. Sufficiently motivated actors may intercept, interact with, and recognize vehicle data with consumer-grade technology, ultimately refuting, once again, the security-through-obscurity paradigm used by automotive manufacturers as a primary defensive countermeasure.
Release Date: January 2020
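The relationship between raw CAN frames and decoded sensor values can be sketched as follows. Everything about the signal layout here (byte position, length, big-endian order, scale, and offset) is hypothetical and chosen only for the example; the actual layouts differ per vehicle and are not taken from ReCAN.

```python
def decode_signal(payload: bytes, start_byte: int, length: int,
                  scale: float = 1.0, offset: float = 0.0) -> float:
    """Decode an unsigned big-endian signal from a CAN frame payload.

    Hypothetical helper: reads `length` bytes starting at `start_byte`
    and converts the raw integer into a physical value using
    physical = raw * scale + offset.
    """
    if start_byte < 0 or start_byte + length > len(payload):
        raise ValueError("signal does not fit inside the payload")
    raw = int.from_bytes(payload[start_byte:start_byte + length], "big")
    return raw * scale + offset


# Made-up 8-byte payload; the first two bytes are treated as an
# engine-speed-like signal with an assumed scale of 0.25 per bit.
frame_payload = bytes.fromhex("0FA0000000000000")
value = decode_signal(frame_payload, start_byte=0, length=2, scale=0.25)
```

Reverse engineering a CAN bus largely consists of recovering such layouts by matching raw payload bytes against independently measured sensor values.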
If you are interested in collaborating with us or in knowing more about our R&D experience in the cybersecurity and data science fields, please contact us.