During the past few years, I have created, or collaborated with others to create, several products that facilitate interdisciplinary research in Machine Learning and Solar Physics. A brief description of each product and a link to its archive are cataloged below.
A Curated Image Parameter Dataset from Solar Dynamics Observatory Mission (2019)
We provide a large image parameter dataset extracted from the Atmospheric Imaging Assembly (AIA) instrument of the Solar Dynamics Observatory (SDO) mission, covering January 2011 through the present at a 6-minute cadence for nine wavelength channels. The data for each year amount to just under 1 TiB. To achieve better classification of active regions and coronal holes, we improve the performance of a set of 10 image parameters through an in-depth evaluation of the assumptions required to calculate them. Then, where possible, we devise a method for finding an appropriate setting for each parameter calculation, together with a validation task that demonstrates the improved results. In addition, we compare the JP2 and FITS image formats using supervised classification models, tuning the parameters separately for each image format and each wavelength. These comparisons show that using JP2 images, which are significantly smaller files, is not detrimental to the region classification task for which the parameters were originally intended. Finally, we compute the tuned parameters on the AIA images and provide a public API to access the dataset. The dataset can support a range of studies on AIA images, such as content-based image retrieval or solar event tracking, where reducing the dimensionality of the images is necessary to make the tasks feasible.
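To make the dimensionality-reduction idea concrete, here is a minimal Python sketch that splits an image into a grid of cells and summarizes each cell with a few first-order texture statistics. It is not the dataset's actual pipeline: the function names, the seven statistics (chosen for illustration, not the dataset's ten tuned parameters), the 8-bit value range, and the bin count are all assumptions. The histogram range and bin count are, however, the kind of settings whose tuning is described above.

```python
import numpy as np

# Illustrative statistics only; not the dataset's ten tuned parameters.
PARAM_NAMES = ["mean", "std", "skewness", "kurtosis",
               "entropy", "uniformity", "relative_smoothness"]


def cell_parameters(cell, value_range=(0.0, 255.0), n_bins=256):
    """Return first-order texture statistics of one image cell."""
    x = np.asarray(cell, dtype=np.float64).ravel()
    mean, std = x.mean(), x.std()
    if std > 0:
        z = (x - mean) / std
        skewness, kurtosis = np.mean(z ** 3), np.mean(z ** 4)
    else:  # higher moments are undefined for a flat cell; report 0
        skewness = kurtosis = 0.0
    # Normalized histogram; the bin count and value range are assumed
    # settings of the kind that need tuning per format and wavelength.
    clipped = np.clip(x, value_range[0], value_range[1])
    hist, _ = np.histogram(clipped, bins=n_bins, range=value_range)
    p = hist / hist.sum()
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))        # in bits
    uniformity = np.sum(p ** 2)                # energy of the histogram
    relative_smoothness = 1.0 - 1.0 / (1.0 + std ** 2)
    return np.array([mean, std, skewness, kurtosis,
                     entropy, uniformity, relative_smoothness])


def grid_parameters(image, n_cells=64, **kwargs):
    """Split a 2-D image into n_cells x n_cells cells and return an array
    of shape (n_cells, n_cells, len(PARAM_NAMES)). Leftover rows or
    columns that do not fill a whole cell are ignored."""
    img = np.asarray(image)
    ch, cw = img.shape[0] // n_cells, img.shape[1] // n_cells
    out = np.empty((n_cells, n_cells, len(PARAM_NAMES)))
    for i in range(n_cells):
        for j in range(n_cells):
            cell = img[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
            out[i, j] = cell_parameters(cell, **kwargs)
    return out


# Usage on a synthetic 8-bit image: 256 x 256 pixels become 8 x 8 x 7 numbers.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256))
params = grid_parameters(image, n_cells=8)
print(params.shape)  # (8, 8, 7)
```

The per-cell design keeps coarse spatial information (which part of the disk a value came from) while discarding the raw pixels, which is what makes retrieval or tracking over years of images tractable.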
Space Weather Data Analytics for Solar Flares (SWAN-SF) (2020)
We introduce and make openly accessible a comprehensive multivariate time series (MVTS) dataset extracted from solar photospheric vector magnetograms in the Space-weather HMI Active Region Patch (SHARP) series. The dataset also includes a cross-checked NOAA solar flare catalog that directly supports solar flare prediction efforts. We discuss the methods used to collect, clean, and preprocess the solar active region and flare data, and we describe a novel data integration and sampling methodology. The dataset covers 4,098 MVTS data collections from active regions occurring between May 2010 and December 2018, includes 51 flare-predictive parameters, and integrates over 10,000 flare reports. We also outline potential directions for expanding the time series, either “horizontally”, by adding more prediction-specific parameters, or “vertically”, by generalizing flare prediction into integrated solar eruption prediction. The immediate tasks enabled by the dataset include optimizing solar flare prediction and investigating elusive flare predictors or precursors in detail, with potential future benefits for both operations (research-to-operations) and basic research (operations-to-research).
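As an illustration of how MVTS windows can be tied to a flare catalog, the minimal sketch below slides a fixed-length observation window along an active region's time series and labels each window by the strongest GOES flare that starts within a prediction horizon after the window ends. The 12-hour observation window, 24-hour horizon, 1-hour step, and the rule that maps flare-free or A-class-only horizons to "FQ" (flare-quiet) are assumptions for illustration, not a description of SWAN-SF's published sampling methodology; the function names and flare times are hypothetical.

```python
from datetime import datetime, timedelta

# Peak soft X-ray flux (W/m^2) at magnitude 1.0 for each GOES class letter.
GOES_BASE = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}


def goes_flux(flare_class):
    """Convert a GOES class string such as 'M1.5' to peak flux in W/m^2."""
    return GOES_BASE[flare_class[0].upper()] * float(flare_class[1:])


def label_window(obs_end, flares, horizon=timedelta(hours=24)):
    """Label by the strongest flare starting in (obs_end, obs_end + horizon].

    `flares` is a list of (start_time, goes_class) tuples for one active
    region. Returns 'X', 'M', 'C', or 'B', or 'FQ' (flare-quiet) when the
    horizon has no flare or only A-class flares (an assumed rule).
    """
    upcoming = [c for t, c in flares if obs_end < t <= obs_end + horizon]
    if not upcoming:
        return "FQ"
    letter = max(upcoming, key=goes_flux)[0].upper()
    return "FQ" if letter == "A" else letter


def sliding_labels(series_start, series_end, flares,
                   obs_len=timedelta(hours=12), step=timedelta(hours=1)):
    """Yield (window_start, window_end, label) for each observation window
    that fits entirely inside [series_start, series_end]."""
    start = series_start
    while start + obs_len <= series_end:
        end = start + obs_len
        yield start, end, label_window(end, flares)
        start += step


# Usage with made-up flare reports for a single active region.
t0 = datetime(2020, 1, 1)
flares = [(t0 + timedelta(hours=20), "C3.0"), (t0 + timedelta(hours=38), "M1.2")]
labels = [lab for _, _, lab in sliding_labels(t0, t0 + timedelta(hours=14), flares)]
print(labels)  # ['C', 'C', 'M']
```

Comparing flares by peak flux rather than by class letter alone means that, for example, an M1.2 report outranks a C9.9 report in the same horizon, as expected.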
Angryk, R.A., Martens, P.C., Aydin, B., Kempton, D., Mahajan, S.S., Basodi, S., Ahmadzadeh, A., Cai, X., Boubrahimi, S.F., Hamdi, S.M., Schuh, M.A. and Georgoulis, M.K. Scientific Data, Nature
Multivariate Time Series Data Toolkit (MVTS-Data Toolkit) (2020)
We developed a domain-independent Python package that streamlines the preprocessing routines required to prepare any multi-class, multivariate time series dataset. It provides a comprehensive set of 48 statistical features for extracting the important characteristics of time series. Feature extraction is automated, can run sequentially or in parallel, and is supplemented by an extensive summary report about the data. Other modules put various data normalization and imputation methods at the user's disposal. To address class imbalance, which is often intrinsic to real-world datasets, the package also provides a set of generic, user-friendly sampling methods.
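The preprocessing stages listed above (imputation, statistical feature extraction, normalization, and sampling for class imbalance) can be sketched on toy data with plain NumPy. This sketch does not reproduce the MVTS-Data Toolkit's API: the function names are hypothetical, the six features stand in for the toolkit's 48, and random undersampling is just one example of a sampling method.

```python
import numpy as np


def impute_linear(mvts):
    """Fill NaNs in each column (parameter) of a (T, P) array by linear
    interpolation over time; leading/trailing gaps take the nearest value."""
    x = np.array(mvts, dtype=np.float64)  # work on a copy
    t = np.arange(x.shape[0])
    for j in range(x.shape[1]):
        ok = ~np.isnan(x[:, j])
        if ok.any():
            x[:, j] = np.interp(t, t[ok], x[ok, j])
        else:
            x[:, j] = 0.0  # assumption: an all-missing parameter becomes 0
    return x


def extract_features(mvts):
    """Six summary statistics per parameter of a (T, P) array, flattened to
    a vector of length 6 * P: mean, std, min, max, median, last - first."""
    x = np.asarray(mvts, dtype=np.float64)
    return np.concatenate([x.mean(axis=0), x.std(axis=0), x.min(axis=0),
                           x.max(axis=0), np.median(x, axis=0), x[-1] - x[0]])


def zscore(features):
    """Column-wise z-score normalization of an (n_samples, n_features) matrix."""
    mu, sd = features.mean(axis=0), features.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)  # leave constant columns unscaled
    return (features - mu) / sd


def random_undersample(X, y, seed=None):
    """Randomly drop samples so every class keeps as many as the rarest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), counts.min(), replace=False)
        for c in classes])
    keep.sort()
    return X[keep], y[keep]


# Usage on toy data: 10 series of 60 steps x 3 parameters, 8:2 class split.
rng = np.random.default_rng(0)
series = [rng.normal(size=(60, 3)) for _ in range(10)]
series[0][5, 1] = np.nan  # a missing value to impute
y = np.array([0] * 8 + [1] * 2)
X = np.vstack([extract_features(impute_linear(s)) for s in series])
X_bal, y_bal = random_undersample(zscore(X), y, seed=1)
print(X.shape, X_bal.shape)  # (10, 18) (4, 18)
```

Imputing before feature extraction matters here: a single NaN would otherwise propagate into the mean, std, and trend features of that parameter.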
Contingency Space Visualization Tool (2021)
This project is an interactive web application for evaluating supervised classification models. It is based on the research paper "Contingency Space: A Semimetric Space for Classification Verification", by A. Ahmadzadeh, D. J. Kempton, P. C. Martens and R. A. Angryk.
It addresses the limitations of evaluating a model with a single-value metric, such as 1) Lack of Context, 2) One-Dimensional View, 3) Unintuitive, 4) Incomparable, 5) Binary Restrictions, and 6) Not Customizable. The user can evaluate a single confusion matrix or a batch of confusion matrices in contingency spaces, each defined by a metric and an imbalance ratio. Twelve popular metrics are provided, such as Accuracy, Precision, Recall, and F1-Score, and the user can also upload a custom metric for the evaluation.
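One way to see why a metric depends on the imbalance ratio is to rewrite it in terms of a confusion matrix's rates. The minimal Python sketch below assumes each matrix is represented by its (TPR, TNR) pair and defines the imbalance ratio as r = P / N; both conventions, and all function names, are assumptions that may differ from the paper and the tool. Dividing every count by N gives, for example, accuracy = (r * TPR + TNR) / (r + 1), so a metric can be evaluated as a surface over the unit square for a fixed r.

```python
import numpy as np


def to_point(tp, fn, fp, tn):
    """Map a binary confusion matrix to (TPR, TNR) plus r = P / N."""
    p, n = tp + fn, fp + tn
    return tp / p, tn / n, p / n


# Each metric below is expressed through (tpr, tnr, r) only: dividing every
# count by N gives TP = r*tpr, FN = r*(1 - tpr), TN = tnr, FP = 1 - tnr.
def accuracy(tpr, tnr, r):
    return (r * tpr + tnr) / (r + 1)


def precision(tpr, tnr, r):
    tp, fp = r * np.asarray(tpr, float), 1.0 - np.asarray(tnr, float)
    # Undefined (NaN) where the classifier makes no positive predictions.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(tp + fp > 0, tp / (tp + fp), np.nan)


def f1_score(tpr, tnr, r):
    # 2TP / (2TP + FP + FN); the denominator is positive whenever r > 0
    return 2 * r * tpr / (2 * r * tpr + (1 - tnr) + r * (1 - tpr))


def metric_surface(metric, r, resolution=101):
    """Evaluate `metric` on a grid: rows vary TPR, columns vary TNR."""
    grid = np.linspace(0.0, 1.0, resolution)
    tnr, tpr = np.meshgrid(grid, grid)
    return metric(tpr, tnr, r)


# Usage: a batch of made-up confusion matrices as (TP, FN, FP, TN).
for cm in [(80, 20, 30, 870), (60, 40, 5, 895)]:
    tpr, tnr, r = to_point(*cm)
    print(f"TPR={tpr:.2f} TNR={tnr:.3f} acc={accuracy(tpr, tnr, r):.3f} "
          f"F1={f1_score(tpr, tnr, r):.3f}")

balanced = metric_surface(f1_score, r=1.0)
imbalanced = metric_surface(f1_score, r=1 / 9)
```

For any point with TNR below 1, F1 decreases as r shrinks, so the same pair of rates scores lower on the imbalanced surface, while accuracy drifts toward TNR. This dependence on r is one reason to evaluate confusion matrices in a space defined by both a metric and an imbalance ratio rather than by a single number.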