Data & Software

Products: Data & Software

Below is a list of products that my team has created or contributed to (contributions are marked with '*').

None of these products would have been made without the substantial efforts of graduate and undergraduate students.

DATASET

(2024)

MAGFiLO: Manually Annotated GONG Filaments in H-Alpha Observations

This dataset represents the largest collection of manually annotated filaments from H-Alpha observations captured by the Global Oscillation Network Group (GONG). It contains 10,244 annotated filaments from 1,593 observations spanning the years 2011 through 2022. This is the result of 1,066 person-hours of annotation.

This ML-ready dataset is publicly accessible to the community and can be used to train complex machine learning models that enhance our understanding of solar filaments.

"A Dataset of Manually Annotated Filaments from H-Alpha Observations" [Original]

[DOI: 10.1038/s41597-024-03876-y]

Azim Ahmadzadeh, Rohan Adhyapak, Kartik Chaurasiya, Laxmi Alekhya Nagubandi, V. Aparna, Petrus C. Martens, Alexei Pevtsov, Luca Bertello, Alexander Pevtsov, Naomi Douglas, Samuel McDonald, Apaar Bawa, Eugene Kang, Riley Wu, Dustin J. Kempton, Aya Abdelkarem, Patrick M. Copeland, and Sri Harsha Seelamneni

Scientific Data, Nature

SOFTWARE

(2024)

H-Alpha Anomalizer: A web application for facilitating exploration of GONG H-Alpha Observations

Users can search for an observation by its timestamp or file name, and step to the next and previous observations in a user-friendly interface. For each observation, links to the corresponding FITS file and header file are available as well.

WEB APP

(2023)

GONG H-Alpha Viewer: A web application for exploring GONG H-Alpha Observations

SOFTWARE

(2021)

Contingency Space Visualization Tool

This project is an interactive web application for evaluating supervised classification models. It is based on the research paper "Contingency Space: A Semimetric Space for Classification Evaluation", by A. Ahmadzadeh, D. J. Kempton, P. C. Martens, and R. A. Angryk.

It addresses the limitations of evaluating a model with a single-value metric, namely 1) lack of context, 2) a one-dimensional view, 3) unintuitiveness, 4) incomparability, 5) binary restrictions, and 6) no customizability. The application lets the user evaluate a single confusion matrix or a batch of confusion matrices in contingency spaces. These contingency spaces are based on metrics and imbalance ratios. Twelve popular metrics are provided, such as Accuracy, Precision, Recall, and F1-Score, and the user can also upload a customized metric to use for the evaluation.
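The "lack of context" limitation can be illustrated with a minimal sketch. It is not the application's code or API: the `evaluate` helper, the two example confusion matrices, and the definition of the imbalance ratio as negatives per positive are all assumptions made for illustration.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(tp, fn, fp, tn):
    """Compute a few common metrics from binary confusion-matrix counts.

    Hypothetical helper for illustration; not part of the tool.
    """
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        # Assumed definition: number of negatives per positive instance.
        "imbalance_ratio": safe_div(fp + tn, tp + fn),
    }


# Two made-up models evaluated on the same data (10 positives, 90 negatives).
model_a = evaluate(tp=5, fn=5, fp=0, tn=90)
model_b = evaluate(tp=10, fn=0, fp=5, tn=85)

for name, scores in (("A", model_a), ("B", model_b)):
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Both models reach the same accuracy, yet model A misses half of the positive class while model B finds all of it at the cost of some false alarms. A single accuracy value hides that difference, which is the kind of context a two-dimensional view of the confusion matrix is meant to expose.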

"Contingency Space: A Semimetric Space for Classification Evaluation" [Original]

[DOI: 10.1109/TPAMI.2022.3167007]

A. Ahmadzadeh, D. J. Kempton, P. C. Martens, and R. A. Angryk

IEEE Transactions on Pattern Analysis and Machine Intelligence

SOFTWARE

(2020)

Multivariate Time Series Data Toolkit (MVTS-Data Toolkit)

We developed a domain-independent Python package to facilitate the preprocessing routines required in preparing any multi-class, multivariate time series data. It provides a comprehensive set of 48 statistical features for extracting the important characteristics of time series. The feature extraction process is automated, can run sequentially or in parallel, and is supplemented with an extensive summary report about the data. Other modules put different data normalization and imputation methods at users' disposal. To address the class-imbalance issue, which is often intrinsic to real-world datasets, a set of generic but user-friendly sampling methods is also provided.
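The kind of preprocessing described above can be sketched with the Python standard library alone. This is not the toolkit's API: the function names, the mean-imputation choice, the five statistics (the package provides 48), the random undersampling strategy, and the sample parameter names are all assumptions made for illustration.

```python
import random
import statistics
from collections import defaultdict


def impute_mean(series):
    """Replace missing values (None) with the mean of the observed values.

    Raises statistics.StatisticsError if every value is missing.
    """
    observed = [x for x in series if x is not None]
    fill = statistics.fmean(observed)
    return [fill if x is None else x for x in series]


def extract_features(mvts):
    """Map {parameter name: series} to a flat {"<name>_<stat>": value} dict."""
    features = {}
    for name, series in mvts.items():
        clean = impute_mean(series)
        features[f"{name}_mean"] = statistics.fmean(clean)
        features[f"{name}_median"] = statistics.median(clean)
        features[f"{name}_stdev"] = statistics.pstdev(clean)
        features[f"{name}_min"] = min(clean)
        features[f"{name}_max"] = max(clean)
    return features


def undersample(labels, seed=0):
    """Return sorted indices that keep an equal number of samples per class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    keep = min(len(indices) for indices in by_class.values())
    rng = random.Random(seed)
    return sorted(
        i for indices in by_class.values() for i in rng.sample(indices, keep)
    )


# One made-up multivariate time series with two parameters and a missing value.
sample = {"flux": [1.0, None, 3.0, 4.0], "shear": [10.0, 10.0, 10.0, 10.0]}
features = extract_features(sample)

# Made-up class labels: two "flare" samples versus four "quiet" samples.
kept = undersample(["flare", "quiet", "quiet", "quiet", "flare", "quiet"])
print(features)
print(kept)
```

Each time series is reduced to a fixed-length feature vector, and the undersampling step keeps every minority-class sample while drawing an equally sized random subset of the majority class.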

MVTS-Data Toolkit: A Python Package for Preprocessing Multivariate Time Series Data [pdf]

[DOI: 10.1016/j.softx.2020.100518]

Azim Ahmadzadeh, Kankana Sinha, Berkay Aydin, and Rafal A. Angryk

SoftwareX (Elsevier), 2020

DATASET*

(2020)

Space Weather Data Analytics for Solar Flares (SWAN-SF)

We introduce and make openly accessible a comprehensive, multivariate time series (MVTS) dataset extracted from solar photospheric vector magnetograms in Spaceweather HMI Active Region Patch (SHARP) series. Our dataset also includes a cross-checked NOAA solar flare catalog that immediately facilitates solar flare prediction efforts. We discuss methods used for data collection, cleaning and preprocessing of the solar active region and flare data, and we further describe a novel data integration and sampling methodology. Our dataset covers 4,098 MVTS data collections from active regions occurring between May 2010 and December 2018, includes 51 flare-predictive parameters, and integrates over 10,000 flare reports. Potential directions toward expansion of the time series, either “horizontally” – by adding more prediction-specific parameters, or “vertically” – by generalizing flare into integrated solar eruption prediction, are also explained. The immediate tasks enabled by the disseminated dataset include: optimization of solar flare prediction and detailed investigation for elusive flare predictors or precursors, with both operational (research-to-operations), and basic research (operations-to-research) benefits potentially following in the future.

Multivariate Time Series Dataset for Space Weather Data Analytics [pdf][post][data]

[DOI: 10.1038/s41597-020-0548-x]

Angryk, R.A., Martens, P.C., Aydin, B., Kempton, D., Mahajan, S.S., Basodi, S., Ahmadzadeh, A., Cai, X., Boubrahimi, S.F., Hamdi, S.M., Schuh, M.A. and Georgoulis, M.K.

Scientific Data, Nature

DATASET

(2019)

A Curated Image Parameter Dataset from Solar Dynamics Observatory Mission

We provide a large image parameter data set extracted from the Solar Dynamics Observatory (SDO) mission's Atmospheric Imaging Assembly (AIA) instrument, for the period of 2011 January through the current date, with the cadence of 6 minutes, for nine wavelength channels. The volume of the data set for each year is just short of 1 TiB. Toward achieving better results in the region classification of active regions and coronal holes, we improve on the performance of a set of 10 image parameters, through an in-depth evaluation of various assumptions that are necessary for calculation of these image parameters. Then, where possible, a method for finding an appropriate setting for the parameter calculations was devised, as well as a validation task to show our improved results. In addition, we include comparisons of JP2 and FITS image formats using supervised classification models, by tuning the parameters specific to the format of the images from which they are extracted and specific to each wavelength. The results of these comparisons show that utilizing JP2 images, which are significantly smaller files, is not detrimental to the region classification task that these parameters were originally intended for. Finally, we compute the tuned parameters on the AIA images and provide a public API to access the data set. This data set can be used in a range of studies on AIA images, such as content-based image retrieval or tracking of solar events, where dimensionality reduction on the images is necessary for feasibility of the tasks.
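The scale implied by the figures above can be estimated with a short back-of-the-envelope calculation. It assumes uninterrupted coverage over a 365-day year and treats 1 TiB as an upper bound on the yearly volume, so the per-image figures are rough ceilings rather than measured values.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # assumes a non-leap year with no data gaps
CADENCE_MINUTES = 6                # one image every 6 minutes per channel
CHANNELS = 9                       # wavelength channels
PARAMETERS = 10                    # image parameters per image
TIB = 2 ** 40                      # upper bound on the yearly volume, in bytes

# Number of images processed per year across all channels.
images_per_year = CHANNELS * MINUTES_PER_YEAR // CADENCE_MINUTES

# Upper bounds on the parameter data stored per image and per parameter.
bytes_per_image = TIB / images_per_year
bytes_per_parameter = bytes_per_image / PARAMETERS

print(f"images per year: {images_per_year:,}")
print(f"bytes per image (upper bound): {bytes_per_image:,.0f}")
print(f"bytes per parameter per image (upper bound): {bytes_per_parameter:,.0f}")
```

Under these assumptions the data set covers 788,400 images per year, which works out to roughly 1.3 MiB of parameter data per image at most.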

Curated Image Parameter Dataset from Solar Dynamics Observatory Mission [article][api]

[DOI: 10.3847/1538-4365/ab253a]

Azim Ahmadzadeh, Dustin J. Kempton, and Rafal A. Angryk

The Astrophysical Journal Supplement Series

* Indicates products that I have NOT led but contributed to.
