Current Students

Krishna R. Puthucode

MS Student [2021 - 2022]

Computer Science

A Non-mapping Based Similarity Measure for Time Series

[Paper 1 (in press)][Paper 2][Talk]

In this project, we adopt Multiscale Intersection over Union (MIoU), an object-similarity measure, and Dubuc's variation method, which is analogous to the traditional box-counting method, to measure the similarity of time series. Dubuc's approach measures the scaling behavior of the amplitude of a time series in an 𝜀-neighborhood, with a varying 𝜀-value. We call this new metric TS-Dubuc. The MIoU for Time Series (TS-MIoU) measure, on the other hand, employs the box-counting technique to quantify the distance/similarity between time series. Although TS-Dubuc and the variational box-counting method used by TS-MIoU are fundamentally similar, TS-Dubuc improves on TS-MIoU by sidestepping the ill-definedness of the space, since we do not need to deal with the geometric relationship between the x and y axes. As a result, TS-Dubuc can be interpreted as the number of boxes (𝜀 × 𝜀) whose combined area equals that of the 𝜀-neighborhood's envelope, which is defined by its upper and lower bounds. Unlike other similarity measures, including Euclidean distance, DTW, and DTW’s variants, TS-Dubuc does not require a point-to-point mapping of time series and consequently avoids issues such as pathological warping. Its computation is also slightly faster than that of TS-MIoU. In this project, I carried out experiments and demonstrated that TS-Dubuc is more computationally efficient than TS-MIoU and some DTW variants. My experiments on 100 UCR datasets show that TS-Dubuc can fill the gap between conventional similarity metrics that permit local time shifting and those that do not.
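The box-count interpretation above can be illustrated with a minimal sketch. It is not the project's implementation: the function names are hypothetical, 𝜀 is taken as a window radius in samples (assuming unit sample spacing), and the sketch stops at the per-series profile because the abstract does not say how the profiles of two series are compared.

```python
import numpy as np

def envelope_area(series, eps):
    """Area between the upper and lower envelopes of `series`.

    Assumption: eps is a window radius in samples and samples are one unit apart,
    so the area is the sum of local oscillations (max - min) over all samples.
    """
    y = np.asarray(series, dtype=float)
    n = len(y)
    area = 0.0
    for i in range(n):
        lo, hi = max(0, i - eps), min(n, i + eps + 1)
        window = y[lo:hi]
        area += window.max() - window.min()  # oscillation in the eps-neighborhood of sample i
    return area

def box_count_profile(series, eps_values):
    """Number of eps-by-eps boxes whose combined area equals the envelope area, per eps."""
    return {eps: envelope_area(series, eps) / eps**2 for eps in eps_values}

# Usage: a short linear ramp evaluated at two scales.
profile = box_count_profile([0, 1, 2, 3, 4], eps_values=[1, 2])
```

No point-to-point mapping between two series is involved at any step; each series is summarized on its own, scale by scale.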

Shreejaa Talla

MS Student [2021 - 2022]

Computer Science

"Augmentation of Solar Filaments for Machine Learning"


A halo coronal mass ejection (CME) can have a disastrous effect on Earth by damaging electrical transmission facilities, satellites, and radio communications. The sign of a filament's magnetic helicity can indicate the direction of the magnetic field (and thus the possibility of a geomagnetic storm) associated with an incoming CME. Knowing this sign therefore helps us anticipate a geomagnetic storm.

Identification of filaments' chirality seems to be a well-suited topic in this field, given the massive amount of image data generated by ground-based and space-borne observatories and the exceptional performance of computer vision algorithms in recognizing and classifying objects (events) in images. This task is mainly targeted by Deep Learning algorithms using a Convolutional Neural Network (CNN) backbone. The main challenge is how data-hungry these supervised algorithms are; their many model parameters need millions of labeled examples to learn from. It is expensive to create filament datasets with manually determined chirality, especially because detecting filaments’ chirality requires domain expertise. Based on the existing labeled examples, I developed a pipeline for automatic augmentation of filaments. This Python toolkit offers a practically unlimited supply of augmented filaments labeled with their magnetic helicity sign. Labeled filaments are processed through a pipeline of chirality-preserving transformation functions, including heliographic projection, and users can extend the pipeline with custom transformations. The input seeds are an existing dataset of manually labeled H-alpha filaments, collected from August 2000 to 2016 from the full-disk solar images of the Big Bear Solar Observatory (BBSO). This augmentation engine generates data following users’ requirements and is fully compatible with PyTorch, a well-known deep learning library.
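The idea of a chirality-preserving augmentation pipeline can be sketched as follows. This is an illustration only, not the toolkit's API: the function names, the label string, and the two transformations are assumptions, and heliographic projection is not reproduced here. A mirror flip is deliberately left out because a reflection reverses handedness and would require swapping the label.

```python
import numpy as np

def rotate_quarter_turns(image, rng):
    """Rotate by a random multiple of 90 degrees; in-plane rotation keeps handedness."""
    return np.rot90(image, k=int(rng.integers(0, 4)))

def scale_brightness(image, rng):
    """Multiply pixel intensities by a random factor in [0.8, 1.2)."""
    return image * rng.uniform(0.8, 1.2)

def augment(seed_image, label, transforms, n_samples, seed=0):
    """Generate n_samples augmented copies of one labeled seed; the label is carried over."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        image = np.asarray(seed_image, dtype=float)
        for transform in transforms:  # custom transformations can be appended to this list
            image = transform(image, rng)
        samples.append((image, label))
    return samples

# Usage: five augmented instances from a single (toy) labeled filament patch.
seed_patch = np.arange(12).reshape(3, 4)
augmented = augment(seed_patch, "left", [rotate_quarter_turns, scale_brightness], n_samples=5)
```

Because each transformation is a plain function of the image and a random generator, extending the pipeline amounts to adding another function with the same signature.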

Former Students

Atharv Yeolekar

MS Student [2021 - 2022]

Data Science and Analytics

Feature Selection on a Flare Forecasting Testbed


Solar Energetic Particles (SEPs) can be associated with solar flares and coronal mass ejections (CMEs) and exhibit energy spectra ranging from a few keV to many GeV. These events can occur without any notable indication and alter the radiation environment of the inner solar system, which can potentially lead to precarious conditions for humans in space, affect spacecraft's sensitive onboard electronics, and trigger radio blackouts. Identifying the most critical physical parameters recorded by the Solar Dynamics Observatory (SDO) for detecting SEPs can allow for a swift response to their adverse effects.

The SDO provides a profusion of high-quality time series data that account for the modulating background of magnetic activity and the inherently dynamic pre-flare and post-flare phases, in contrast to the non-representative point-in-time measurements employed earlier. Selecting vital parameters for solar flare classification using machine learning algorithms therefore appears to be a well-fitted problem in this realm. The primary issue in dealing with multivariate time series (mvts) data is the large number of physical parameters recorded at a high frequency, which makes the data dimensionality very high and thus hampers the learning process. Moreover, manually selecting vital parameters is a tedious and costly task whose results experts may not always agree on. In response, we examined feature subset selection using multiple algorithms on both mvts data and the statistical features derived from mvts segments (vectorized data). We used the SWAN-SF (Space Weather Analytics for Solar Flares) benchmark dataset, collected from May 2010 to September 2018, to conduct our experiments. The comprehensive study gives a stable scheme to recognize the critical physical parameters, which boosts the learning process and can be used as a blueprint to forecast future solar flare episodes.
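As a rough illustration of feature selection on vectorized data, the sketch below derives two statistical features (mean and standard deviation) per physical parameter from each mvts segment and ranks them with a Fisher score. The choice of statistics, the Fisher score, and all function names are assumptions made for this example; the abstract does not name the algorithms that were actually compared.

```python
import numpy as np

def vectorize(mvts):
    """(n_samples, n_timesteps, n_params) -> (n_samples, 2 * n_params): means, then stds."""
    mvts = np.asarray(mvts, dtype=float)
    return np.concatenate([mvts.mean(axis=1), mvts.std(axis=1)], axis=1)

def fisher_scores(features, labels):
    """Between-class scatter divided by within-class scatter, computed per feature."""
    labels = np.asarray(labels)
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for cls in np.unique(labels):
        group = features[labels == cls]
        between += len(group) * (group.mean(axis=0) - overall_mean) ** 2
        within += len(group) * group.var(axis=0)
    return between / (within + 1e-12)  # small constant guards against division by zero

def top_k(features, labels, k):
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(fisher_scores(features, labels))[::-1][:k]

# Usage on synthetic data: only parameter 1 differs between the two classes.
rng = np.random.default_rng(0)
labels = np.array([0] * 20 + [1] * 20)
mvts = rng.normal(size=(40, 20, 3))
mvts[labels == 1, :, 1] += 5.0
best = top_k(vectorize(mvts), labels, k=1)
```

In this synthetic case the top-ranked column is the mean of parameter 1, the only quantity that separates the classes.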

Egill Gunnarsson

MS Student [2019 - 2021]

Computer Science

"Interactive Supervised Machine Learning Model Evaluation using D3"

[Defense][Final Product]

The evaluation of a supervised machine learning model is one of the most important aspects of its life cycle. While there are numerous evaluation metrics, each of which provides a different insight into a model's performance, it can sometimes be challenging to find the appropriate ones for the problem at hand. Including the imbalance ratio as an extra variable makes the evaluation process even more difficult. Therefore, I implemented a web application to intuitively evaluate models' performance based on their confusion matrices and given imbalance ratios. This project is an online, interactive application of the Contingency Space recently proposed by Ahmadzadeh et al. (2021). Inspired by this concept, my web application allows users to visually evaluate their pre-trained supervised models. A side-by-side graphical representation of multiple metrics is provided for comparing the metrics' scores. Confusion matrices are evaluated on metrics such as Accuracy, Precision, F1-Score, and Recall. Additionally, users can load their own customized metrics. The visualization is based on contour plots that relate each metric's score to the True Positive and True Negative rates and the imbalance ratio.
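The data behind such contour plots can be sketched by evaluating each metric over a grid of True Positive and True Negative rates for a fixed imbalance ratio. This is a simplified illustration, not the application's backend: the function name is hypothetical, and the imbalance ratio is assumed here to mean the number of negatives per positive.

```python
import numpy as np

def metric_surfaces(imbalance_ratio, resolution=101):
    """Metric scores over a (TNR, TPR) grid.

    Assumption: imbalance_ratio = negatives / positives, with the positive
    class normalized to size 1.
    """
    rates = np.linspace(0.0, 1.0, resolution)
    tpr, tnr = np.meshgrid(rates, rates)  # tpr varies along columns, tnr along rows
    tp = tpr
    tn = imbalance_ratio * tnr
    fp = imbalance_ratio * (1.0 - tnr)

    accuracy = (tp + tn) / (1.0 + imbalance_ratio)
    predicted_pos = tp + fp
    # Where nothing is predicted positive, precision is undefined; report 0 there.
    precision = np.divide(tp, predicted_pos, out=np.zeros_like(tp), where=predicted_pos > 0)
    recall = tpr
    pr_sum = precision + recall
    f1 = np.divide(2 * precision * recall, pr_sum, out=np.zeros_like(tp), where=pr_sum > 0)
    return {"tpr": tpr, "tnr": tnr, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}

# Usage: surfaces for a dataset with nine negatives per positive, on a coarse grid.
surfaces = metric_surfaces(imbalance_ratio=9.0, resolution=3)
```

Each returned array has one score per grid point, which is the form a contour-plotting routine typically expects.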

This application uses technologies such as d3.js, Python, JavaScript, HTML, CSS, Flask, and JSON. Each metric’s score is generated in the backend using Python, based on an imbalance ratio. Information is sent to and from the backend via Flask as JSON objects. JavaScript then uses the d3 library to convert the metric scores into a contour plot. The d3 library has many interactive capabilities, which allow the user to tailor the evaluation to their requirements.

Sonam Dawani

MS Student [2019 - 2021]

Data Science and Analytics

"A Texture-Based Approach for Identification of Filaments’ Chirality"


S. Dawani, A. Ahmadzadeh, and R. A. Angryk

43rd COSPAR Scientific Assembly (2021) [+], Machine Learning for Space Science Workshop (ML4SS) [+][+][programme]

Kankana Sinha

MS Student [2017 - 2019]

Computer Science

"MVTS-Data Toolkit: A Python Package for Preprocessing Multivariate Time Series Data"


Azim Ahmadzadeh, Kankana Sinha, Berkay Aydin, and Rafal A. Angryk

SoftwareX [+] Journal (Elsevier), 2020