On FITS

compared to the more popular JP2 format


The following is the report of my investigation on FITS files, conversion of them using Java libraries, and a comparison with JPEG2000 format.

AIA

HV (JP2)

JSOC (FITS)

JSOC (JP2)

FITS Lib

javadoc

FITS Intro

FITS doc

Keys

App: DS9

Other Liibs

Access data





Introduction

While maintaining the data in such a high precision level during this translation process could be vital for some analytical studies of the solar activities, it is not trivial that for any other task, such precision is beneficial. Note that processing a reasonably large amount of such images (e.g. 6-year worth of SDO images) could be computationally overwhelming and keep TBs of CPUs busy for a couple of months. For this very reason, beside the typical down-sizing steps (dimensionality reduction) during the preprocessing of the data, it is crucial to learn whether or not working on JP2 images as the raw input would make any significant difference. Of course, the answer to this question depends on the way the data is going to be extracted form the images.

To have a better understanding regarding the difficulty of dealing with large size images, note that the typical size of a 4096 X 4096 px AIA FITS image varies from 7M to 14M depending on the waveband the image were taken in. The size of a similar image in JP2 format, on the other hand, is about 1M across all different wavebands. One of the typical experiments that we would need to run on these images, is to extract some image parameters with different settings and use other domain (solar) data to decide which are the best settings for a particular task. In our last study, there were 60 different settings to be experimented. Since we have 10 different wavebands, we need to extract each image 600 times. A reliable experiment typically takes up at least one month worth of data. Depending on the particular event we are looking for, this could result in thousands of images ( ~4000 in that case). This means processing 2400000 images, which is equal to 24TB of data (for the FITS file of size on average 10MB ) instead of 2.4TB for JP2 images.

What FITS is?

In 1981, FITS format was introduced for the interchange of astronomical images (and other digital arrays) on magnetic tape as a solution to the data transport problem. FITS is a data format for recoding digital images on seven-track and nine-track magnetic tape.

"The AIA cameras use a 14-bit analog-to-digital converter (ADC) to translate the charge read out from each pixel to a digital number [DN] value" [1]. So, the maximum pixel value for AIA images is 2^17 = 16384. This value may be observed on the extremely bright regions of the SDO images, mainly during the X-class flares. Since the values in the FITS file correspond to the number of photons hit the CCDs of the SDO camera, they cannot be negative, except after processing and using interpolation. Those negative values should be cleaned out (replaced with zeros).

AIA_calibration.pdf
FITS-a flexible image transport system.pdf

Quick read and analysis of FITS files:

Working with FITS files as text could be a real hassle, because of its size. It took me a while to find out that there is a very handy app that gives you much more than the content of the header(s).

SAOImage DS9, is a free application, initially developed by Mike Van Hilst, in 1990 and then continued by others. Using this application, the header(s), several different visualization of the image, histogram, etc can be quickly provided for the user.

sudo apt-get install saods9

Reading the image of a FITS file

Below, there are two snippets of Java code for reading FITS files, using nom.tam FITS library (version: 1.15.2).

  • For uncompressed FITS files:
Fits f = new Fits(DIR_TO_FITS)
f.read();
ImageHDU hdu = (ImageHDU) f.getHDU(1);
int[][] im = (int[][]) hdu.getKernel();
  • For compressed FITS files:
Fits f = new Fits(DIR_TO_FITS)
f.readHDU();
CompressedImageHDU chdu = (CompressedImageHDU) f.readHDU();
ImageHDU uncompressedImage = this.chdu.asImageHDU();
ImageData imageData = uncompressedImage.getData();
short[][] im = (short[][]) imageData.getData();

Reading cards from the header

Each FITS file has one or more headers that end with the keyword END. Below you can see a few lines of the header of an AIA image:

BITPIX  =                    8 / 8-bit bytes
NAXIS   =                    2 / 2-dimensional binary table
NAXIS1  =                    8 / width of table in bytes
NAXIS2  =                 4096 / number of rows in table
PCOUNT  =             11473606 / size of special data area
GCOUNT  =                    1 / one data group (required keyword)

Each of these lines is called a card. A card has a key, followed by a value, and a description. The following snippet of Java code show how they can be accessed:

headerData = new ImageDBFitsHeaderData();
Header header = hdu.getHeader();
double naxis1 = header.getDoubleValue("NAXIS1");
> NAXIS1: 8

Among the many different pieces of information fits header provides, there are DATAMIN and DATAMAX. In the documentation of fits file it is written that these are the valid minimum and maximum values. As I investigated, these values are the actual minimum and maximum values in the array of the pixel values, which are compared on the right.

Some results -->

As it is shown, the maximum possible value for each cell of the image is 16383 which is the largest 2-byte integer where for each byte, one bit is reserved for the sign. So for 2 bytes, 2*(8-1) bits are used, and this means the largest integer would be 2 ^ 14 = 16384.

On the other hand, for the minimum values, negative numbers are observed while the minimum value must be zero. These negative values must be treated as noise and are generated in the process of interpolation in the raw-to-level1 post processing step.

> aia.lev1_euv_12s.2017-09-06T084207Z.304.image_lev1.fits
  actualMax:2528.0    validMax: 2528.0
  actualMin:-7.0      validMin: -7.0
> aia.lev1_euv_12s.2017-09-06T085407Z.304.image_lev1.fits
  actualMax:2934.0    validMax: 2934.0
  actualMin:-13.0     validMin: -13.0
> aia.lev1_euv_12s.2017-09-06T093010Z.171.image_lev1.fits
  actualMax:16383.0   validMax: 16383.0
  actualMin:-6.0      validMin: -6.0
> aia.lev1_euv_12s.2017-09-06T093001Z.94.image_lev1.fits
  actualMax:15732.0   validMax: 15732.0
  actualMin:-8.0      validMin: -8.0
> aia.lev1_euv_12s.2017-09-06T084201Z.94.image_lev1.fits
  actualMax:2304.0    validMax: 2304.0
  actualMin:-7.0      validMin: -7.0
> aia.lev1_euv_12s.2017-09-06T084208Z.131.image_lev1.fits
  actualMax:5706.0    validMax: 5706.0
  actualMin:-205.0    validMin: -205.0
> aia.lev1_euv_12s.2017-09-06T091801Z.94.image_lev1.fits
  actualMax:16383.0   validMax: 16383.0
  actualMin:-10.0     validMin: -10.0
> aia.lev1_euv_12s.2017-09-06T091759Z.211.image_lev1.fits
  actualMax:16383.0   validMax: 16383.0
  actualMin:-17.0     validMin: -17.0
> aia.lev1_euv_12s.2017-09-06T085408Z.131.image_lev1.fits
  actualMax:4386.0    validMax: 4386.0
  actualMin:-208.0    validMin: -208.0
> aia.lev1_euv_12s.2017-09-06T090608Z.304.image_lev1.fits
  actualMax:15749.0   validMax: 15749.0
  actualMin:-13.0     validMin: -13.0
> aia.lev1_euv_12s.2017-09-06T091811Z.131.image_lev1.fits
  actualMax:7228.0    validMax: 7228.0
  actualMin:-227.0    validMin: -227.0
> aia.lev1_euv_12s.2017-09-06T093002Z.335.image_lev1.fits
  actualMax:3262.0    validMax: 3262.0
  actualMin:-163.0    validMin: -163.0
> aia.lev1_euv_12s.2017-09-06T090610Z.131.image_lev1.fits
  actualMax:14939.0   validMax: 14939.0
  actualMin:-227.0    validMin: -227.0

Output Test

[aia.lev1_euv_12s.2010-09-13T173013Z.171.image_lev1.fits]








To ensure that the reader code works properly, I created the heatmap of the FITS file I queried from JSOC, and compared it with the same snapshot that I downloaded from SDO repository.

SDO image in JP2 format

Heatmap of fits format

A 3D visual comparison

In these visualizations, the following AIA image was queried in two formats, FITS (from JSOC) and JP2 (from SDO):

aia.lev1_euv_12s.2010-09-13T173013Z.171

This particular image is chosen because of the presence of an X class flare at this time period. Both of the images are rescaled to 1024 pixels (from 4096X4096) to reduce the size of the 3D plot. No normalization is applied to their values.

  • In the original FITS file, the maximum value was 16383 that due to rescaling of the image size (re-sampling the values) it changed to 9443.
  • In the second visualization, the JP2 format is plotted, so the range is [0, 255] .
  • For the third plot, a clipping method is used by setting the new min and max to [0, 44] and (for this wavelength;171), replacing each cell with its squared, and eventually cutting of all values that are still out side of the new range. This range is borrowed from the Helioviewer's setting.
  • 3D visualization of the FITS format

(Size: ~80MB - Be patient.)


  • 3D visualization of the JP2 format

(Size: ~80MB - Be patient.)


  • 3D visualization after clipping the extreme values.

(Size: ~80MB - Be patient.)




Distribution of Values

Below, it is shown how skewed the distribution of color intensities is in FITS format.

X: Color intensity values

Y: Frequency on FITS file



[TOP]

MAX: 9443

95-th percentile: 665

[BOTTOM]

MAX: 14412

95-th percentile: 476

aia.lev1_euv_12s.2010-09-13T173013Z.171.image_lev1.fits
aia.lev1_euv_12s.2017-09-06T84210Z.171.image_lev1.fits

Max and Min intensity values of fits files, across wavelengths

While observing different AIA images (in FITS format), I noticed that the range of the values is not consistent across different wavelengths. For instance, while in 171 Angstroms, values as large as the maximum possible value (2^14) (i.e., 16383) can be seen at the center of the flares, in 94 angstroms, the range is mostly limited to 2-digit numbers. This raises two questions:

  • What is the MIN and MAX values of each wavelength, across a long period of time?

On the right, the results of my experiments are shown. As one can see, regardless of the wavelength, 0 and 16838 are recorded. In other words, it seems to be true if we say the maximum possible values will be recorded, at least in the presence of X-class flares, regardless of the (EUV) wavelength.

But is this a rare observation that took place because of the X-class flare? (See next question.)

  • How often the MIN and MAX values are recorded in a certain period of time?

[] Global MIN and MAX values


DATA: Using JSOC front-end, I requested AIA FITS files, starting from 2010.09.01, for a period of 30 days (/30d), for every 2 hours (/@2h).

QUERY: aia.lev1_euv_12s[2010.09.01_13:30/30d@2h][? WAVELNTH = 94 ?]{image}

the global min and max values are listed below:

94  A -->    MIN:[0.0]   MAX:[16383.0]
131 A -->    MIN:[0.0]   MAX:[16383.0]
171 A -->    MIN:[0.0]   MAX:[16383.0]
193 A -->    MIN:[0.0]   MAX:[16383.0]
211 A -->    MIN:[0.0]   MAX:[16383.0]
304 A -->    MIN:[0.0]   MAX:[16383.0]

It is interesting to study the changed of several different percentiles of pixel intensities for a period of one month, as illustrated on the right. Note that in each of these plots, the intensities as extreme as 16383 are observed, however, these values are not event caught in the 99.5-th percentile of the distributions.












The spike in the first plot corresponds to: aia.lev1_euv_12s.2010-09-30T153007Z.94.image_lev1.fits (see) where several bright regions are recorded which are relatively very strong compared to the snapshots 2 hours before and after.

Percentiles of pixel intensity values of full-image

Percentiles of pixel intensity values of Sun's disk ONLY

Max p-th percentile of pixel values for full image

The table on the right, is the summary of the above plots. For each of the nine wavelength (94, 131, 171, 193, 211, 304, 335, 1600, 1700 Å), the maximum values of the percentiles are shown.

Ex. For wavelength of 131 Ångstroms, 99% of the intensity values are below 88.

  • It is important to note that for 99.5-th percentile, we are evaluating our data after cutting 0.5% of the points, while we should keep in mind that this small piece is still comprised of 83,886 pixel values.

Max p-th percentile of pixel values within the Sun's disk

This table shows the percentile of pixel values, where all pixels outside the disk of Sun are ignored.

The location of disk's center, as well as the radius is obtained from each file's header.

       80.th    90.th    95.th    99.th    99.5.th  100.th
W: 94  7        10       15       34       44       16383
W:131  19       30       43       88       123      16383 
W:171  568      777      1034     1935     2602     16383
W:193  574      904      1354     2884     3968     16383 
W:211  154      258      429      1159     1673     16383
W:304  116      151      188      327      431      16383 
W:335  16       26       43       171      305      16383 
W:1600 196      242      289      427      509      16046
W:1700 1801     2205     2558     3517     4138     16215
       80.th    90.th    95.th    99.th    99.5.th  100.th
W: 94  8        11       15       36       54       16383 
W:131  25       36       49       103      150      14945 
W:171  690      892      1113     2163     2773     16383 
W:193  666      1011     1542     3342     4459     16383 
W:211  188      312      557      1413     1919     16383
W:304  153      190      234      441      599      16383 
W:335  20       31       55       265      417      14865
W:1600 247      294      348      521      629      16013
W:1700 2253     2604     2978     4236     5159     16215

Several animated plots showing the distribution of colors in FITS files

Below, distribution of FITS's color intensity for values higher than the 99.5-th percentile is shown. This distribution is the average obtained by observing one month worth of data (~354 fits file).

Note: The vertical dotted line is draws on the 99.5-th percentile, for each wavelength.

Note: The Y-axis is limited to the range [0,50] for a better visualization.

Summary of distributions of above the 99.5-th percentile values

Image parameters on JP2

Mean

Std Deviation

Skewness

Kurtosis

Contrast

uniformity

Entropy

Relative Smoothness

Fractal Dimension

Directionality

Image parameters on FITS (no clipping)

Mean

Std Deviation

Skewness

Kurtosis

Contrast

Uniformity

Entropy

Relative Smoothness

Fractal Dimension

Directionality

Image parameters on FITS (with clipping)

Mean

Std Deviation

Skewness

Kurtosis

Contrast

Uniformity

Entropy

Relative Smoothness

Fractal Dimension

Directionality




Correlation Study

The next step is to study the correlation between the ten image parameters on FITS and JP2 images. But before that, to have a better understanding of the data, we would need to plot the correlation of the raw data. On the right, you see the correlation of pixel values of FITS (X-axis) and JP2 (Y-axis) image where the values are normalized to [0,1].



The images used:

  • 2010_09_13__17_30_00_345__SDO_AIA_AIA_171.jp2
  • aia.lev1_euv_12s.2010-09-13T173013Z.171.image_lev1.fits


The logarithmic shape of the above plot should not surprise us any more, since we already observed the presence of the extreme values and the extremely skewed distribution in FITS images. And this is nicely illustrated in that plot; to a large range of values in FITS image (on X-axis), a very small range of values in JP2 image (on Y-axis) corresponds.

Thus, for a better comparison of these two image formats, we need to clip the far-out values. This is when the 99.5-th percentiles that we already calculated come in handy. We can use them as the thresholds for clipping the skewed distribution.

Below the distribution of the pixel values of a single image is illustrated. It is important to note that clipping of the first distribution plot, which is shown as the second plot,

  • 1st plot: Distribution in FITS image
  • 2nd plot: Distribution in clipped FITS image
  • 3rd plot: Distribution in JP2 image



The images used:

  • 2010_09_13__17_30_00_345__SDO_AIA_AIA_171.jp2
  • aia.lev1_euv_12s.2010-09-13T173013Z.171.image_lev1.fits








  • Raw data (FITS)
  • Number of bins: 16383/10









  • Clipped data (FITS)
  • Number of bins: 255










  • Raw data (JP2)
  • Number of bins: 255



The correlation plot after clipping of far-out values in FITS image, is now much more promising.

X-axis: Pixel values of FITS image

Y-axis: Pixel values of JP2 image




The images used:

  • 2010_09_13__17_30_00_345__SDO_AIA_AIA_171.jp2
  • aia.lev1_euv_12s.2010-09-13T173013Z.171.image_lev1.fits










Now that we see the linear correlation between the values of FITS and JP2 images, we can go on and produce the correlation plots for the ten image parameters on these two formats.

On the right, the procedure is listed.

Procedure (FITS):

  1. Read the FITS file (get image),
  2. Desaturate the color intensity based on the exposure time given at the header,
  3. Clip the image; get the squared root (or log10, for some other wavelengths) of values and copy any values outside of the clipping range [0, max(99.5-th p)] to max(99.5-th p) .
  4. Calculate the ten image parameters on the final image.
  5. Normalize the values to the range of [0,1]. The old Min and Max come from the 99.5-th percentile.

Procedure (JP2):

  1. Read the JP2 file,
  2. Calculate the ten image parameters for the final image.
  3. Normalize the values to the range of[0,1], The old Min and Max are [0,255].

--> Plot the correlation of FITS vs JP2.

Correlation of FITS vs JP2 in terms of the Ten Image Parameters

  • X-axis: Pixel values on FITS
  • Y-axis: Pixel values on JP2

Correlation on Standard Deviation (FITS - JP2)

dfd

Appendix

Collecting FITS files

For an active Sun:
  • aia.lev1_euv_12s.2010-09-13T173004Z.94.image_lev1.fits
  • aia.lev1_euv_12s.2010-09-13T173011Z.131.image_lev1.fits
  • aia.lev1_euv_12s.2010-09-13T173013Z.171.image_lev1.fits
  • 2010_09_13__17_30_00_345__SDO_AIA_AIA_171.jp2
  • 2010_09_13__17_30_02_125__SDO_AIA_AIA_94.jp2
  • 2010_09_13__17_30_33_625__SDO_AIA_AIA_131.jp2
For a more quiet Sun:
  • aia.lev1_euv_12s.2010-12-25T110735Z.131.image_lev1.fits
  • aia.lev1_euv_12s.2010-12-25T111028Z.94.image_lev1.fits
  • aia.lev1_euv_12s.2010-12-25T111137Z.171.image_lev1.fits
  • 2010_12_25__11_07_45_62__SDO_AIA_AIA_131.jp2
  • 2010_12_25__11_10_38_12__SDO_AIA_AIA_94.jp2
  • 2010_12_25__11_11_24_34__SDO_AIA_AIA_171.jp2



Correlation of parameters (FITS vs JP2)

[During a X-class flare]

  • 2010.09.13 at ~17:30
W: 171
W: 131
W: 94


Correlation of parameters (FITS vs JP2)

[During a more quiet time]

  • 2010.12.25 at ~11:10
W: 171
W: 131
W: 94

Experiment: Finding the MIN and MAX intensity values on FITS files

Data SERIES: AIA.Lev1_EUV_12s - STARTING TIME: 2010-09-13 T 17:30:04 - PERIOD: 30 Days - CADENCE: 2hs WAVELENGTH: 94 A

At 2010-09-13 T 17:30:04 the MAX value reached its maximum possible value, i.e., 16383 .


Query data using JSOC web-app:


Using JSOC front-end, I requested AIA images (FITS), starting from 2010.09.01, for a period of 30 days (/30d), for every 2 hours (/@2h).

The query would be:

QUERY: aia.lev1_euv_12s[2010.09.01_13:30/30d@2h][? WAVELNTH = 94 ?]{image}


Download from the given URL:

The data would be accessible through a URL. To download the files, one of the two methods can be used:

  • Using an R script, I downloaded all the images:
library(rvest)
library(XML)
myUrl <- 'http://jsoc.stanford.edu/SUM86/D1012904466/S00000/'
destDir <- 'FITS_94'

pg <- htmlParse(myUrl)
links <- xpathSApply(pg, "//a/@href")
free(pg)

for(i in 1:length(links)){
  fileName <- basename(links[[i]])
  download.file(links[[i]], destfile = paste(destDir, fileName, sep = "/"))
  print(paste(i, length(links), sep = "/"))
}
  • Using wget command (for Unix base OSs)
wget --accept fits --mirror --page-requisites --adjust-extension --convert-links --backup-converted --no-parent http://jsoc.stanford.edu/SUM86/D1012904466/S00000/

Query data using JSOC API:


Using DRMS Python module (see JSOC user guide), we created a python script to download FITS files. This script, creates a query for a period of time, and requests a tar package of the images.

The arguments to execute the script are as follows:

arg[1]:mode, arg[2]:start time, arg[3]:end time, arg[4]: wavelength

Example:

python download_aia.py 1 2012-01-01 2012-08-31 335

and the response would be:

> aia.lev1_euv_12s[2012-01-01T00:00:00_TAI-2012-08-31T23:59:50_TAI@1h][335]
>    Total records found:  8726

On the right, there is the meta data for what was downloaded for the evaluation experiments.

Note: Depending on how large the requested query is, the response time may vary. When one year of data (1h cadence) is queried (~9000 fits files), the response time may be as long as 30 minutes. This query keep checking till the download URI is created.

Note: There is a 100GB limit for one query. So, one year of data, cannot be downloaded at once (for most of the wavelengths). So, you may want to break it into 2 smaller queries.

Copy of Meta_Download_FTIS_2012

[1] Boerner, Paul, et al. "Initial calibration of the atmospheric imaging assembly (AIA) on the solar dynamics observatory (SDO)." The Solar Dynamics Observatory. Springer US, 2011. 41-66.


java -Xmx400g -jar paramsinvest_exp1_Entropy.jar 2> errorOutput.log > output.log

Output Lines: 12835 (6416 images) --- START TIME: Thu Mar 08 19:19:29 UTC 2018 --- END TIME: Thu Mar 08 21:57:14 UTC 2018

java -Xmx400g -jar paramsinvest_exp1_Uniformity.jar 2> errorOutput.log > output.log

Output Lines: 12835 (6416 images) --- START TIME: Thu Mar 08 23:25:08 UTC 2018 --- END TIME: Fri Mar 09 01:59:32 UTC 2018

Correlation of the ten image parameters on FITS vs JP2