The following is the report of my investigation on FITS files, conversion of them using Java libraries, and a comparison with JPEG2000 format.
Introduction
While maintaining the data in such a high precision level during this translation process could be vital for some analytical studies of the solar activities, it is not trivial that for any other task, such precision is beneficial. Note that processing a reasonably large amount of such images (e.g. 6-year worth of SDO images) could be computationally overwhelming and keep TBs of CPUs busy for a couple of months. For this very reason, beside the typical down-sizing steps (dimensionality reduction) during the preprocessing of the data, it is crucial to learn whether or not working on JP2 images as the raw input would make any significant difference. Of course, the answer to this question depends on the way the data is going to be extracted form the images.
To have a better understanding regarding the difficulty of dealing with large size images, note that the typical size of a 4096 X 4096 px AIA FITS image varies from 7M to 14M depending on the waveband the image were taken in. The size of a similar image in JP2 format, on the other hand, is about 1M across all different wavebands. One of the typical experiments that we would need to run on these images, is to extract some image parameters with different settings and use other domain (solar) data to decide which are the best settings for a particular task. In our last study, there were 60 different settings to be experimented. Since we have 10 different wavebands, we need to extract each image 600 times. A reliable experiment typically takes up at least one month worth of data. Depending on the particular event we are looking for, this could result in thousands of images ( ~4000 in that case). This means processing 2400000 images, which is equal to 24TB of data (for the FITS file of size on average 10MB ) instead of 2.4TB for JP2 images.
What FITS is?
In 1981, FITS format was introduced for the interchange of astronomical images (and other digital arrays) on magnetic tape as a solution to the data transport problem. FITS is a data format for recoding digital images on seven-track and nine-track magnetic tape.
"The AIA cameras use a 14-bit analog-to-digital converter (ADC) to translate the charge read out from each pixel to a digital number [DN] value" [1]. So, the maximum pixel value for AIA images is 2^17 = 16384. This value may be observed on the extremely bright regions of the SDO images, mainly during the X-class flares. Since the values in the FITS file correspond to the number of photons hit the CCDs of the SDO camera, they cannot be negative, except after processing and using interpolation. Those negative values should be cleaned out (replaced with zeros).
Quick read and analysis of FITS files:
Working with FITS files as text could be a real hassle, because of its size. It took me a while to find out that there is a very handy app that gives you much more than the content of the header(s).
SAOImage DS9, is a free application, initially developed by Mike Van Hilst, in 1990 and then continued by others. Using this application, the header(s), several different visualization of the image, histogram, etc can be quickly provided for the user.
sudo apt-get install saods9
Reading the image of a FITS file
Below, there are two snippets of Java code for reading FITS files, using nom.tam FITS library (version: 1.15.2).
For uncompressed FITS files:
Fits f = new Fits(DIR_TO_FITS)
f.read();
ImageHDU hdu = (ImageHDU) f.getHDU(1);
int[][] im = (int[][]) hdu.getKernel();
For compressed FITS files:
Fits f = new Fits(DIR_TO_FITS)
f.readHDU();
CompressedImageHDU chdu = (CompressedImageHDU) f.readHDU();
ImageHDU uncompressedImage = this.chdu.asImageHDU();
ImageData imageData = uncompressedImage.getData();
short[][] im = (short[][]) imageData.getData();
Reading cards from the header
Each FITS file has one or more headers that end with the keyword END. Below you can see a few lines of the header of an AIA image:
BITPIX = 8 / 8-bit bytes
NAXIS = 2 / 2-dimensional binary table
NAXIS1 = 8 / width of table in bytes
NAXIS2 = 4096 / number of rows in table
PCOUNT = 11473606 / size of special data area
GCOUNT = 1 / one data group (required keyword)
Each of these lines is called a card. A card has a key, followed by a value, and a description. The following snippet of Java code show how they can be accessed:
headerData = new ImageDBFitsHeaderData();
Header header = hdu.getHeader();
double naxis1 = header.getDoubleValue("NAXIS1");
> NAXIS1: 8
Among the many different pieces of information fits header provides, there are DATAMIN and DATAMAX. In the documentation of fits file it is written that these are the valid minimum and maximum values. As I investigated, these values are the actual minimum and maximum values in the array of the pixel values, which are compared on the right.
Some results -->
As it is shown, the maximum possible value for each cell of the image is 16383 which is the largest 2-byte integer where for each byte, one bit is reserved for the sign. So for 2 bytes, 2*(8-1) bits are used, and this means the largest integer would be 2 ^ 14 = 16384.
On the other hand, for the minimum values, negative numbers are observed while the minimum value must be zero. These negative values must be treated as noise and are generated in the process of interpolation in the raw-to-level1 post processing step.
> aia.lev1_euv_12s.2017-09-06T084207Z.304.image_lev1.fits
actualMax:2528.0 validMax: 2528.0
actualMin:-7.0 validMin: -7.0
> aia.lev1_euv_12s.2017-09-06T085407Z.304.image_lev1.fits
actualMax:2934.0 validMax: 2934.0
actualMin:-13.0 validMin: -13.0
> aia.lev1_euv_12s.2017-09-06T093010Z.171.image_lev1.fits
actualMax:16383.0 validMax: 16383.0
actualMin:-6.0 validMin: -6.0
> aia.lev1_euv_12s.2017-09-06T093001Z.94.image_lev1.fits
actualMax:15732.0 validMax: 15732.0
actualMin:-8.0 validMin: -8.0
> aia.lev1_euv_12s.2017-09-06T084201Z.94.image_lev1.fits
actualMax:2304.0 validMax: 2304.0
actualMin:-7.0 validMin: -7.0
> aia.lev1_euv_12s.2017-09-06T084208Z.131.image_lev1.fits
actualMax:5706.0 validMax: 5706.0
actualMin:-205.0 validMin: -205.0
> aia.lev1_euv_12s.2017-09-06T091801Z.94.image_lev1.fits
actualMax:16383.0 validMax: 16383.0
actualMin:-10.0 validMin: -10.0
> aia.lev1_euv_12s.2017-09-06T091759Z.211.image_lev1.fits
actualMax:16383.0 validMax: 16383.0
actualMin:-17.0 validMin: -17.0
> aia.lev1_euv_12s.2017-09-06T085408Z.131.image_lev1.fits
actualMax:4386.0 validMax: 4386.0
actualMin:-208.0 validMin: -208.0
> aia.lev1_euv_12s.2017-09-06T090608Z.304.image_lev1.fits
actualMax:15749.0 validMax: 15749.0
actualMin:-13.0 validMin: -13.0
> aia.lev1_euv_12s.2017-09-06T091811Z.131.image_lev1.fits
actualMax:7228.0 validMax: 7228.0
actualMin:-227.0 validMin: -227.0
> aia.lev1_euv_12s.2017-09-06T093002Z.335.image_lev1.fits
actualMax:3262.0 validMax: 3262.0
actualMin:-163.0 validMin: -163.0
> aia.lev1_euv_12s.2017-09-06T090610Z.131.image_lev1.fits
actualMax:14939.0 validMax: 14939.0
actualMin:-227.0 validMin: -227.0
Output Test
[aia.lev1_euv_12s.2010-09-13T173013Z.171.image_lev1.fits]
To ensure that the reader code works properly, I created the heatmap of the FITS file I queried from JSOC, and compared it with the same snapshot that I downloaded from SDO repository.
SDO image in JP2 format
Heatmap of fits format
A 3D visual comparison
In these visualizations, the following AIA image was queried in two formats, FITS (from JSOC) and JP2 (from SDO):
aia.lev1_euv_12s.2010-09-13T173013Z.171
This particular image is chosen because of the presence of an X class flare at this time period. Both of the images are rescaled to 1024 pixels (from 4096X4096) to reduce the size of the 3D plot. No normalization is applied to their values.
In the original FITS file, the maximum value was 16383 that due to rescaling of the image size (re-sampling the values) it changed to 9443.
In the second visualization, the JP2 format is plotted, so the range is [0, 255] .
For the third plot, a clipping method is used by setting the new min and max to [0, 44] and (for this wavelength;171), replacing each cell with its squared, and eventually cutting of all values that are still out side of the new range. This range is borrowed from the Helioviewer's setting.
3D visualization of the FITS format
(Size: ~80MB - Be patient.)
3D visualization of the JP2 format
(Size: ~80MB - Be patient.)
3D visualization after clipping the extreme values.
(Size: ~80MB - Be patient.)
Distribution of Values
Below, it is shown how skewed the distribution of color intensities is in FITS format.
X: Color intensity values
Y: Frequency on FITS file
[TOP]
MAX: 9443
95-th percentile: 665
[BOTTOM]
MAX: 14412
95-th percentile: 476
aia.lev1_euv_12s.2010-09-13T173013Z.171.image_lev1.fits
aia.lev1_euv_12s.2017-09-06T84210Z.171.image_lev1.fits
Max and Min intensity values of fits files, across wavelengths
While observing different AIA images (in FITS format), I noticed that the range of the values is not consistent across different wavelengths. For instance, while in 171 Angstroms, values as large as the maximum possible value (2^14) (i.e., 16383) can be seen at the center of the flares, in 94 angstroms, the range is mostly limited to 2-digit numbers. This raises two questions:
What is the MIN and MAX values of each wavelength, across a long period of time?
On the right, the results of my experiments are shown. As one can see, regardless of the wavelength, 0 and 16838 are recorded. In other words, it seems to be true if we say the maximum possible values will be recorded, at least in the presence of X-class flares, regardless of the (EUV) wavelength.
But is this a rare observation that took place because of the X-class flare? (See next question.)
How often the MIN and MAX values are recorded in a certain period of time?
[] Global MIN and MAX values
DATA: Using JSOC front-end, I requested AIA FITS files, starting from 2010.09.01, for a period of 30 days (/30d), for every 2 hours (/@2h).
QUERY: aia.lev1_euv_12s[2010.09.01_13:30/30d@2h][? WAVELNTH = 94 ?]{image}
the global min and max values are listed below:
94 A --> MIN:[0.0] MAX:[16383.0]
131 A --> MIN:[0.0] MAX:[16383.0]
171 A --> MIN:[0.0] MAX:[16383.0]
193 A --> MIN:[0.0] MAX:[16383.0]
211 A --> MIN:[0.0] MAX:[16383.0]
304 A --> MIN:[0.0] MAX:[16383.0]
It is interesting to study the changed of several different percentiles of pixel intensities for a period of one month, as illustrated on the right. Note that in each of these plots, the intensities as extreme as 16383 are observed, however, these values are not event caught in the 99.5-th percentile of the distributions.
The spike in the first plot corresponds to: aia.lev1_euv_12s.2010-09-30T153007Z.94.image_lev1.fits (see) where several bright regions are recorded which are relatively very strong compared to the snapshots 2 hours before and after.
Max p-th percentile of pixel values for full image
The table on the right, is the summary of the above plots. For each of the nine wavelength (94, 131, 171, 193, 211, 304, 335, 1600, 1700 Ă…), the maximum values of the percentiles are shown.
Ex. For wavelength of 131 Ă…ngstroms, 99% of the intensity values are below 88.
It is important to note that for 99.5-th percentile, we are evaluating our data after cutting 0.5% of the points, while we should keep in mind that this small piece is still comprised of 83,886 pixel values.
Max p-th percentile of pixel values within the Sun's disk
This table shows the percentile of pixel values, where all pixels outside the disk of Sun are ignored.
The location of disk's center, as well as the radius is obtained from each file's header.
80.th 90.th 95.th 99.th 99.5.th 100.th
W: 94 7 10 15 34 44 16383
W:131 19 30 43 88 123 16383
W:171 568 777 1034 1935 2602 16383
W:193 574 904 1354 2884 3968 16383
W:211 154 258 429 1159 1673 16383
W:304 116 151 188 327 431 16383
W:335 16 26 43 171 305 16383
W:1600 196 242 289 427 509 16046
W:1700 1801 2205 2558 3517 4138 16215
80.th 90.th 95.th 99.th 99.5.th 100.th
W: 94 8 11 15 36 54 16383
W:131 25 36 49 103 150 14945
W:171 690 892 1113 2163 2773 16383
W:193 666 1011 1542 3342 4459 16383
W:211 188 312 557 1413 1919 16383
W:304 153 190 234 441 599 16383
W:335 20 31 55 265 417 14865
W:1600 247 294 348 521 629 16013
W:1700 2253 2604 2978 4236 5159 16215
Several animated plots showing the distribution of colors in FITS files
[Animated plots: https://grid.cs.gsu.edu/~aahmadzadeh1/plots/MyGIFPage.html]
[Zoomed animated plots: https://grid.cs.gsu.edu/~aahmadzadeh1/plots/MyGIFPage_Zoomed.html]
Below, distribution of FITS's color intensity for values higher than the 99.5-th percentile is shown. This distribution is the average obtained by observing one month worth of data (~354 fits file).
Note: The vertical dotted line is draws on the 99.5-th percentile, for each wavelength.
Note: The Y-axis is limited to the range [0,50] for a better visualization.
Summary of distributions of above the 99.5-th percentile values
Image parameters on JP2
Mean
Std Deviation
Skewness
Kurtosis
Contrast
uniformity
Entropy
Relative Smoothness
Fractal Dimension
Directionality
Image parameters on FITS (no clipping)
Mean
Std Deviation
Skewness
Kurtosis
Contrast
Uniformity
Entropy
Relative Smoothness
Fractal Dimension
Directionality
Image parameters on FITS (with clipping)
Mean
Std Deviation
Skewness
Kurtosis
Contrast
Uniformity
Entropy
Relative Smoothness
Fractal Dimension
Directionality
Correlation Study
The next step is to study the correlation between the ten image parameters on FITS and JP2 images. But before that, to have a better understanding of the data, we would need to plot the correlation of the raw data. On the right, you see the correlation of pixel values of FITS (X-axis) and JP2 (Y-axis) image where the values are normalized to [0,1].
The images used:
2010_09_13__17_30_00_345__SDO_AIA_AIA_171.jp2
aia.lev1_euv_12s.2010-09-13T173013Z.171.image_lev1.fits
The logarithmic shape of the above plot should not surprise us any more, since we already observed the presence of the extreme values and the extremely skewed distribution in FITS images. And this is nicely illustrated in that plot; to a large range of values in FITS image (on X-axis), a very small range of values in JP2 image (on Y-axis) corresponds.
Thus, for a better comparison of these two image formats, we need to clip the far-out values. This is when the 99.5-th percentiles that we already calculated come in handy. We can use them as the thresholds for clipping the skewed distribution.
Below the distribution of the pixel values of a single image is illustrated. It is important to note that clipping of the first distribution plot, which is shown as the second plot,
1st plot: Distribution in FITS image
2nd plot: Distribution in clipped FITS image
3rd plot: Distribution in JP2 image
The images used:
2010_09_13__17_30_00_345__SDO_AIA_AIA_171.jp2
aia.lev1_euv_12s.2010-09-13T173013Z.171.image_lev1.fits
Raw data (FITS)
Number of bins: 16383/10
Clipped data (FITS)
Number of bins: 255
Raw data (JP2)
Number of bins: 255
The correlation plot after clipping of far-out values in FITS image, is now much more promising.
X-axis: Pixel values of FITS image
Y-axis: Pixel values of JP2 image
The images used:
2010_09_13__17_30_00_345__SDO_AIA_AIA_171.jp2
aia.lev1_euv_12s.2010-09-13T173013Z.171.image_lev1.fits
Now that we see the linear correlation between the values of FITS and JP2 images, we can go on and produce the correlation plots for the ten image parameters on these two formats.
On the right, the procedure is listed.
Procedure (FITS):
Read the FITS file (get image),
Desaturate the color intensity based on the exposure time given at the header,
Clip the image; get the squared root (or log10, for some other wavelengths) of values and copy any values outside of the clipping range [0, max(99.5-th p)] to max(99.5-th p) .
Calculate the ten image parameters on the final image.
Normalize the values to the range of [0,1]. The old Min and Max come from the 99.5-th percentile.
Procedure (JP2):
Read the JP2 file,
Calculate the ten image parameters for the final image.
Normalize the values to the range of[0,1], The old Min and Max are [0,255].
--> Plot the correlation of FITS vs JP2.
Correlation of FITS vs JP2 in terms of the Ten Image Parameters
X-axis: Pixel values on FITS
Y-axis: Pixel values on JP2
dfd
Appendix
Collecting FITS files
For an active Sun:
aia.lev1_euv_12s.2010-09-13T173004Z.94.image_lev1.fits
aia.lev1_euv_12s.2010-09-13T173011Z.131.image_lev1.fits
aia.lev1_euv_12s.2010-09-13T173013Z.171.image_lev1.fits
2010_09_13__17_30_00_345__SDO_AIA_AIA_171.jp2
2010_09_13__17_30_02_125__SDO_AIA_AIA_94.jp2
2010_09_13__17_30_33_625__SDO_AIA_AIA_131.jp2
For a more quiet Sun:
aia.lev1_euv_12s.2010-12-25T110735Z.131.image_lev1.fits
aia.lev1_euv_12s.2010-12-25T111028Z.94.image_lev1.fits
aia.lev1_euv_12s.2010-12-25T111137Z.171.image_lev1.fits
2010_12_25__11_07_45_62__SDO_AIA_AIA_131.jp2
2010_12_25__11_10_38_12__SDO_AIA_AIA_94.jp2
2010_12_25__11_11_24_34__SDO_AIA_AIA_171.jp2
Experiment: Finding the MIN and MAX intensity values on FITS files
Data SERIES: AIA.Lev1_EUV_12s - STARTING TIME: 2010-09-13 T 17:30:04 - PERIOD: 30 Days - CADENCE: 2hs WAVELENGTH: 94 A
At 2010-09-13 T 17:30:04 the MAX value reached its maximum possible value, i.e., 16383 .
Query data using JSOC web-app:
Using JSOC front-end, I requested AIA images (FITS), starting from 2010.09.01, for a period of 30 days (/30d), for every 2 hours (/@2h).
The query would be:
QUERY: aia.lev1_euv_12s[2010.09.01_13:30/30d@2h][? WAVELNTH = 94 ?]{image}
Download from the given URL:
The data would be accessible through a URL. To download the files, one of the two methods can be used:
Using an R script, I downloaded all the images:
library(rvest)
library(XML)
myUrl <- 'http://jsoc.stanford.edu/SUM86/D1012904466/S00000/'
destDir <- 'FITS_94'
pg <- htmlParse(myUrl)
links <- xpathSApply(pg, "//a/@href")
free(pg)
for(i in 1:length(links)){
fileName <- basename(links[[i]])
download.file(links[[i]], destfile = paste(destDir, fileName, sep = "/"))
print(paste(i, length(links), sep = "/"))
}
Using wget command (for Unix base OSs)
wget --accept fits --mirror --page-requisites --adjust-extension --convert-links --backup-converted --no-parent http://jsoc.stanford.edu/SUM86/D1012904466/S00000/
Query data using JSOC API:
Using DRMS Python module (see JSOC user guide), we created a python script to download FITS files. This script, creates a query for a period of time, and requests a tar package of the images.
The arguments to execute the script are as follows:
arg[1]:mode, arg[2]:start time, arg[3]:end time, arg[4]: wavelength
Example:
python download_aia.py 1 2012-01-01 2012-08-31 335
and the response would be:
> aia.lev1_euv_12s[2012-01-01T00:00:00_TAI-2012-08-31T23:59:50_TAI@1h][335]
> Total records found: 8726
On the right, there is the meta data for what was downloaded for the evaluation experiments.
Note: Depending on how large the requested query is, the response time may vary. When one year of data (1h cadence) is queried (~9000 fits files), the response time may be as long as 30 minutes. This query keep checking till the download URI is created.
Note: There is a 100GB limit for one query. So, one year of data, cannot be downloaded at once (for most of the wavelengths). So, you may want to break it into 2 smaller queries.
[1] Boerner, Paul, et al. "Initial calibration of the atmospheric imaging assembly (AIA) on the solar dynamics observatory (SDO)." The Solar Dynamics Observatory. Springer US, 2011. 41-66.
Helioviewer Project in Lunchpad: https://launchpad.net/helioviewer
JP2Gen Project in Lunchpad: https://launchpad.net/jp2gen
Clipping Code in JP2Gen:
java -Xmx400g -jar paramsinvest_exp1_Entropy.jar 2> errorOutput.log > output.log
Output Lines: 12835 (6416 images) --- START TIME: Thu Mar 08 19:19:29 UTC 2018 --- END TIME: Thu Mar 08 21:57:14 UTC 2018
java -Xmx400g -jar paramsinvest_exp1_Uniformity.jar 2> errorOutput.log > output.log
Output Lines: 12835 (6416 images) --- START TIME: Thu Mar 08 23:25:08 UTC 2018 --- END TIME: Fri Mar 09 01:59:32 UTC 2018