Quality control of dropsonde data

Note: this notebook is outdated.

Let’s see how we can check the quality of the data in ASPEN-processed files.

First, we import the necessary modules.

[1]:

from halodrops import sonde
from halodrops.helper import paths
from halodrops.qc import profile

We will check the QC for all ASPEN-processed files from the HALO flight on 1 April 2022. First, we get a dictionary of all sondes in the flight. The dictionary is called Sondes: its keys are sonde IDs, and its values are the corresponding instances of the Sonde class.

[2]:

data_directory = '/Users/geet/Documents/Repositories/Owned/halodrops/sample/'
flight_id = '20220401'

# Instantiate paths object
f0401 = paths.Paths(data_directory,flight_id)
# Create Sondes dictionary
Sondes = f0401.populate_sonde_instances()

The post-ASPEN file for 213450447 with filename D20220401_101259QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 213341449 with filename D20220401_093402QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 213450599 with filename D20220401_125710QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 213010063 with filename D20220401_101634QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 210440276 with filename D20220401_124541QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute

Let’s start by looking at data from one sonde from the flight.

[3]:

ds = Sondes['210430717'].aspen_ds

First, we’ll check the profile fullness of the u_wind variable.

The profile fullness (or profile coverage) is the fraction of timestamps that have data. A variable measured at every timestamp (i.e. one that has a value at every coordinate of the independent time dimension) would therefore have a fullness of 1.
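To make the definition concrete, here is a minimal sketch of the fullness calculation on plain NumPy arrays. The function name fullness_sketch and the sample values are hypothetical; this is not the halodrops implementation, only an illustration of "fraction of timestamps with a non-NaN value", assuming the data sit on the 0.25 s time grid used in this notebook.

```python
import numpy as np


def fullness_sketch(values):
    """Fraction of time coordinates that carry a non-NaN value.

    Hypothetical stand-in for profile.fullness, working on a 1-D array
    instead of an xarray Dataset.
    """
    values = np.asarray(values, dtype=float)
    return np.count_nonzero(~np.isnan(values)) / values.size


# Made-up 4 Hz wind values on a 0.25 s grid, with two missing samples
u_wind = np.array([5.1, 5.3, np.nan, 5.2, 5.0, np.nan, 5.1, 5.2])
# Made-up 2 Hz temperature values: only every other time coordinate is filled
tdry = np.array([15.2, np.nan, 15.1, np.nan, 15.0, np.nan, 14.9, np.nan])

print(f"{fullness_sketch(u_wind):.02f}")  # 0.75
print(f"{fullness_sketch(tdry):.02f}")    # 0.50
```

Note that the complete, gap-free temperature series still only reaches 0.50, which is the issue discussed next.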

[4]:

var = 'u_wind'
print(f'{profile.fullness(ds,var):.02f}: Profile coverage of {var}')

0.93: Profile coverage of u_wind

That’s nice. This means that 93% of the timestamps in the dataset have a non-NaN u_wind measurement. Now, let’s check tdry, the dry-air temperature.

[5]:

var = 'tdry'
print(f'{profile.fullness(ds,var):.02f}: Profile coverage of {var}')

0.48: Profile coverage of tdry

Oof! That looks bad: it is only about half the coverage of u_wind.

But there’s a catch. The temperature sensor in the RD-41 sonde samples at 2 Hz, whereas the GPS sensor (from which the horizontal winds are derived) samples at 4 Hz. Both variables share the same time coordinates, spaced every 0.25 seconds, which matches the GPS sampling frequency exactly. It is therefore a bit unfair to count temperature values against all time coordinates, since the sensor is not supposed to measure that often. A better approach is to weight the profile coverage by the sampling frequency.
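As a quick worked check using the numbers above: a sensor sampling at f_sensor on a time grid of frequency f_grid can fill at most f_sensor / f_grid of the time coordinates. For tdry this ceiling is 0.5, so the observed fullness of 0.48 corresponds to roughly 96% of what the sensor could deliver (0.48 is itself a rounded value, so the exact ratio may differ slightly).

```latex
\text{maximum fullness} = \frac{f_{\mathrm{sensor}}}{f_{\mathrm{grid}}}
  = \frac{2\ \mathrm{Hz}}{4\ \mathrm{Hz}} = 0.5,
\qquad
\frac{\text{observed fullness}}{\text{maximum fullness}} \approx \frac{0.48}{0.5} = 0.96
```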

So, if the temperature sensor measures only at every other time coordinate, its profile coverage should be computed over only half of the time coordinates, or, equivalently, multiplied by two. This is exactly what the weighted_fullness function (../apidocs/halodrops/halodrops.qc.profile.md#halodrops.qc.profile.weighted_fullness) does.

Note: the weighted_fullness function may no longer exist in the current version of halodrops.
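Since weighted_fullness may no longer be available, here is a standalone sketch of the same idea on plain NumPy arrays. The name weighted_fullness_sketch, the grid_frequency parameter and the sample values are hypothetical; the sketch assumes the 4 Hz (0.25 s) time grid described above and simply rescales the plain fullness by grid_frequency / sampling_frequency. It is not the halodrops implementation.

```python
import numpy as np

# Assumption: time coordinates are spaced every 0.25 s, i.e. a 4 Hz grid
TIME_GRID_FREQUENCY = 4  # Hz


def weighted_fullness_sketch(values, sampling_frequency,
                             grid_frequency=TIME_GRID_FREQUENCY):
    """Profile coverage rescaled by the sensor's sampling frequency.

    Hypothetical stand-in for profile.weighted_fullness: a sensor sampling at
    sampling_frequency can fill at most sampling_frequency / grid_frequency of
    the time coordinates, so the plain fullness is divided by that fraction.
    """
    values = np.asarray(values, dtype=float)
    fullness = np.count_nonzero(~np.isnan(values)) / values.size
    return fullness * grid_frequency / sampling_frequency


# Made-up 2 Hz temperature series: complete, and with one missing sample
tdry_full = np.array([15.2, np.nan, 15.1, np.nan, 15.0, np.nan, 14.9, np.nan])
tdry_gap = np.array([15.2, np.nan, 15.1, np.nan, np.nan, np.nan, 14.9, np.nan])

print(f"{weighted_fullness_sketch(tdry_full, 2):.02f}")  # 1.00
print(f"{weighted_fullness_sketch(tdry_gap, 2):.02f}")   # 0.75
```

For a 4 Hz variable such as u_wind the weighting factor is 1, so the weighted and plain coverage coincide, which matches the output further below.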

[6]:

var = 'tdry'
sampling_frequency = 2 # in hertz
print(f'{profile.weighted_fullness(ds,var,sampling_frequency):.02f}: Profile coverage of {var}')

0.97: Profile coverage of tdry

Now, that doesn’t look too bad, does it? Once the sensor sampling frequencies are accounted for, tdry actually performs better than u_wind.

To make this easier, HALO-DROPS includes a CONFIG file (rd41.CONFIG) that provides the sampling frequencies of all measured and several estimated variables. Let’s use it to check how our sonde has performed.
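For readers without the repository at hand, here is a sketch of how such a CONFIG section can be parsed with Python’s standard configparser module. The file content below is a hypothetical excerpt, not the real rd41.CONFIG: the section name and variable names are taken from the code cells in this notebook, while the frequencies are inferred from the coverage values printed further below (variables whose weighted coverage equals their plain coverage are assumed to be 4 Hz, the others 2 Hz).

```python
import configparser

# Hypothetical excerpt of rd41.CONFIG; frequencies are inferred, not copied
EXAMPLE_CONFIG = """
[sampling - frequencies]
pres = 2
tdry = 2
dp = 2
rh = 2
u_wind = 4
v_wind = 4
wspd = 4
wdir = 4
mr = 2
"""

example_config = configparser.ConfigParser()
example_config.read_string(EXAMPLE_CONFIG)

section = example_config["sampling - frequencies"]
# Map each variable name to its sampling frequency in Hz
sampling_frequencies = {var: section.getint(var) for var in section}
print(sampling_frequencies)
```

Note that configparser lowercases option names by default, which is harmless here because all variable names are already lowercase.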

[7]:

# Reading the CONFIG file
import configparser
config = configparser.ConfigParser()
config_file_path = '../../../src/halodrops/helper/rd41.CONFIG'
config.read(config_file_path)

[7]:

['../../../src/halodrops/helper/rd41.CONFIG']

[8]:

# Create a list of (variable name, sampling frequency in Hz) tuples from the CONFIG
# (named var_freqs to avoid shadowing Python's built-in vars())
section = config['sampling - frequencies']
var_freqs = [(var, section.getint(var)) for var in section]

for var, freq in var_freqs:
    print('---')
    print(f'{profile.fullness(ds, var):.02f}: Profile coverage of {var}')
    print(f'{profile.weighted_fullness(ds, var, freq):.02f}: Weighted profile coverage of {var}')

---
0.48: Profile coverage of pres
0.96: Weighted profile coverage of pres
---
0.48: Profile coverage of tdry
0.97: Weighted profile coverage of tdry
---
0.44: Profile coverage of dp
0.89: Weighted profile coverage of dp
---
0.44: Profile coverage of rh
0.89: Weighted profile coverage of rh
---
0.93: Profile coverage of u_wind
0.93: Weighted profile coverage of u_wind
---
0.93: Profile coverage of v_wind
0.93: Weighted profile coverage of v_wind
---
0.93: Profile coverage of wspd
0.93: Weighted profile coverage of wspd
---
0.93: Profile coverage of wdir
0.93: Weighted profile coverage of wdir
---
0.44: Profile coverage of mr
0.89: Weighted profile coverage of mr

It would be more convenient to have these values within the dataset itself. The weighted_fullness_for_config_vars function does this: it returns the dataset with the weighted profile coverage of each CONFIG variable added as a new variable.

[9]:

ds_with_weighted_fullness = profile.weighted_fullness_for_config_vars(ds,config_file_path=config_file_path)

[10]:

# Check out the last 9 variables that we added to the dataset
ds_with_weighted_fullness