Quality control of dropsonde data

This is outdated

Let’s see how we can go about checking the quality of the data in ASPEN-processed files.

First, we import the necessary modules.

[1]:
from halodrops import sonde
from halodrops.helper import paths
from halodrops.qc import profile

We will check the QC for all ASPEN-processed files from the HALO flight on 1 April 2022. First, we get a dictionary of all sondes in the flight. It is called Sondes; its keys are sonde IDs and its values are the corresponding instances of the Sonde class.

[2]:
data_directory = '/Users/geet/Documents/Repositories/Owned/halodrops/sample/'
flight_id = '20220401'

# Instantiate paths object
f0401 = paths.Paths(data_directory,flight_id)
# Create Sondes dictionary
Sondes = f0401.populate_sonde_instances()
The post-ASPEN file for 213450447 with filename D20220401_101259QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 213341449 with filename D20220401_093402QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 213450599 with filename D20220401_125710QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 213010063 with filename D20220401_101634QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 210440276 with filename D20220401_124541QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute

Let’s start by looking at data from one sonde from the flight.

[3]:
ds = Sondes['210430717'].aspen_ds

First, we’ll check the profile fullness of the u_wind variable.

The profile fullness (or profile coverage) is the fraction of timestamps that have data. Therefore, a variable that provides measurements at every timestamp (i.e. timestamps are the coordinates of the independent time dimension) would have a value of 1.
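As a rough illustration of this definition (not the halodrops implementation), the sketch below counts the non-NaN entries in a plain Python list standing in for a variable sampled along the time dimension. The function name `fullness_sketch` and the sample values are hypothetical.

```python
import math

def fullness_sketch(values):
    """Return the fraction of entries that are not NaN."""
    if not values:
        return float("nan")
    valid = sum(1 for v in values if not math.isnan(v))
    return valid / len(values)

# Hypothetical samples: a measurement at every other timestamp
samples = [-23.02, float("nan"), -23.06, float("nan")]
coverage = fullness_sketch(samples)
print(f"{coverage:.02f}: Profile coverage")
```

A list with no NaN entries gives 1, matching the case of a variable measured at every timestamp.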

[4]:
var = 'u_wind'
print(f'{profile.fullness(ds,var):.02f}: Profile coverage of {var}')
0.93: Profile coverage of u_wind

That’s nice. This means that 93% of the timestamps in the dataset have a non-NaN measurement of u_wind associated with them. Now, let’s check for tdry, which is the dry air temperature.

[5]:
var = 'tdry'
print(f'{profile.fullness(ds,var):.02f}: Profile coverage of {var}')
0.48: Profile coverage of tdry

Oof! That looks bad. That’s roughly half the coverage of u_wind.

But there’s a catch. The temperature sensor in the RD-41 sonde has a sampling frequency of 2 Hz, whereas the GPS sensor (from which the horizontal winds are derived) has a sampling frequency of 4 Hz. The time coordinates are the same for both variables and are spaced every 0.25 seconds, which matches the GPS sensor frequency exactly. It is therefore a bit unfair to compare the number of temperature values against all time coordinates, given that the sensor is not supposed to measure that frequently. A better way is to weight the profile coverage by the sampling frequency.

So, if the temperature sensor only has to measure at every other time coordinate, then its profile coverage should be computed over only half the time coordinates, or equivalently multiplied by two. This is exactly what the weighted_fullness function (halodrops.qc.profile.weighted_fullness) does.

I think this function does not exist anymore
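The weighting described above can be sketched in a few lines, assuming the time coordinates are spaced at 4 Hz (every 0.25 seconds) as stated. The name `weighted_fullness_sketch` and its `time_frequency` parameter are hypothetical and are not claimed to match the signature of the halodrops function.

```python
def weighted_fullness_sketch(fullness, sampling_frequency, time_frequency=4):
    """Scale a raw coverage fraction by the ratio of the time-coordinate
    frequency to the sensor's sampling frequency (both in hertz)."""
    if sampling_frequency <= 0:
        raise ValueError("sampling_frequency must be positive")
    return fullness * (time_frequency / sampling_frequency)

# A 2 Hz sensor on a 4 Hz time axis: the raw coverage is doubled
weighted_tdry = weighted_fullness_sketch(0.48, sampling_frequency=2)
# A 4 Hz sensor on a 4 Hz time axis: the raw coverage is unchanged
weighted_u_wind = weighted_fullness_sketch(0.93, sampling_frequency=4)
print(f"{weighted_tdry:.02f} {weighted_u_wind:.02f}")
```

Note that feeding in a coverage value that has already been rounded to two decimals can give a slightly different result than weighting the unrounded value.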

[6]:
var = 'tdry'
sampling_frequency = 2 # in hertz
print(f'{profile.weighted_fullness(ds,var,sampling_frequency):.02f}: Profile coverage of {var}')
0.97: Profile coverage of tdry

Now, that doesn’t look too bad, does it? Once the sensor sampling frequencies are accounted for, tdry actually has better coverage than u_wind (0.97 versus 0.93).

To make it easy, HALO-DROPS includes a CONFIG file (rd41.CONFIG) which provides the sampling frequencies of all measured and several estimated variables. Let’s use this CONFIG to check out how our sonde has performed.
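To give a sense of how such a file can be read, here is a sketch that parses an INI-style string with configparser. The section name is copied from the notebook code on this page; the entries are illustrative (using the 2 Hz and 4 Hz figures mentioned above) and are not a copy of the actual rd41.CONFIG.

```python
import configparser

# Illustrative stand-in for a CONFIG file; not the real rd41.CONFIG
example_config_text = """
[sampling - frequencies]
tdry = 2
u_wind = 4
"""

example_config = configparser.ConfigParser()
example_config.read_string(example_config_text)

# Map each variable name to its sampling frequency in hertz
frequencies = {
    name: int(value)
    for name, value in example_config["sampling - frequencies"].items()
}
print(frequencies)
```

Values in a ConfigParser are always strings, which is why each one is converted with int before use.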

[7]:
# Reading the CONFIG file
import configparser
config = configparser.ConfigParser()
config_file_path = '../../../src/halodrops/helper/rd41.CONFIG'
config.read(config_file_path)
[7]:
['../../../src/halodrops/helper/rd41.CONFIG']
[8]:
# Create a list of (variable name, sampling frequency in hertz) tuples.
# Named var_freqs to avoid shadowing the built-in vars().
var_freqs = [(name, int(freq)) for name, freq in config['sampling - frequencies'].items()]

for name, freq in var_freqs:
    print('---')
    print(f'{profile.fullness(ds,name):.02f}: Profile coverage of {name}')
    print(f'{profile.weighted_fullness(ds,name,freq):.02f}: Weighted profile coverage of {name}')
---
0.48: Profile coverage of pres
0.96: Weighted profile coverage of pres
---
0.48: Profile coverage of tdry
0.97: Weighted profile coverage of tdry
---
0.44: Profile coverage of dp
0.89: Weighted profile coverage of dp
---
0.44: Profile coverage of rh
0.89: Weighted profile coverage of rh
---
0.93: Profile coverage of u_wind
0.93: Weighted profile coverage of u_wind
---
0.93: Profile coverage of v_wind
0.93: Weighted profile coverage of v_wind
---
0.93: Profile coverage of wspd
0.93: Weighted profile coverage of wspd
---
0.93: Profile coverage of wdir
0.93: Weighted profile coverage of wdir
---
0.44: Profile coverage of mr
0.89: Weighted profile coverage of mr

It would be better to have these values within the dataset itself. For that, we use a function which returns the dataset with the weighted profile coverage values added as variables.
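The idea of attaching one coverage value per variable can be sketched with a plain dictionary standing in for the dataset. This is only an illustration, assuming each result is stored under the name `<variable>_weighted_fullness`, which is the pattern visible in the dataset output on this page; the helper name and the input numbers are hypothetical.

```python
def add_weighted_fullness_sketch(dataset, weighted_values):
    """Return a copy of `dataset` with one `<var>_weighted_fullness`
    entry per variable in `weighted_values`."""
    result = dict(dataset)  # shallow copy; the input is left unmodified
    for name, value in weighted_values.items():
        result[f"{name}_weighted_fullness"] = value
    return result

# Hypothetical stand-in for a dataset and its computed coverage values
toy_dataset = {"rh": [82.75, float("nan")], "u_wind": [1.0, 2.0]}
toy_result = add_weighted_fullness_sketch(
    toy_dataset, {"rh": 0.89, "u_wind": 0.93}
)
print(sorted(toy_result))
```

Returning a new object rather than modifying the input mirrors the notebook, where the result is assigned to a separate variable.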

[9]:
ds_with_weighted_fullness = profile.weighted_fullness_for_config_vars(ds,config_file_path=config_file_path)
[10]:
# Check out the last 9 variables that we added to the dataset
ds_with_weighted_fullness
[10]:
<xarray.Dataset>
Dimensions:                   (time: 2727, obs: 1)
Coordinates:
  * time                      (time) datetime64[ns] 2022-04-01T12:32:26.78997...
    lat                       (time) float32 ...
    lon                       (time) float32 ...
    gpsalt                    (time) float32 ...
Dimensions without coordinates: obs
Data variables: (12/36)
    trajectory                |S1 ...
    launch_time               datetime64[ns] ...
    pres                      (time) float32 1.023e+03 nan 1.022e+03 ... nan nan
    tdry                      (time) float32 -23.02 nan -23.06 ... nan nan nan
    dp                        (time) float32 -25.19 nan -25.21 ... nan nan nan
    rh                        (time) float32 82.75 nan 82.87 nan ... nan nan nan
    ...                        ...
    rh_weighted_fullness      float64 0.8896
    u_wind_weighted_fullness  float64 0.9318
    v_wind_weighted_fullness  float64 0.9318
    wspd_weighted_fullness    float64 0.9318
    wdir_weighted_fullness    float64 0.9318
    mr_weighted_fullness      float64 0.8896
Attributes: (12/91)
    Conventions:            CF-1.6
    RepoRevision:           V3.4.4
    RepoLastChangedDate:    Fri May 1 14:20:30 2020 -0600
    RepoId:                 2c0e825cc03af2932104c9a128eae846c428bc6c
    RepoBranch:             master
    featureType:            trajectory
    ...                     ...
    WindQCDev:              999
    WindQCWL:               30
    WindSats:               4
    WindSmoothWL:           10
    WindVVPresWL:           5
    WindVVdelta:            2.5