Quality control of dropsonde data
This is outdated
Let’s see how we can go about checking the quality of the data in ASPEN-processed files.
First, we import the necessary modules.
[1]:
from halodrops import sonde
from halodrops.helper import paths
from halodrops.qc import profile
We will check the QC for all ASPEN-processed files from the HALO flight on 1 April 2022. First, we get a dictionary of all sondes in the flight. It is called Sondes; its keys are sonde IDs and its values are the corresponding instances of the Sonde class.
[2]:
data_directory = '/Users/geet/Documents/Repositories/Owned/halodrops/sample/'
flight_id = '20220401'
# Instantiate paths object
f0401 = paths.Paths(data_directory,flight_id)
# Create Sondes dictionary
Sondes = f0401.populate_sonde_instances()
The post-ASPEN file for 213450447 with filename D20220401_101259QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 213341449 with filename D20220401_093402QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 213450599 with filename D20220401_125710QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 213010063 with filename D20220401_101634QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
The post-ASPEN file for 210440276 with filename D20220401_124541QC.nc does not exist. Therefore, I am not setting the `postaspenfile` attribute.
I didn't find the `postaspenfile` attribute, therefore I am not storing the xarray dataset as an attribute
Let’s start by looking at data from one sonde from the flight.
[3]:
ds = Sondes['210430717'].aspen_ds
First, we’ll check the profile fullness of the u_wind variable.
The profile fullness (or profile coverage) is the fraction of timestamps that have data. Therefore, a variable that provides measurements at every timestamp (i.e. timestamps are the coordinates of the independent time dimension) would have a value of 1.
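To make the definition concrete, here is a minimal, dependency-free sketch of the idea. It is not the halodrops implementation: the `fullness` helper below works on a plain list rather than an xarray dataset, and the sample values are invented for illustration.

```python
import math


def fullness(values):
    """Fraction of entries that hold a measurement (i.e. are not NaN)."""
    if not values:
        raise ValueError("cannot compute fullness of an empty profile")
    valid = sum(1 for v in values if not math.isnan(v))
    return valid / len(values)


# Hypothetical profile with 8 timestamps, two of which have no measurement
u_wind = [3.1, 3.0, float("nan"), 2.8, 2.7, float("nan"), 2.5, 2.4]
print(f"{fullness(u_wind):.02f}: Profile coverage of u_wind")
# prints "0.75: Profile coverage of u_wind"
```

Six of the eight timestamps carry a value, so the coverage is 6/8 = 0.75.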
[4]:
var = 'u_wind'
print(f'{profile.fullness(ds,var):.02f}: Profile coverage of {var}')
0.93: Profile coverage of u_wind
That’s nice. This means that 93% of the timestamps in the dataset have a non-NaN measurement of u_wind associated with them. Now, let’s check for tdry, which is the dry air temperature.
[5]:
var = 'tdry'
print(f'{profile.fullness(ds,var):.02f}: Profile coverage of {var}')
0.48: Profile coverage of tdry
Oof! That looks bad. That’s about half the coverage of u_wind.
But, there’s a catch. The temperature sensor in the RD-41 sonde has a sampling frequency of 2 Hz, whereas the GPS sensor (from which the horizontal winds are derived) has a sampling frequency of 4 Hz. The time coordinates are the same for both variables and are spaced every 0.25 seconds, which aligns exactly with the GPS sensor frequency. Therefore, it is a bit unfair to compare the fraction of temperature values against all time coordinates, given that the sensor is not supposed to measure that frequently. A better way is to use a profile coverage weighted by the sampling frequency.
So, if the temperature sensor measures only at every other time coordinate, then its profile coverage should be computed over only half the time coordinates, or equivalently the plain coverage should be multiplied by two. This is exactly what the weighted_fullness function (../apidocs/halodrops/halodrops.qc.profile.md#halodrops.qc.profile.weighted_fullness) does.
I think this function does not exist anymore
[6]:
var = 'tdry'
sampling_frequency = 2 # in hertz
print(f'{profile.weighted_fullness(ds,var,sampling_frequency):.02f}: Profile coverage of {var}')
0.97: Profile coverage of tdry
Now, that doesn’t look too bad, does it? It’s actually performing better than the u_wind variable, accounting for sensor sampling frequencies.
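The weighting can be sketched in a few lines. This is an assumption about how such a function could work, based only on the description above (coverage multiplied by the ratio of the time-grid frequency to the sensor frequency); it is not taken from the halodrops source, and the `time_frequency` parameter and the sample values are hypothetical.

```python
import math


def weighted_fullness(values, sampling_frequency, time_frequency=4):
    """Fullness scaled by how often the sensor is expected to report.

    time_frequency is the rate of the time coordinate (4 Hz for a 0.25 s grid);
    sampling_frequency is the sensor's own rate. Both are in hertz.
    """
    valid = sum(1 for v in values if not math.isnan(v))
    plain = valid / len(values)
    return plain * (time_frequency / sampling_frequency)


nan = float("nan")
# Hypothetical 2 Hz sensor on a 4 Hz grid: every other slot is empty by design,
# and one expected measurement (the seventh slot) is genuinely missing.
tdry = [-23.0, nan, -23.1, nan, -23.2, nan, nan, nan]
print(f"{weighted_fullness(tdry, sampling_frequency=2):.02f}: Profile coverage of tdry")
# prints "0.75: Profile coverage of tdry"
```

Only 3 of the 8 slots hold a value (0.375), but the sensor was expected to fill just 4 of them, so the weighted coverage is 3/4 = 0.75.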
To make it easy, HALO-DROPS includes a CONFIG file (rd41.CONFIG) which provides the sampling frequencies of all measured and several estimated variables. Let’s use this CONFIG to check out how our sonde has performed.
[7]:
# Reading the CONFIG file
import configparser
config = configparser.ConfigParser()
config_file_path = '../../../src/halodrops/helper/rd41.CONFIG'
config.read(config_file_path)
[7]:
['../../../src/halodrops/helper/rd41.CONFIG']
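For readers without the file at hand, the sketch below shows how such a CONFIG could be laid out and parsed. The file contents are hypothetical: only the section name (as used in the next cell) and the 2 Hz and 4 Hz rates mentioned earlier come from this page, and the real rd41.CONFIG may contain more variables or other entries.

```python
import configparser

# Hypothetical file contents; the real rd41.CONFIG may differ
example_config = """
[sampling - frequencies]
tdry = 2
u_wind = 4
"""

config = configparser.ConfigParser()
config.read_string(example_config)

# Map each variable name to its sampling frequency in hertz
frequencies = {
    name: int(value) for name, value in config["sampling - frequencies"].items()
}
print(frequencies)
# prints "{'tdry': 2, 'u_wind': 4}"
```

`ConfigParser` returns every value as a string, which is why the frequencies are converted with `int` before use.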
[8]:
# Create a list of tuples with variable names and their corresponding sampling frequency
vars = [(var,int(config['sampling - frequencies'][var])) for var in config['sampling - frequencies'].keys()]
for var in vars:
    print('---')
    print(f'{profile.fullness(ds,var[0]):.02f}: Profile coverage of {var[0]}')
    print(f'{profile.weighted_fullness(ds,var[0],var[1]):.02f}: Weighted profile coverage of {var[0]}')
---
0.48: Profile coverage of pres
0.96: Weighted profile coverage of pres
---
0.48: Profile coverage of tdry
0.97: Weighted profile coverage of tdry
---
0.44: Profile coverage of dp
0.89: Weighted profile coverage of dp
---
0.44: Profile coverage of rh
0.89: Weighted profile coverage of rh
---
0.93: Profile coverage of u_wind
0.93: Weighted profile coverage of u_wind
---
0.93: Profile coverage of v_wind
0.93: Weighted profile coverage of v_wind
---
0.93: Profile coverage of wspd
0.93: Weighted profile coverage of wspd
---
0.93: Profile coverage of wdir
0.93: Weighted profile coverage of wdir
---
0.44: Profile coverage of mr
0.89: Weighted profile coverage of mr
But it would be better to have these within the dataset itself. So, we do that with a function which returns these weighted profile coverage values as variables in the dataset.
[9]:
ds_with_weighted_fullness = profile.weighted_fullness_for_config_vars(ds,config_file_path=config_file_path)
[10]:
# Check out the last 9 variables that we added to the dataset
ds_with_weighted_fullness
[10]:
<xarray.Dataset>
Dimensions: (time: 2727, obs: 1)
Coordinates:
* time (time) datetime64[ns] 2022-04-01T12:32:26.78997...
lat (time) float32 ...
lon (time) float32 ...
gpsalt (time) float32 ...
Dimensions without coordinates: obs
Data variables: (12/36)
trajectory |S1 ...
launch_time datetime64[ns] ...
pres (time) float32 1.023e+03 nan 1.022e+03 ... nan nan
tdry (time) float32 -23.02 nan -23.06 ... nan nan nan
dp (time) float32 -25.19 nan -25.21 ... nan nan nan
rh (time) float32 82.75 nan 82.87 nan ... nan nan nan
... ...
rh_weighted_fullness float64 0.8896
u_wind_weighted_fullness float64 0.9318
v_wind_weighted_fullness float64 0.9318
wspd_weighted_fullness float64 0.9318
wdir_weighted_fullness float64 0.9318
mr_weighted_fullness float64 0.8896
Attributes: (12/91)
Conventions: CF-1.6
RepoRevision: V3.4.4
RepoLastChangedDate: Fri May 1 14:20:30 2020 -0600
RepoId: 2c0e825cc03af2932104c9a128eae846c428bc6c
RepoBranch: master
featureType: trajectory
... ...
WindQCDev: 999
WindQCWL: 30
WindSats: 4
WindSmoothWL: 10
WindVVPresWL: 5
WindVVdelta: 2.5