16. Confidence intervals of average temperatures

See Notebook 8 for a description of the calibration procedure. This notebook is about the confidence intervals estimation using measurements from a double ended setup.

Calibration procedure

[1]:
import os

from dtscalibration import read_silixa_files

# The following line introduces the .dts accessor for xarray datasets
import dtscalibration  # noqa: E401  # noqa: E401
from dtscalibration.variance_stokes import variance_stokes_constant
import matplotlib.pyplot as plt
[2]:
filepath = os.path.join("..", "..", "tests", "data", "double_ended2")

ds_ = read_silixa_files(directory=filepath, timezone_netcdf="UTC", file_ext="*.xml")

ds = ds_.sel(x=slice(0, 100))  # only calibrate parts of the fiber
sections = {
    "probe1Temperature": [slice(7.5, 17.0), slice(70.0, 80.0)],  # cold bath
    "probe2Temperature": [slice(24.0, 34.0), slice(85.0, 95.0)],  # warm bath
}
6 files were found, each representing a single timestep
6 recorded vars were found: LAF, ST, AST, REV-ST, REV-AST, TMP
Recorded at 1693 points along the cable
The measurement is double ended
Reading the data from disk
[3]:
st_var, resid = variance_stokes_constant(
    ds.dts.st, sections, ds.dts.acquisitiontime_fw, reshape_residuals=True
)
ast_var, _ = variance_stokes_constant(
    ds.dts.ast, sections, ds.dts.acquisitiontime_fw, reshape_residuals=False
)
rst_var, _ = variance_stokes_constant(
    ds.dts.rst, sections, ds.dts.acquisitiontime_bw, reshape_residuals=False
)
rast_var, _ = variance_stokes_constant(
    ds.dts.rast, sections, ds.dts.acquisitiontime_bw, reshape_residuals=False
)
[4]:
out = ds.dts.calibrate_double_ended(
    sections=sections,
    st_var=st_var,
    ast_var=ast_var,
    rst_var=rst_var,
    rast_var=rast_var,
)

Confidence intervals of averages

Introduction confidence intervals

The confidence intervals consist of two sources of uncertainty.

  1. Measurement noise in the measured Stokes and anti-Stokes signals. Expressed in a single variance value.

  2. Inherent to least squares procedures / overdetermined systems, the parameters are estimated with limited certainty and all parameters are correlated. Which is expressed in the covariance matrix.

Both sources of uncertainty are propagated to an uncertainty in the estimated temperature via Monte Carlo.

Confidence intervals are all computed with ds.conf_int_double_ended() and ds.conf_int_single_ended(). The confidence interval can be estimated if the calibration method is wls (so that the parameter uncertainties are estimated), st_var, ast_var, rst_var, rast_var are correctly estimated, and confidence intervals are passed to conf_ints. As weigths are correctly passed to the least squares procedure, the covariance matrix can be used as an estimator for the uncertainty in the parameters. This matrix holds the covariances between all the parameters. A large parameter set is generated from this matrix as part of the Monte Carlo routine, assuming the parameter space is normally distributed with their mean at the best estimate of the least squares procedure.

The large parameter set is used to calculate a large set of temperatures. By using percentiles or quantile the 95% confidence interval of the calibrated temperature between 2.5% and 97.5% are calculated.

Four types of averaging schemes are implemented: 1. Averaging over time while the temperature varies over time and along the fiber 2. Averaging over time while assuming the temperature remains constant over time but varies along the fiber 3. Averaging along the fiber while the temperature varies along the cable and over time 4. Averaging along the fiber while assuming the temperature is same along the fiber but varies over time

These functions only work with the same size DataStore as that was calibrated. If you would like to average only a selection use the keyword arguments ci_avg_time_sel, ci_avg_time_isel, ci_avg_x_sel, ci_avg_x_isel.

1. Averaging over time while the temperature varies over time and along the fiber

So that you can state: - ‘We can say with 95% confidence that the temperature remained between this line and this line during the entire measurement period’. - The average temperature during the measurement period was ..

Using the default store_.. values the following DataArrays are added to the DataStore:

tmpf_avg1              The average forward temperature
tmpf_mc_avg1_var       The estimated variance of the average forward temperature
tmpf_mc_avg1           The confidence intervals of the average forward temperature

tmpb_avg1              The average backward temperature
tmpb_mc_avg1_var       The estimated variance of the average backward temperature
tmpb_mc_avg1           The confidence intervals of the average forward temperature

tmpw_avg1              The average forward-backward-averaged temperature
tmpw_avg1_var          The estimated variance of the average forward-backward-averaged temperature
tmpw_mc_avg1           The confidence intervals of the average forward-backward-averaged temperature
[5]:
out_avg = ds.dts.average_monte_carlo_double_ended(
    result=out,
    st_var=st_var,
    ast_var=ast_var,
    rst_var=rst_var,
    rast_var=rast_var,
    conf_ints=[2.5, 97.5],
    mc_sample_size=500,  # <- choose a much larger sample size
    ci_avg_time_flag1=True,
    ci_avg_time_flag2=False,
    ci_avg_time_isel=[0, 1, 2, 3, 4, 5],
    ci_avg_time_sel=None,
)
out_avg.tmpw_mc_avg1.plot(hue="CI", linewidth=0.8)
Removed from results: time_avg
Removed from results: tmpf_mc_avgsec_var
Removed from results: tmpb_mc_avgsec_var
Removed from results: tmpw_avgsec
Removed from results: tmpw_mc_avgsec_var
[5]:
[<matplotlib.lines.Line2D at 0x7f897c03f820>,
 <matplotlib.lines.Line2D at 0x7f897c03f910>]
../_images/notebooks_16Averaging_temperatures_10_2.png

2. Averaging over time while assuming the temperature remains constant over time but varies along the fiber

So that you can state: - ‘I want to estimate a background temperature with confidence intervals. I hereby assume the temperature does not change over time and average all measurements to get a better estimate of the background temperature.’

Using the default store_.. values the following DataArrays are added to the DataStore:

tmpf_avg2              The average forward temperature
tmpf_mc_avg2_var       The estimated variance of the average forward temperature
tmpf_mc_avg2           The confidence intervals of the average forward temperature

tmpb_avg2              The average backward temperature
tmpb_mc_avg2_var       The estimated variance of the average backward temperature
tmpb_mc_avg2           The confidence intervals of the average forward temperature

tmpw_avg2              The average forward-backward-averaged temperature
tmpw_avg2_var          The estimated variance of the average forward-backward-averaged temperature
tmpw_mc_avg2           The confidence intervals of the average forward-backward-averaged temperature

Note that this average has much less uncertainty that averaging option 1. We can specify specific times with ci_avg_time_isel.

[6]:
out_avg = ds.dts.average_monte_carlo_double_ended(
    result=out,
    st_var=st_var,
    ast_var=ast_var,
    rst_var=rst_var,
    rast_var=rast_var,
    conf_ints=[2.5, 97.5],
    mc_sample_size=500,  # <- choose a much larger sample size
    ci_avg_time_flag1=False,
    ci_avg_time_flag2=True,
    ci_avg_time_isel=[0, 1, 2, 3, 4, 5],
    ci_avg_time_sel=None,
)
out_avg.tmpw_mc_avg2.plot(hue="CI", linewidth=0.8)
Removed from results: time_avg
Removed from results: tmpf_mc_avgsec_var
Removed from results: tmpb_mc_avgsec_var
Removed from results: tmpw_avgsec
Removed from results: tmpw_mc_avgsec_var
[6]:
[<matplotlib.lines.Line2D at 0x7f8975e7e6a0>,
 <matplotlib.lines.Line2D at 0x7f897c919c70>]
../_images/notebooks_16Averaging_temperatures_13_2.png

3. Averaging along the fiber while the temperature varies along the cable and over time

So that you can state: - ‘The temperature of the fiber remained between these ci bounds at time 2, and at time 3 the temperature of the fiber remained between these ci bounds’.

Using the default store_.. values the following DataArrays are added to the DataStore:

tmpf_avgx1              The average forward temperature
tmpf_mc_avgx1_var       The estimated variance of the average forward temperature
tmpf_mc_avgx1           The confidence intervals of the average forward temperature

tmpb_avgx1              The average backward temperature
tmpb_mc_avgx1_var       The estimated variance of the average backward temperature
tmpb_mc_avgx1           The confidence intervals of the average forward temperature

tmpw_avgx1              The average forward-backward-averaged temperature
tmpw_avgx1_var          The estimated variance of the average forward-backward-averaged temperature
tmpw_mc_avgx1           The confidence intervals of the average forward-backward-averaged temperature

Note that this function returns a single average per time step. Use the keyword arguments ci_avg_x_sel, ci_avg_x_isel to specify specific fiber sections.

[7]:
out_avg = ds.dts.average_monte_carlo_double_ended(
    result=out,
    st_var=st_var,
    ast_var=ast_var,
    rst_var=rst_var,
    rast_var=rast_var,
    conf_ints=[2.5, 97.5],
    mc_sample_size=500,  # <- choose a much larger sample size
    ci_avg_x_flag1=True,
    ci_avg_x_flag2=False,
    ci_avg_x_sel=slice(7.5, 17.0),
    ci_avg_x_isel=None,
)
out_avg.tmpw_mc_avgx1.plot(hue="CI", linewidth=0.8)
Removed from results: x_avg
Removed from results: mc
Removed from results: tmpf_mc_avgsec_var
Removed from results: tmpb_mc_avgsec_var
Removed from results: tmpw_avgsec
Removed from results: tmpw_mc_avgsec_var
[7]:
[<matplotlib.lines.Line2D at 0x7f8975ec1520>,
 <matplotlib.lines.Line2D at 0x7f8975e7ecd0>]
../_images/notebooks_16Averaging_temperatures_16_2.png

4. Averaging along the fiber while assuming the temperature is same along the fiber but varies over time

So that you can state: - ‘I have put a lot of fiber in water, and I know that the temperature variation in the water is much smaller than along other parts of the fiber. And I would like to average the measurements from multiple locations to improve the estimated temperature of the water’.

Using the default store_.. values the following DataArrays are added to the DataStore:

tmpf_avgx2              The average forward temperature
tmpf_mc_avgx2_var       The estimated variance of the average forward temperature
tmpf_mc_avgx2           The confidence intervals of the average forward temperature

tmpb_avgx2              The average backward temperature
tmpb_mc_avgx2_var       The estimated variance of the average backward temperature
tmpb_mc_avgx2           The confidence intervals of the average forward temperature

tmpw_avgx2              The average forward-backward-averaged temperature
tmpw_avgx2_var          The estimated variance of the average forward-backward-averaged temperature
tmpw_mc_avgx2           The confidence intervals of the average forward-backward-averaged temperature

Select the part of the fiber that is in the water with ci_avg_x_sel.

[8]:
out_avg = ds.dts.average_monte_carlo_double_ended(
    result=out,
    st_var=st_var,
    ast_var=ast_var,
    rst_var=rst_var,
    rast_var=rast_var,
    conf_ints=[2.5, 97.5],
    mc_sample_size=500,  # <- choose a much larger sample size
    ci_avg_x_flag1=False,
    ci_avg_x_flag2=True,
    ci_avg_x_sel=slice(7.5, 17.0),
    ci_avg_x_isel=None,
)
out_avg.tmpw_mc_avgx2.plot(hue="CI", linewidth=0.8)
Removed from results: x_avg
Removed from results: tmpf_mc_avgsec_var
Removed from results: tmpb_mc_avgsec_var
Removed from results: tmpw_avgsec
Removed from results: tmpw_mc_avgsec_var
[8]:
[<matplotlib.lines.Line2D at 0x7f8975d7fac0>,
 <matplotlib.lines.Line2D at 0x7f8975d76e50>]
../_images/notebooks_16Averaging_temperatures_19_2.png