{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 16. Confidence intervals of average temperatures"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See Notebook 8 for a description of the calibration procedure. This notebook is about the confidence intervals estimation using measurements from a double ended setup."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calibration procedure"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:11:56.113565Z",
     "iopub.status.busy": "2022-04-06T08:11:56.113223Z",
     "iopub.status.idle": "2022-04-06T08:11:57.556537Z",
     "shell.execute_reply": "2022-04-06T08:11:57.555980Z"
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from dtscalibration import read_silixa_files\n",
    "\n",
    "# The following line introduces the .dts accessor for xarray datasets\n",
    "import dtscalibration  # noqa: E401  # noqa: E401\n",
    "from dtscalibration.variance_stokes import variance_stokes_constant\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:11:57.559220Z",
     "iopub.status.busy": "2022-04-06T08:11:57.558858Z",
     "iopub.status.idle": "2022-04-06T08:11:57.856068Z",
     "shell.execute_reply": "2022-04-06T08:11:57.855409Z"
    }
   },
   "outputs": [],
   "source": [
    "filepath = os.path.join(\"..\", \"..\", \"tests\", \"data\", \"double_ended2\")\n",
    "\n",
    "ds_ = read_silixa_files(directory=filepath, timezone_netcdf=\"UTC\", file_ext=\"*.xml\")\n",
    "\n",
    "ds = ds_.sel(x=slice(0, 100))  # only calibrate parts of the fiber\n",
    "sections = {\n",
    "    \"probe1Temperature\": [slice(7.5, 17.0), slice(70.0, 80.0)],  # cold bath\n",
    "    \"probe2Temperature\": [slice(24.0, 34.0), slice(85.0, 95.0)],  # warm bath\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:11:57.858849Z",
     "iopub.status.busy": "2022-04-06T08:11:57.858350Z",
     "iopub.status.idle": "2022-04-06T08:11:58.597727Z",
     "shell.execute_reply": "2022-04-06T08:11:58.597067Z"
    }
   },
   "outputs": [],
   "source": [
    "st_var, resid = variance_stokes_constant(\n",
    "    ds.dts.st, sections, ds.dts.acquisitiontime_fw, reshape_residuals=True\n",
    ")\n",
    "ast_var, _ = variance_stokes_constant(\n",
    "    ds.dts.ast, sections, ds.dts.acquisitiontime_fw, reshape_residuals=False\n",
    ")\n",
    "rst_var, _ = variance_stokes_constant(\n",
    "    ds.dts.rst, sections, ds.dts.acquisitiontime_bw, reshape_residuals=False\n",
    ")\n",
    "rast_var, _ = variance_stokes_constant(\n",
    "    ds.dts.rast, sections, ds.dts.acquisitiontime_bw, reshape_residuals=False\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:11:58.600213Z",
     "iopub.status.busy": "2022-04-06T08:11:58.600035Z",
     "iopub.status.idle": "2022-04-06T08:11:59.034418Z",
     "shell.execute_reply": "2022-04-06T08:11:59.033850Z"
    }
   },
   "outputs": [],
   "source": [
    "out = ds.dts.calibrate_double_ended(\n",
    "    sections=sections,\n",
    "    st_var=st_var,\n",
    "    ast_var=ast_var,\n",
    "    rst_var=rst_var,\n",
    "    rast_var=rast_var,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Confidence intervals of averages\n",
    "### Introduction confidence intervals\n",
    "The confidence intervals consist of two sources of uncertainty.\n",
    "\n",
    "1. Measurement noise in the measured Stokes and anti-Stokes signals. Expressed in a single variance value.\n",
    "2. Inherent to least squares procedures / overdetermined systems, the parameters are estimated with limited certainty and all parameters are correlated. Which is expressed in the covariance matrix.\n",
    "\n",
    "Both sources of uncertainty are propagated to an uncertainty in the estimated temperature via Monte Carlo.\n",
    "\n",
    "Confidence intervals are all computed with `ds.conf_int_double_ended()` and `ds.conf_int_single_ended()`.\n",
    "The confidence interval can be estimated if the calibration method is `wls` (so that the parameter uncertainties are estimated), `st_var`, `ast_var`, `rst_var`, `rast_var` are correctly estimated, and confidence intervals are passed to `conf_ints`. As weigths are correctly passed to the least squares procedure, the covariance matrix can be used as an estimator for the uncertainty in the parameters. This matrix holds the covariances between all the parameters. A large parameter set is generated from this matrix as part of the Monte Carlo routine, assuming the parameter space is normally distributed with their mean at the best estimate of the least squares procedure.\n",
    "\n",
    "The large parameter set is used to calculate a large set of temperatures. By using `percentiles` or `quantile` the 95% confidence interval of the calibrated temperature between 2.5% and 97.5% are calculated.\n",
    "\n",
    "Four types of averaging schemes are implemented:\n",
    "1. Averaging over time while the temperature varies over time and along the fiber\n",
    "2. Averaging over time while assuming the temperature remains constant over time but varies along the fiber\n",
    "3. Averaging along the fiber while the temperature varies along the cable and over time\n",
    "4. Averaging along the fiber while assuming the temperature is same along the fiber but varies over time\n",
    "\n",
    "These functions only work with the same size DataStore as that was calibrated. If you would like to average only a selection use the keyword arguments `ci_avg_time_sel`, `ci_avg_time_isel`, `ci_avg_x_sel`, `ci_avg_x_isel`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Averaging over time while the temperature varies over time and along the fiber"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So that you can state: \n",
    "- 'We can say with 95% confidence that the temperature remained between this line and this line during the entire measurement period'. \n",
    "- The average temperature during the measurement period was ..\n",
    "\n",
    "Using the default `store_..` values the following DataArrays are added to the DataStore:\n",
    "```\n",
    "tmpf_avg1              The average forward temperature\n",
    "tmpf_mc_avg1_var       The estimated variance of the average forward temperature\n",
    "tmpf_mc_avg1           The confidence intervals of the average forward temperature\n",
    "\n",
    "tmpb_avg1              The average backward temperature\n",
    "tmpb_mc_avg1_var       The estimated variance of the average backward temperature\n",
    "tmpb_mc_avg1           The confidence intervals of the average forward temperature\n",
    "\n",
    "tmpw_avg1              The average forward-backward-averaged temperature\n",
    "tmpw_avg1_var          The estimated variance of the average forward-backward-averaged temperature\n",
    "tmpw_mc_avg1           The confidence intervals of the average forward-backward-averaged temperature\n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:11:59.037090Z",
     "iopub.status.busy": "2022-04-06T08:11:59.036867Z",
     "iopub.status.idle": "2022-04-06T08:11:59.978183Z",
     "shell.execute_reply": "2022-04-06T08:11:59.977531Z"
    }
   },
   "outputs": [],
   "source": [
    "out_avg = ds.dts.average_monte_carlo_double_ended(\n",
    "    result=out,\n",
    "    st_var=st_var,\n",
    "    ast_var=ast_var,\n",
    "    rst_var=rst_var,\n",
    "    rast_var=rast_var,\n",
    "    conf_ints=[2.5, 97.5],\n",
    "    mc_sample_size=500,  # <- choose a much larger sample size\n",
    "    ci_avg_time_flag1=True,\n",
    "    ci_avg_time_flag2=False,\n",
    "    ci_avg_time_isel=[0, 1, 2, 3, 4, 5],\n",
    "    ci_avg_time_sel=None,\n",
    ")\n",
    "out_avg.tmpw_mc_avg1.plot(hue=\"CI\", linewidth=0.8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Averaging over time while assuming the temperature remains constant over time but varies along the fiber"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So that you can state: \n",
    "- 'I want to estimate a background temperature with confidence intervals. I hereby assume the temperature does not change over time and average all measurements to get a better estimate of the background temperature.'\n",
    "\n",
    "Using the default `store_..` values the following DataArrays are added to the DataStore:\n",
    "```\n",
    "tmpf_avg2              The average forward temperature\n",
    "tmpf_mc_avg2_var       The estimated variance of the average forward temperature\n",
    "tmpf_mc_avg2           The confidence intervals of the average forward temperature\n",
    "\n",
    "tmpb_avg2              The average backward temperature\n",
    "tmpb_mc_avg2_var       The estimated variance of the average backward temperature\n",
    "tmpb_mc_avg2           The confidence intervals of the average forward temperature\n",
    "\n",
    "tmpw_avg2              The average forward-backward-averaged temperature\n",
    "tmpw_avg2_var          The estimated variance of the average forward-backward-averaged temperature\n",
    "tmpw_mc_avg2           The confidence intervals of the average forward-backward-averaged temperature\n",
    "```\n",
    "\n",
    "Note that this average has much less uncertainty that averaging option 1. We can specify specific times with `ci_avg_time_isel`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:11:59.980584Z",
     "iopub.status.busy": "2022-04-06T08:11:59.980388Z",
     "iopub.status.idle": "2022-04-06T08:12:00.924816Z",
     "shell.execute_reply": "2022-04-06T08:12:00.924221Z"
    }
   },
   "outputs": [],
   "source": [
    "out_avg = ds.dts.average_monte_carlo_double_ended(\n",
    "    result=out,\n",
    "    st_var=st_var,\n",
    "    ast_var=ast_var,\n",
    "    rst_var=rst_var,\n",
    "    rast_var=rast_var,\n",
    "    conf_ints=[2.5, 97.5],\n",
    "    mc_sample_size=500,  # <- choose a much larger sample size\n",
    "    ci_avg_time_flag1=False,\n",
    "    ci_avg_time_flag2=True,\n",
    "    ci_avg_time_isel=[0, 1, 2, 3, 4, 5],\n",
    "    ci_avg_time_sel=None,\n",
    ")\n",
    "out_avg.tmpw_mc_avg2.plot(hue=\"CI\", linewidth=0.8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Averaging along the fiber while the temperature varies along the cable and over time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So that you can state: \n",
    "- 'The temperature of the fiber remained between these ci bounds at time 2, and at time 3 the temperature of the fiber remained between these ci bounds'.\n",
    "\n",
    "Using the default `store_..` values the following DataArrays are added to the DataStore:\n",
    "```\n",
    "tmpf_avgx1              The average forward temperature\n",
    "tmpf_mc_avgx1_var       The estimated variance of the average forward temperature\n",
    "tmpf_mc_avgx1           The confidence intervals of the average forward temperature\n",
    "\n",
    "tmpb_avgx1              The average backward temperature\n",
    "tmpb_mc_avgx1_var       The estimated variance of the average backward temperature\n",
    "tmpb_mc_avgx1           The confidence intervals of the average forward temperature\n",
    "\n",
    "tmpw_avgx1              The average forward-backward-averaged temperature\n",
    "tmpw_avgx1_var          The estimated variance of the average forward-backward-averaged temperature\n",
    "tmpw_mc_avgx1           The confidence intervals of the average forward-backward-averaged temperature\n",
    "```\n",
    "\n",
    "Note that this function returns a single average per time step. Use the keyword arguments `ci_avg_x_sel`, `ci_avg_x_isel` to specify specific fiber sections."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:12:00.927519Z",
     "iopub.status.busy": "2022-04-06T08:12:00.927299Z",
     "iopub.status.idle": "2022-04-06T08:12:01.908648Z",
     "shell.execute_reply": "2022-04-06T08:12:01.908039Z"
    }
   },
   "outputs": [],
   "source": [
    "out_avg = ds.dts.average_monte_carlo_double_ended(\n",
    "    result=out,\n",
    "    st_var=st_var,\n",
    "    ast_var=ast_var,\n",
    "    rst_var=rst_var,\n",
    "    rast_var=rast_var,\n",
    "    conf_ints=[2.5, 97.5],\n",
    "    mc_sample_size=500,  # <- choose a much larger sample size\n",
    "    ci_avg_x_flag1=True,\n",
    "    ci_avg_x_flag2=False,\n",
    "    ci_avg_x_sel=slice(7.5, 17.0),\n",
    "    ci_avg_x_isel=None,\n",
    ")\n",
    "out_avg.tmpw_mc_avgx1.plot(hue=\"CI\", linewidth=0.8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Averaging along the fiber while assuming the temperature is same along the fiber but varies over time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So that you can state: \n",
    "- 'I have put a lot of fiber in water, and I know that the temperature variation in the water is much smaller than along other parts of the fiber. And I would like to average the measurements from multiple locations to improve the estimated temperature of the water'.\n",
    "\n",
    "Using the default `store_..` values the following DataArrays are added to the DataStore:\n",
    "```\n",
    "tmpf_avgx2              The average forward temperature\n",
    "tmpf_mc_avgx2_var       The estimated variance of the average forward temperature\n",
    "tmpf_mc_avgx2           The confidence intervals of the average forward temperature\n",
    "\n",
    "tmpb_avgx2              The average backward temperature\n",
    "tmpb_mc_avgx2_var       The estimated variance of the average backward temperature\n",
    "tmpb_mc_avgx2           The confidence intervals of the average forward temperature\n",
    "\n",
    "tmpw_avgx2              The average forward-backward-averaged temperature\n",
    "tmpw_avgx2_var          The estimated variance of the average forward-backward-averaged temperature\n",
    "tmpw_mc_avgx2           The confidence intervals of the average forward-backward-averaged temperature\n",
    "```\n",
    "\n",
    "Select the part of the fiber that is in the water with `ci_avg_x_sel`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:12:01.912067Z",
     "iopub.status.busy": "2022-04-06T08:12:01.911684Z",
     "iopub.status.idle": "2022-04-06T08:12:03.015479Z",
     "shell.execute_reply": "2022-04-06T08:12:03.014976Z"
    }
   },
   "outputs": [],
   "source": [
    "out_avg = ds.dts.average_monte_carlo_double_ended(\n",
    "    result=out,\n",
    "    st_var=st_var,\n",
    "    ast_var=ast_var,\n",
    "    rst_var=rst_var,\n",
    "    rast_var=rast_var,\n",
    "    conf_ints=[2.5, 97.5],\n",
    "    mc_sample_size=500,  # <- choose a much larger sample size\n",
    "    ci_avg_x_flag1=False,\n",
    "    ci_avg_x_flag2=True,\n",
    "    ci_avg_x_sel=slice(7.5, 17.0),\n",
    "    ci_avg_x_isel=None,\n",
    ")\n",
    "out_avg.tmpw_mc_avgx2.plot(hue=\"CI\", linewidth=0.8)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}