{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 8. Calibration of double-ended measurements"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A double-ended calibration is performed where the unknown parameters are estimated using fiber sections that have a reference temperature. The parameters are estimated with a weighted least squares optimization using Stokes and anti-Stokes measurements from all timesteps. Thus Stokes and anti-Stokes measurements with a large signal to noise ratio (close to the DTS device) contribute more towards estimating the optimal parameter set. This approach requires one extra step, estimating the variance of the noise in the Stokes measurements, but it improves the temperature estimate and allows for the estimation of uncertainty in the temperature. So well worth it!\n",
    "\n",
    "Double-ended calibration requires a few steps. Please have a look at [1] for more information:\n",
    "- Read the raw data files loaded from your DTS machine\n",
    "- Define the reference fiber sections that have a known temperature\n",
    "- Align measurements of the forward and backward channels\n",
    "- Estimate the variance of the noise in the Stokes and anti-Stokes measurements\n",
    "- Perform the parameter estimation and compute the temperature along the entire fiber\n",
    "- Plot the uncertainty of the estimated temperature\n",
    "\n",
    "[1]: des Tombe, B., Schilperoort, B., & Bakker, M. (2020). Estimation of Temperature and Associated Uncertainty from Fiber-Optic Raman-Spectrum Distributed Temperature Sensing. Sensors, 20(8), 2235. https://doi.org/10.3390/s20082235"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:10:13.277128Z",
     "iopub.status.busy": "2022-04-06T08:10:13.276102Z",
     "iopub.status.idle": "2022-04-06T08:10:14.748590Z",
     "shell.execute_reply": "2022-04-06T08:10:14.748034Z"
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import warnings\n",
    "\n",
    "from dtscalibration import read_silixa_files\n",
    "from dtscalibration.dts_accessor_utils import (\n",
    "    suggest_cable_shift_double_ended,\n",
    "    shift_double_ended,\n",
    ")\n",
    "\n",
    "# The following line introduces the .dts accessor for xarray datasets\n",
    "import dtscalibration  # noqa: E401  # noqa: E401\n",
    "from dtscalibration.variance_stokes import variance_stokes_constant\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "warnings.simplefilter(\"ignore\")  # Hide warnings to avoid clutter in the notebook"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Read the raw data files loaded from your DTS machine\n",
    "Use `read_silixa_files` for reading files from a Silixa device. The following functions are available for reading files from other devices: `read_sensortran_files`, `read_apsensing_files`, and `read_sensornet_files`. See Notebook 1. If your DTS device was configured such that your forward- and backward measurements are stored in seperate folders have a look at `11Merge_single_measurements_into_double` example notebook. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath = os.path.join(\"..\", \"..\", \"tests\", \"data\", \"double_ended2\")\n",
    "\n",
    "ds_notaligned = read_silixa_files(\n",
    "    directory=filepath, timezone_netcdf=\"UTC\", file_ext=\"*.xml\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Align measurements of the forward and backward channels"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In double-ended measurements, it is important that the measurement points of the forward channel are at the same point as those of the backward channel. This requires setting the exact cable length prior to calibration. For double ended measurements it is important to use the correct length so that the forward channel and the backward channel are perfectly aligned. It matters a lot whether your fiber was 99 while in reality it was a 100 meters, as it increases the apparent spatial resolution of your measurements and increases parameter uncertainty and consequently increases the uncertainty of the estimated temperature.\n",
    "\n",
    "Select the part of the fiber that contains relevant measurements using the `sel` command. Slice the time dimension to select the measurements of the relevant times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:10:14.751185Z",
     "iopub.status.busy": "2022-04-06T08:10:14.750883Z",
     "iopub.status.idle": "2022-04-06T08:10:15.050453Z",
     "shell.execute_reply": "2022-04-06T08:10:15.049751Z"
    }
   },
   "outputs": [],
   "source": [
    "ds_notaligned = ds_notaligned.sel(x=slice(0, 100))  # only calibrate parts of the fiber"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tool `suggest_cable_shift_double_ended` located in `dtscalibration.dts_accessor_utils` can be used estimate the required shift to perfectly align the forward and backward channels. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "suggested_shift = suggest_cable_shift_double_ended(\n",
    "    ds_notaligned,\n",
    "    np.arange(-5, 5),  # number of dx to shift\n",
    "    plot_result=True,\n",
    "    figsize=(12, 6),\n",
    ")\n",
    "print(suggested_shift)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This helper function suggests shift via two different methods. We apply the first suggested shift:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = shift_double_ended(ds_notaligned, suggested_shift[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define the reference fiber sections that have a known temperature\n",
    "As explained in Notebook 3. As explained in Notebook 3. DTS devices come with temperature probes to measure the temperature of the water baths. These measurements are stored in the data that was loaded in the previous step and are loaded automatically. In the case you would like to use an external temperature sensor, have a look at notebook `09Import_timeseries` to append those measurements to the `ds` before continuing with the calibration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sections = {\n",
    "    \"probe1Temperature\": [slice(7.5, 17.0), slice(70.0, 80.0)],  # cold bath\n",
    "    \"probe2Temperature\": [slice(24.0, 34.0), slice(85.0, 95.0)],  # warm bath\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Estimate the variance of the noise in the Stokes and anti-Stokes measurements\n",
    "First calculate the variance in the measured Stokes and anti-Stokes signals, in the forward and backward direction. See Notebook 4 for more information.\n",
    "\n",
    "The Stokes and anti-Stokes signals should follow a smooth decaying exponential. This function fits a decaying exponential to each reference section for each time step. The variance of the residuals between the measured Stokes and anti-Stokes signals and the fitted signals is used as an estimate of the variance in measured signals. This algorithm assumes that the temperature is the same for the entire section but may vary over time and differ per section.\n",
    "\n",
    "Note that the acquisition time of the backward channel is passed to the variance_stokes function for the later two funciton calls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:10:15.053145Z",
     "iopub.status.busy": "2022-04-06T08:10:15.052975Z",
     "iopub.status.idle": "2022-04-06T08:10:15.811537Z",
     "shell.execute_reply": "2022-04-06T08:10:15.810996Z"
    }
   },
   "outputs": [],
   "source": [
    "st_var, resid = variance_stokes_constant(\n",
    "    ds.dts.st, sections, ds.dts.acquisitiontime_fw, reshape_residuals=True\n",
    ")\n",
    "ast_var, _ = variance_stokes_constant(\n",
    "    ds.dts.ast, sections, ds.dts.acquisitiontime_fw, reshape_residuals=False\n",
    ")\n",
    "rst_var, _ = variance_stokes_constant(\n",
    "    ds.dts.rst, sections, ds.dts.acquisitiontime_bw, reshape_residuals=False\n",
    ")\n",
    "rast_var, _ = variance_stokes_constant(\n",
    "    ds.dts.rast, sections, ds.dts.acquisitiontime_bw, reshape_residuals=False\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following plot can be used to check if there are no spatial or temporal correlated residuals. If you see horizontal or vertical lines that means that you overestimate the st_var. Common reasons are that the temperature of that section is not uniform, e.g. that the reference sections were defined falsely (horizontal lines) or that the temperature of the water baths were not uniform (horizontal lines)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:10:15.813904Z",
     "iopub.status.busy": "2022-04-06T08:10:15.813688Z",
     "iopub.status.idle": "2022-04-06T08:10:16.081331Z",
     "shell.execute_reply": "2022-04-06T08:10:16.080696Z"
    }
   },
   "outputs": [],
   "source": [
    "resid.plot(figsize=(12, 4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Perform calibration and compute the temperature\n",
    "We calibrate the measurements with a single method call. Three temperatures are estimated for double-ended setups. The temperature using the Stokes and anti-Stokes meassurements of the forward channel, `tmpf`. The temperature of the backward channel, `tmpb`. And the weigthed average of the two, `tmpw`. The latter is the best estimate of the temperature along the fiber with the smallest uncertainty."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:10:16.083711Z",
     "iopub.status.busy": "2022-04-06T08:10:16.083510Z",
     "iopub.status.idle": "2022-04-06T08:10:16.591924Z",
     "shell.execute_reply": "2022-04-06T08:10:16.591485Z"
    }
   },
   "outputs": [],
   "source": [
    "out = ds.dts.calibrate_double_ended(\n",
    "    sections=sections,\n",
    "    st_var=st_var,\n",
    "    ast_var=ast_var,\n",
    "    rst_var=rst_var,\n",
    "    rast_var=rast_var,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:10:16.594307Z",
     "iopub.status.busy": "2022-04-06T08:10:16.594086Z",
     "iopub.status.idle": "2022-04-06T08:10:16.849150Z",
     "shell.execute_reply": "2022-04-06T08:10:16.848523Z"
    }
   },
   "outputs": [],
   "source": [
    "out.tmpw.plot(figsize=(12, 4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Uncertainty of the calibrated temperature\n",
    "The uncertainty of the calibrated temperature can be computed in two manners:\n",
    "1. The variance of the calibrated temperature can be approximated using linear error propagation\n",
    "    - Very fast computation\n",
    "    - Only the variance is estimated\n",
    "    - Sufficiently accurate approximation for most cases\n",
    "2. The uncertainty distribution of the calibrated temperature can be approximated using Monte Carlo\n",
    "    - Slow computation\n",
    "    - Computes variances and confidence intervals\n",
    "    - Correctly propagates all uncertainties from the calibration\n",
    "    - Requires sufficiently large number of samples to be drawn to be correct, hence the slow computation.\n",
    "    - Only use this method: 1) To check the first method. 2) Specific interest in confidence intervals."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. The variance approximated using linear error propagation\n",
    "This first method works pretty good and is always computed when calling the `ds.calibration_double_ended()` function. First we plot variances for all times. Secondly, we plot the standard deviation (standard error) for the first timestep."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "out.tmpw_var.plot(figsize=(12, 4))\n",
    "ds1 = out.isel(time=-1)  # take only the first timestep\n",
    "(ds1.tmpw_var**0.5).plot(figsize=(12, 4))\n",
    "plt.gca().set_ylabel(\"Standard error ($^\\circ$C)\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Uncertainty approximation using Monte Carlo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If `mc_sample_size` keyword argument is passed to the `ds.calibration_double_ended()` function, the uncertainty distribution of the estimated temperature is computed using a Monte Carlo approach. The variance of that distribution is accessed via `tmpw_mc_var`.\n",
    "\n",
    "The uncertainty comes from the noise in the (anti-) Stokes measurements and from the parameter estimation. Both sources are propagated via Monte Carlo sampling to an uncertainty distribution of the estimated temperature. As weigths are correctly passed to the least squares procedure via the `st_var` arguments, the covariance matrix can be used as an estimator for the uncertainty in the parameters. This matrix holds the covariances between all the parameters. A large parameter set is generated from this matrix as part of the Monte Carlo routine, assuming the parameter space is normally distributed with their mean at the best estimate of the least squares procedure.\n",
    "\n",
    "The large parameter set is used to calculate a large set of temperatures. By using `percentiles` or `quantile` the 95% confidence interval of the calibrated temperature between 2.5% and 97.5% are calculated. The confidence intervals differ per time step. If you would like to calculate confidence intervals temporal averages or averages of fiber sections see notebook 16."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "out2 = ds.dts.monte_carlo_double_ended(\n",
    "    result=out,\n",
    "    st_var=st_var,\n",
    "    ast_var=ast_var,\n",
    "    rst_var=rst_var,\n",
    "    rast_var=rast_var,\n",
    "    conf_ints=[2.5, 97.5],\n",
    "    mc_sample_size=500,\n",
    ")  # < increase sample size for better approximation\n",
    "\n",
    "out2.tmpw_mc_var.plot(figsize=(12, 4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:10:17.115378Z",
     "iopub.status.busy": "2022-04-06T08:10:17.115193Z",
     "iopub.status.idle": "2022-04-06T08:10:18.229389Z",
     "shell.execute_reply": "2022-04-06T08:10:18.228867Z"
    }
   },
   "outputs": [],
   "source": [
    "out.isel(time=-1).tmpw.plot(linewidth=0.7, figsize=(12, 4))\n",
    "out2.isel(time=-1).tmpw_mc.isel(CI=0).plot(linewidth=0.7, label=\"CI: 2.5%\")\n",
    "out2.isel(time=-1).tmpw_mc.isel(CI=1).plot(linewidth=0.7, label=\"CI: 97.5%\")\n",
    "plt.legend(fontsize=\"small\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The DataArrays `tmpf_mc` and `tmpb_mc` and the dimension `CI` are added. `MC` stands for monte carlo and the `CI` dimension holds the confidence interval 'coordinates'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:10:18.231907Z",
     "iopub.status.busy": "2022-04-06T08:10:18.231718Z",
     "iopub.status.idle": "2022-04-06T08:10:19.045574Z",
     "shell.execute_reply": "2022-04-06T08:10:19.045043Z"
    }
   },
   "outputs": [],
   "source": [
    "(out2.isel(time=-1).tmpf_mc_var ** 0.5).plot(figsize=(12, 4))\n",
    "(out.isel(time=-1).tmpf_var ** 0.5).plot()\n",
    "(out2.isel(time=-1).tmpb_mc_var ** 0.5).plot()\n",
    "(out.isel(time=-1).tmpb_var ** 0.5).plot()\n",
    "(out.isel(time=-1).tmpw_var ** 0.5).plot()\n",
    "(out2.isel(time=-1).tmpw_mc_var ** 0.5).plot()\n",
    "plt.ylabel(\"$\\sigma$ ($^\\circ$C)\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}