{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4. Calculate variance of Stokes and anti-Stokes measurements"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The goal of this notebook is to estimate the variance of the noise of the Stokes measurement. The measured Stokes and anti-Stokes signals contain noise that is distributed approximately normal. We need to estimate the variance of the noise to:\n",
    "- Perform a weighted calibration \n",
    "- Construct confidence intervals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:09:19.731406Z",
     "iopub.status.busy": "2022-04-06T08:09:19.729379Z",
     "iopub.status.idle": "2022-04-06T08:09:22.072014Z",
     "shell.execute_reply": "2022-04-06T08:09:22.071324Z"
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import warnings\n",
    "\n",
    "warnings.simplefilter(\"ignore\")  # Hide warnings to avoid clutter in the notebook\n",
    "\n",
    "from dtscalibration import read_silixa_files\n",
    "\n",
    "# The following line introduces the .dts accessor for xarray datasets\n",
    "import dtscalibration  # noqa: E401  # noqa: E401\n",
    "from dtscalibration.variance_stokes import variance_stokes_constant\n",
    "from matplotlib import pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:09:22.074444Z",
     "iopub.status.busy": "2022-04-06T08:09:22.074288Z",
     "iopub.status.idle": "2022-04-06T08:09:22.384442Z",
     "shell.execute_reply": "2022-04-06T08:09:22.383910Z"
    }
   },
   "outputs": [],
   "source": [
    "filepath = os.path.join(\"..\", \"..\", \"tests\", \"data\", \"double_ended2\")\n",
    "\n",
    "ds = read_silixa_files(directory=filepath, timezone_netcdf=\"UTC\", file_ext=\"*.xml\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we define the sections as we learned from the previous notebook. Sections are required to calculate the variance in the Stokes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:09:22.387088Z",
     "iopub.status.busy": "2022-04-06T08:09:22.386906Z",
     "iopub.status.idle": "2022-04-06T08:09:22.407332Z",
     "shell.execute_reply": "2022-04-06T08:09:22.406738Z"
    }
   },
   "outputs": [],
   "source": [
    "sections = {\n",
    "    \"probe1Temperature\": [slice(7.5, 17.0), slice(70.0, 80.0)],  # cold bath\n",
    "    \"probe2Temperature\": [slice(24.0, 34.0), slice(85.0, 95.0)],  # warm bath\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The variance in the Stokes signal will vary along the length of the fiber. There are multiple ways to approach this, each has its own pros and cons. **It is important to consider which model you use for your setup, as this will impact the calibration weights and predicted uncertainty.**\n",
    "\n",
    "- In small setups with small variations in Stokes intensity, `variance_stokes_constant` can be used. This function determines a single (constant) value for the variance. This method is not recommended for larger setups (e.g., >300 m) due to the signal strength dependency of the variance.\n",
    "\n",
    "\n",
    "- For larger setups `variance_stokes_linear` should be used. This function assumes a linear relationship between the Stokes signal strength and variance. Tests on Silixa and Sensornet devices indicate this relationship is linear, and (approximately) goes through the origin; i.e. at 0 Stokes intensity, the signal variance is very close to 0.\n",
    "\n",
    "\n",
    "- `variance_stokes_exponential` can be used for small setups with very few time steps. Too many degrees of freedom results in an under estimation of the noise variance. Almost never the case, but use when calibrating e.g. a single time step."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the setup we are using is only 100 m in length, we can use `ds.variance_stokes_constant`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:09:22.410067Z",
     "iopub.status.busy": "2022-04-06T08:09:22.409845Z",
     "iopub.status.idle": "2022-04-06T08:09:22.633926Z",
     "shell.execute_reply": "2022-04-06T08:09:22.633317Z"
    }
   },
   "outputs": [],
   "source": [
    "I_var, residuals = variance_stokes_constant(\n",
    "    ds.dts.st, sections, ds.dts.acquisitiontime_fw, reshape_residuals=True\n",
    ")\n",
    "print(\n",
    "    \"The variance of the Stokes signal along the reference sections \"\n",
    "    \"is approximately {:.2f} on a {:.1f} sec acquisition time\".format(\n",
    "        I_var, ds.userAcquisitionTimeFW.data[0]\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:09:22.636314Z",
     "iopub.status.busy": "2022-04-06T08:09:22.636094Z",
     "iopub.status.idle": "2022-04-06T08:09:23.503012Z",
     "shell.execute_reply": "2022-04-06T08:09:23.502007Z"
    }
   },
   "outputs": [],
   "source": [
    "from dtscalibration import plot\n",
    "\n",
    "fig_handle = plot.plot_residuals_reference_sections(\n",
    "    residuals,\n",
    "    sections,\n",
    "    title=\"Distribution of the noise in the Stokes signal\",\n",
    "    plot_avg_std=I_var**0.5,\n",
    "    plot_names=True,\n",
    "    robust=True,\n",
    "    units=\"\",\n",
    "    method=\"single\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The residuals should be normally distributed and independent from previous time steps and other points along the cable. If you observe patterns in the residuals plot (above), it might be caused by:\n",
    "- The temperature in the calibration bath is not uniform\n",
    "- Attenuation caused by coils/sharp bends in cable\n",
    "- Attenuation caused by a splice"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T08:09:23.505848Z",
     "iopub.status.busy": "2022-04-06T08:09:23.505609Z",
     "iopub.status.idle": "2022-04-06T08:09:23.821421Z",
     "shell.execute_reply": "2022-04-06T08:09:23.820775Z"
    }
   },
   "outputs": [],
   "source": [
    "import scipy\n",
    "import numpy as np\n",
    "\n",
    "sigma = residuals.std()\n",
    "mean = residuals.mean()\n",
    "x = np.linspace(mean - 3 * sigma, mean + 3 * sigma, 100)\n",
    "approximated_normal_fit = scipy.stats.norm.pdf(x, mean, sigma)\n",
    "residuals.plot.hist(bins=50, figsize=(12, 8), density=True)\n",
    "plt.plot(x, approximated_normal_fit)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can follow the same steps to calculate the variance from the noise in the anti-Stokes measurments by setting `st_label='ast` and redoing the steps."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}