Source code for dtscalibration.variance_stokes

import dask.array as da
import numpy as np
import xarray as xr

from dtscalibration.calibration.section_utils import validate_no_overlapping_sections
from dtscalibration.calibration.section_utils import validate_sections_definition
from dtscalibration.dts_accessor_utils import ufunc_per_section_helper
from dtscalibration.variance_helpers import check_allclose_acquisitiontime
from dtscalibration.variance_helpers import variance_stokes_constant_helper
from dtscalibration.variance_helpers import variance_stokes_exponential_helper
from dtscalibration.variance_helpers import variance_stokes_linear_helper


[docs] def variance_stokes_constant(st, sections, acquisitiontime, reshape_residuals=True): """Approximate the variance of the noise in Stokes intensity measurements with one value, suitable for small setups. * `variance_stokes_constant()` for small setups with small variations in\ intensity. Variance of the Stokes measurements is assumed to be the same\ along the entire fiber. * `variance_stokes_exponential()` for small setups with very few time\ steps. Too many degrees of freedom results in an under estimation of the\ noise variance. Almost never the case, but use when calibrating pre time\ step. * `variance_stokes_linear()` for larger setups with more time steps.\ Assumes Poisson distributed noise with the following model:: st_var = a * ds.st + b where `a` and `b` are constants. Requires reference sections at beginning and end of the fiber, to have residuals at high and low intensity measurements. The Stokes and anti-Stokes intensities are measured with detectors, which inherently introduce noise to the measurements. Knowledge of the distribution of the measurement noise is needed for a calibration with weighted observations (Sections 5 and 6 of [1]_) and to project the associated uncertainty to the temperature confidence intervals (Section 7 of [1]_). Two sources dominate the noise in the Stokes and anti-Stokes intensity measurements (Hartog, 2017, p.125). Close to the laser, noise from the conversion of backscatter to electricity dominates the measurement noise. The detecting component, an avalanche photodiode, produces Poisson- distributed noise with a variance that increases linearly with the intensity. The Stokes and anti-Stokes intensities are commonly much larger than the standard deviation of the noise, so that the Poisson distribution can be approximated with a Normal distribution with a mean of zero and a variance that increases linearly with the intensity. At the far-end of the fiber, noise from the electrical circuit dominates the measurement noise. It produces Normal-distributed noise with a mean of zero and a variance that is independent of the intensity. Calculates the variance between the measurements and a best fit at each reference section. This fits a function to the nt * nx measurements with ns * nt + nx parameters, where nx are the total number of reference locations along all sections. The temperature is constant along the reference sections, so the expression of the Stokes power can be split in a time series per reference section and a constant per observation location. Idea from Discussion at page 127 in Richter, P. H. (1995). Estimating errors in least-squares fitting. The timeseries and the constant are, of course, highly correlated (Equations 20 and 21 in [1]_), but that is not relevant here as only the product is of interest. The residuals between the fitted product and the Stokes intensity measurements are attributed to the noise from the detector. The variance of the residuals is used as a proxy for the variance of the noise in the Stokes and anti-Stokes intensity measurements. A non-uniform temperature of the reference sections results in an over estimation of the noise variance estimate because all temperature variation is attributed to the noise. Parameters ---------- reshape_residuals st : DataArray sections : Dict[str, List[slice]] Returns: -------- I_var : float Variance of the residuals between measured and best fit resid : array_like Residuals between measured and best fit Notes: ------ * Because there are a large number of unknowns, spend time on\ calculating an initial estimate. Can be turned off by setting to False. * It is often not needed to use measurements from all time steps. If\ your variance estimate does not change when including measurements\ additional time steps, you have included enough measurements. References: ----------- .. [1] des Tombe, B., Schilperoort, B., & Bakker, M. (2020). Estimation of Temperature and Associated Uncertainty from Fiber-Optic Raman- Spectrum Distributed Temperature Sensing. Sensors, 20(8), 2235. https://doi.org/10.3390/s20082235 Examples: --------- - `Example notebook 4: Calculate variance Stokes intensity measurements\ <https://github.com/\ dtscalibration/python-dts-calibration/blob/main/examples/notebooks/\ 04Calculate_variance_Stokes.ipynb>`_ TODO: Account for varying acquisition times """ validate_sections_definition(sections=sections) validate_no_overlapping_sections(sections=sections) check_allclose_acquisitiontime(acquisitiontime=acquisitiontime) assert st.dims[0] == "x", "DataArray is transposed" # should maybe be per section. But then residuals # seem to be correlated between stretches. I don't know why.. BdT. data_dict = da.compute( ufunc_per_section_helper(sections=sections, dataarray=st, calc_per="stretch") )[0] var_I, resid = variance_stokes_constant_helper(data_dict) if not reshape_residuals: return var_I, resid else: ix_resid = ufunc_per_section_helper( sections=sections, x_coords=st.x, calc_per="all" ) resid_sorted = np.full(shape=st.shape, fill_value=np.nan) resid_sorted[ix_resid, :] = resid resid_da = xr.DataArray(data=resid_sorted, coords=st.coords) return var_I, resid_da
[docs] def variance_stokes_exponential( st, sections, acquisitiontime, use_statsmodels=False, suppress_info=True, reshape_residuals=True, ): """Approximate the variance of the noise in Stokes intensity measurements with one value, suitable for small setups with measurements from only a few times. * `variance_stokes_constant()` for small setups with small variations in\ intensity. Variance of the Stokes measurements is assumed to be the same\ along the entire fiber. * `variance_stokes_exponential()` for small setups with very few time\ steps. Too many degrees of freedom results in an under estimation of the\ noise variance. Almost never the case, but use when calibrating pre time\ step. * `variance_stokes_linear()` for larger setups with more time steps.\ Assumes Poisson distributed noise with the following model:: st_var = a * ds.st + b where `a` and `b` are constants. Requires reference sections at beginning and end of the fiber, to have residuals at high and low intensity measurements. The Stokes and anti-Stokes intensities are measured with detectors, which inherently introduce noise to the measurements. Knowledge of the distribution of the measurement noise is needed for a calibration with weighted observations (Sections 5 and 6 of [1]_) and to project the associated uncertainty to the temperature confidence intervals (Section 7 of [1]_). Two sources dominate the noise in the Stokes and anti-Stokes intensity measurements (Hartog, 2017, p.125). Close to the laser, noise from the conversion of backscatter to electricity dominates the measurement noise. The detecting component, an avalanche photodiode, produces Poisson- distributed noise with a variance that increases linearly with the intensity. The Stokes and anti-Stokes intensities are commonly much larger than the standard deviation of the noise, so that the Poisson distribution can be approximated with a Normal distribution with a mean of zero and a variance that increases linearly with the intensity. At the far-end of the fiber, noise from the electrical circuit dominates the measurement noise. It produces Normal-distributed noise with a mean of zero and a variance that is independent of the intensity. Calculates the variance between the measurements and a best fit at each reference section. This fits a function to the nt * nx measurements with ns * nt + nx parameters, where nx are the total number of reference locations along all sections. The temperature is constant along the reference sections. This fits a two-parameter exponential to the stokes measurements. The temperature is constant and there are no splices/sharp bends in each reference section. Therefore all signal decrease is due to differential attenuation, which is the same for each reference section. The scale of the exponential does differ per reference section. Assumptions: 1) the temperature is the same along a reference section. 2) no sharp bends and splices in the reference sections. 3) Same type of optical cable in each reference section. Idea from discussion at page 127 in Richter, P. H. (1995). Estimating errors in least-squares fitting. For weights used error propagation: w^2 = 1/sigma(lny)^2 = y^2/sigma(y)^2 = y^2 The timeseries and the constant are, of course, highly correlated (Equations 20 and 21 in [1]_), but that is not relevant here as only the product is of interest. The residuals between the fitted product and the Stokes intensity measurements are attributed to the noise from the detector. The variance of the residuals is used as a proxy for the variance of the noise in the Stokes and anti-Stokes intensity measurements. A non-uniform temperature of the reference sections results in an over estimation of the noise variance estimate because all temperature variation is attributed to the noise. Parameters ---------- suppress_info : bool, optional Suppress print statements. use_statsmodels : bool, optional Use statsmodels to fit the exponential. If `False`, use scipy. reshape_residuals : bool, optional Reshape the residuals to the shape of the Stokes intensity st_label : str label of the Stokes, anti-Stokes measurement. E.g., st, ast, rst, rast sections : Dict[str, List[slice]], optional If `None` is supplied, `ds.sections` is used. Define calibration sections. Each section requires a reference temperature time series, such as the temperature measured by an external temperature sensor. They should already be part of the DataStore object. `sections` is defined with a dictionary with its keywords of the names of the reference temperature time series. Its values are lists of slice objects, where each slice object is a fiber stretch that has the reference temperature. Afterwards, `sections` is stored under `ds.sections`. Returns: -------- I_var : float Variance of the residuals between measured and best fit resid : array_like Residuals between measured and best fit Notes: ------ * Because there are a large number of unknowns, spend time on\ calculating an initial estimate. Can be turned off by setting to False. * It is often not needed to use measurements from all time steps. If\ your variance estimate does not change when including measurements from\ more time steps, you have included enough measurements. References: ----------- .. [1] des Tombe, B., Schilperoort, B., & Bakker, M. (2020). Estimation of Temperature and Associated Uncertainty from Fiber-Optic Raman- Spectrum Distributed Temperature Sensing. Sensors, 20(8), 2235. https://doi.org/10.3390/s20082235 Examples: --------- - `Example notebook 4: Calculate variance Stokes intensity measurements\ <https://github.com/\ dtscalibration/python-dts-calibration/blob/main/examples/notebooks/\ 04Calculate_variance_Stokes.ipynb>`_ """ validate_sections_definition(sections=sections) validate_no_overlapping_sections(sections=sections) check_allclose_acquisitiontime(acquisitiontime=acquisitiontime) assert st.dims[0] == "x", "Stokes are transposed" nt = st.coords["time"].size # number of reference points per section (spatial) len_stretch_list = [] y_list = [] # intensities of stokes x_list = [] # length rel to start of section. for alpha for k, stretches in sections.items(): for stretch in stretches: y_list.append(st.sel(x=stretch).data.T.reshape(-1)) _x = st.coords["x"].sel(x=stretch).data.copy() _x -= _x[0] x_list.append(da.tile(_x, nt)) len_stretch_list.append(_x.size) x = np.concatenate(x_list) y = np.concatenate(y_list) var_I, resid = variance_stokes_exponential_helper( nt, x, y, len_stretch_list, use_statsmodels, suppress_info ) if not reshape_residuals: return var_I, resid else: # restructure the residuals, such that they can be plotted and # added to ds resid_res = [] for leni, lenis, lenie in zip( len_stretch_list, nt * np.cumsum([0] + len_stretch_list[:-1]), nt * np.cumsum(len_stretch_list), ): try: resid_res.append(resid[lenis:lenie].reshape((leni, nt), order="F")) except: # noqa: E722 # Dask array does not support order resid_res.append(resid[lenis:lenie].T.reshape((nt, leni)).T) _resid = np.concatenate(resid_res) # _resid_x = self.ufunc_per_section( # sections=sections, label="x", calc_per="all" # ) _resid_x = ufunc_per_section_helper( sections=sections, dataarray=st.coords["x"], calc_per="all" ) isort = np.argsort(_resid_x) resid_x = _resid_x[isort] # get indices from ufunc directly resid = _resid[isort, :] ix_resid = np.array( [np.argmin(np.abs(ai - st.coords["x"].data)) for ai in resid_x] ) resid_sorted = np.full(shape=st.shape, fill_value=np.nan) resid_sorted[ix_resid, :] = resid resid_da = xr.DataArray(data=resid_sorted, coords=st.coords) return var_I, resid_da
[docs] def variance_stokes_linear( st, sections, acquisitiontime, nbin=50, through_zero=False, plot_fit=False ): """Approximate the variance of the noise in Stokes intensity measurements with a linear function of the intensity, suitable for large setups. * `variance_stokes_constant()` for small setups with small variations in\ intensity. Variance of the Stokes measurements is assumed to be the same\ along the entire fiber. * `variance_stokes_exponential()` for small setups with very few time\ steps. Too many degrees of freedom results in an under estimation of the\ noise variance. Almost never the case, but use when calibrating pre time\ step. * `variance_stokes_linear()` for larger setups with more time steps.\ Assumes Poisson distributed noise with the following model:: st_var = a * ds.st + b where `a` and `b` are constants. Requires reference sections at beginning and end of the fiber, to have residuals at high and low intensity measurements. The Stokes and anti-Stokes intensities are measured with detectors, which inherently introduce noise to the measurements. Knowledge of the distribution of the measurement noise is needed for a calibration with weighted observations (Sections 5 and 6 of [1]_) and to project the associated uncertainty to the temperature confidence intervals (Section 7 of [1]_). Two sources dominate the noise in the Stokes and anti-Stokes intensity measurements (Hartog, 2017, p.125). Close to the laser, noise from the conversion of backscatter to electricity dominates the measurement noise. The detecting component, an avalanche photodiode, produces Poisson- distributed noise with a variance that increases linearly with the intensity. The Stokes and anti-Stokes intensities are commonly much larger than the standard deviation of the noise, so that the Poisson distribution can be approximated with a Normal distribution with a mean of zero and a variance that increases linearly with the intensity. At the far-end of the fiber, noise from the electrical circuit dominates the measurement noise. It produces Normal-distributed noise with a mean of zero and a variance that is independent of the intensity. Calculates the variance between the measurements and a best fit at each reference section. This fits a function to the nt * nx measurements with ns * nt + nx parameters, where nx are the total number of reference locations along all sections. The temperature is constant along the reference sections, so the expression of the Stokes power can be split in a time series per reference section and a constant per observation location. Idea from Discussion at page 127 in Richter, P. H. (1995). Estimating errors in least-squares fitting. The timeseries and the constant are, of course, highly correlated (Equations 20 and 21 in [1]_), but that is not relevant here as only the product is of interest. The residuals between the fitted product and the Stokes intensity measurements are attributed to the noise from the detector. The variance of the residuals is used as a proxy for the variance of the noise in the Stokes and anti-Stokes intensity measurements. A non-uniform temperature of the reference sections results in an over estimation of the noise variance estimate because all temperature variation is attributed to the noise. Notes: ------ * Because there are a large number of unknowns, spend time on\ calculating an initial estimate. Can be turned off by setting to False. * It is often not needed to use measurements from all time steps. If\ your variance estimate does not change when including measurements \ from more time steps, you have included enough measurements. References: ----------- .. [1] des Tombe, B., Schilperoort, B., & Bakker, M. (2020). Estimation of Temperature and Associated Uncertainty from Fiber-Optic Raman- Spectrum Distributed Temperature Sensing. Sensors, 20(8), 2235. https://doi.org/10.3390/s20082235 Examples: --------- - `Example notebook 4: Calculate variance Stokes intensity \ measurements <https://github.com/\ dtscalibration/python-dts-calibration/blob/main/examples/notebooks/\ 04Calculate_variance_Stokes.ipynb>`_ Parameters ---------- st_label : str Key under which the Stokes DataArray is stored. E.g., 'st', 'rst' sections : dict, optional Define sections. See documentation nbin : int Number of bins to compute the variance for, through which the linear function is fitted. Make sure that that are at least 50 residuals per bin to compute the variance from. through_zero : bool If True, the variance is computed as: VAR(Stokes) = slope * Stokes If False, VAR(Stokes) = slope * Stokes + offset. From what we can tell from our inital trails, is that the offset seems relatively small, so that True seems a better option for setups where a reference section with very low Stokes intensities is missing. If data with low Stokes intensities available, it is better to not fit through zero, but determine the offset from the data. plot_fit : bool If True plot the variances for each bin and plot the fitted linear function """ validate_sections_definition(sections=sections) validate_no_overlapping_sections(sections=sections) check_allclose_acquisitiontime(acquisitiontime=acquisitiontime) assert st.dims[0] == "x", "Stokes are transposed" _, resid = variance_stokes_constant( sections=sections, st=st, acquisitiontime=acquisitiontime, reshape_residuals=False, ) ix_sec = ufunc_per_section_helper( sections=sections, x_coords=st.coords["x"], calc_per="all" ) st = st.isel(x=ix_sec).values.ravel() diff_st = resid.ravel() ( slope, offset, st_sort_mean, st_sort_var, resid, var_fun, ) = variance_stokes_linear_helper(st, diff_st, nbin, through_zero) if plot_fit: import matplotlib.pyplot as plt plt.figure() plt.scatter(st_sort_mean, st_sort_var, marker=".", c="black") plt.plot( [0.0, st_sort_mean[-1]], [var_fun(0.0), var_fun(st_sort_mean[-1])], c="white", lw=1.3, ) plt.plot( [0.0, st_sort_mean[-1]], [var_fun(0.0), var_fun(st_sort_mean[-1])], c="black", lw=0.8, ) plt.xlabel("intensity") plt.ylabel("intensity variance") return slope, offset, st_sort_mean, st_sort_var, resid, var_fun