Dataset Slicing

area_weight_mean_lat(ds)

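For every variable in ds that has a lat coordinate, an area-weighted (cosine of latitude) mean is taken over all latitudes in the dataset. The averaged variables are assigned back into ds, so the input dataset is modified, and the names of the averaged variables are printed.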

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| ds | Dataset | Dataset for a particular experiment. | required |

Returns:

| Type | Description |
|------|-------------|
| Dataset | Dataset containing averaged variables with no latitude dependence. |

Source code in climdyn_tools/utils/ds_slicing.py
def area_weight_mean_lat(ds: Dataset) -> Dataset:
    """
    For all variables in `ds`, an area-weighted mean is taken over all latitudes in the dataset.

    Args:
        ds: Dataset for a particular experiment.

    Returns:
        Dataset containing averaged variables with no latitude dependence.
    """
    var_averaged = []
    for var in ds.keys():
        if 'lat' in list(ds[var].coords):
            ds[var] = area_weighting(ds[var]).mean(dim='lat')
            var_averaged += [var]
    print(f"Variables Averaged: {var_averaged}")
    return ds
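
The function above assigns the averaged variables back into `ds`, so the caller's dataset changes as well. Below is a minimal sketch of the same cosine-latitude average that leaves the input untouched. The helper name `area_weight_mean_lat_copy` and the toy dataset are assumptions made for illustration and are not part of `climdyn_tools`.

```python
import numpy as np
import xarray as xr


def area_weight_mean_lat_copy(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical non-mutating variant: cos(lat)-weighted mean over `lat`."""
    weights = np.cos(np.deg2rad(ds.lat))
    weights.name = "weights"
    out = ds.copy()  # shallow copy, so assigning variables below does not touch `ds`
    for var in ds.data_vars:
        if "lat" in ds[var].dims:
            out[var] = ds[var].weighted(weights).mean(dim="lat")
    return out


# Toy dataset (assumed): one variable depends on latitude, the other does not
ds = xr.Dataset(
    {
        "t_surf": (("time", "lat"), np.array([[1.0, 4.0, 1.0], [2.0, 2.0, 2.0]])),
        "co2": (("time",), np.array([300.0, 310.0])),
    },
    coords={"time": [0, 1], "lat": [-60.0, 0.0, 60.0]},
)

ds_mean = area_weight_mean_lat_copy(ds)

# cos(lat) weights are [0.5, 1, 0.5], so the first time step gives
# (0.5*1 + 1*4 + 0.5*1) / 2 = 2.5 rather than the unweighted mean of 2
print(ds_mean.t_surf.values)
print(ds.t_surf.dims)  # the original dataset still depends on lat
```

Checking `dims` rather than `coords` is a deliberate choice in this sketch: it skips variables that only carry `lat` as a scalar coordinate, which have no latitude axis to average over.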

area_weighting(var, weights=None)

Apply area weighting to the variable var using the cosine of latitude: \(\cos (\phi)\).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| var | DataArray | Variable to weight e.g. ds.t_surf to weight the surface temperature, where ds is the dataset for the experiment which contains all variables. | required |
| weights | Optional[DataArray] | Weights to use as a function of latitude. If not given, will just take cosine of latitude. | None |

Returns:

| Type | Description |
|------|-------------|
| DataArrayWeighted | Area weighted version of var. |

Source code in climdyn_tools/utils/ds_slicing.py
def area_weighting(var: xr.DataArray, weights: Optional[DataArray] = None) -> DataArrayWeighted:
    """
    Apply area weighting to the variable `var` using the `cosine` of latitude: $\\cos (\\phi)$.

    Args:
        var: Variable to weight e.g. `ds.t_surf` to weight the surface temperature, where
            `ds` is the dataset for the experiment which contains all variables.
        weights: Weights to use as a function of latitude. If not given, will just take cosine of latitude.

    Returns:
        Area weighted version of `var`.
    """
    if weights is None:
        weights = np.cos(np.deg2rad(var.lat))
        weights.name = "weights"
    return var.weighted(weights)
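
For reference, taking `.mean(dim='lat')` of the returned object with the default weights corresponds to the standard weighted mean below, with missing values left out of both sums. The symbols $x_i$ and $\phi_i$, for the value and latitude at grid point $i$, are notation introduced here rather than taken from the source.

```latex
\bar{x} = \frac{\sum_i \cos(\phi_i)\, x_i}{\sum_i \cos(\phi_i)}
```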

lat_lon_coord_slice(ds, lat, lon)

Returns the dataset ds, keeping only data at the grid points nearest to the coordinates (lat[i], lon[i]) for all i.

If ds contained t_surf, then the returned dataset would contain t_surf as a function of the dimensions time and location, with each value of location corresponding to a specific (lat, lon) combination. In the original ds, it would be a function of time, lat and lon.

This is inspired by a Stack Overflow post.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| ds | Dataset | Dataset for a particular experiment. | required |
| lat | ndarray | float [n_coords] Latitude coordinates to keep. | required |
| lon | ndarray | float [n_coords] Longitude coordinates to keep. | required |

Returns:

| Type | Description |
|------|-------------|
| Dataset | Dataset only including the desired coordinates. |

Source code in climdyn_tools/utils/ds_slicing.py
def lat_lon_coord_slice(ds: Dataset, lat: np.ndarray, lon: np.ndarray) -> Dataset:
    """
    Returns dataset, `ds`, keeping only data at the coordinate indicated by `(lat[i], lon[i])` for all `i`.

    If `ds` contained `t_surf` then the returned dataset would contain `t_surf` as a function of the variables
    `time` and `location` with each value of `location` corresponding to a specific `(lat, lon)` combination.
    For the original `ds`, it would be a function of `time`, `lat` and `lon`.

    This is inspired by a
    [stack overflow post](https://stackoverflow.com/questions/72179103/xarray-select-the-data-at-specific-x-and-y-coordinates).

    Args:
        ds: Dataset for a particular experiment.
        lat: `float [n_coords]`
            Latitude coordinates to keep.
        lon: `float [n_coords]`
            Longitude coordinates to keep.

    Returns:
        Dataset only including the desired coordinates.
    """
    # To get dataset at specific coordinates, not all permutations, turn to xarray first
    lat_xr = xr.DataArray(lat, dims=['location'])
    lon_xr = xr.DataArray(lon, dims=['location'])
    return ds.sel(lat=lat_xr, lon=lon_xr, method="nearest")
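
The difference between pairing coordinates and taking all permutations can be seen on a small grid. The sketch below uses a toy dataset whose values and coordinates are assumptions for illustration. It applies the same `DataArray` indexers as the function above, then passes the raw arrays for comparison.

```python
import numpy as np
import xarray as xr

# Toy grid (assumed for illustration): 2 times, 3 latitudes, 4 longitudes
data = np.arange(24, dtype=float).reshape(2, 3, 4)
ds = xr.Dataset(
    {"t_surf": (("time", "lat", "lon"), data)},
    coords={"time": [0, 1], "lat": [-10.0, 0.0, 10.0], "lon": [0.0, 90.0, 180.0, 270.0]},
)

lat = np.array([-9.0, 11.0])
lon = np.array([85.0, 200.0])

# Indexers that share the new `location` dimension are paired element-wise
lat_xr = xr.DataArray(lat, dims=["location"])
lon_xr = xr.DataArray(lon, dims=["location"])
points = ds.sel(lat=lat_xr, lon=lon_xr, method="nearest")

# Passing the raw arrays instead selects every (lat, lon) combination
box = ds.sel(lat=lat, lon=lon, method="nearest")

print(points.t_surf.sizes)  # one entry per requested (lat, lon) pair
print(box.t_surf.sizes)     # a lat-by-lon block of all combinations
```

Note that `method="nearest"` compares coordinate values directly and has no notion of periodic longitude, so a request close to 360 degrees is matched to the largest grid longitude rather than to 0.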

lat_lon_rolling(ds, window_lat, window_lon)

This creates a rolling-averaged version of the dataset or data-array in the spatial dimensions. Longitude is padded periodically so the window wraps around, but latitude is not padded, so the returned data will have the first np.ceil((window_lat-1)/2) and last np.floor((window_lat-1)/2) values as nan in the latitude dimension. The averaging does not take account of area weighting in the latitude dimension.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| ds | Union[Dataset, DataArray] | Dataset or DataArray to find rolling mean of. | required |
| window_lat | int | Size of window for rolling average in latitude dimension [number of grid points]. | required |
| window_lon | int | Size of window for rolling average in longitude dimension [number of grid points]. | required |

Returns:

| Type | Description |
|------|-------------|
| Union[Dataset, DataArray] | Rolling averaged dataset or DataArray. |

Source code in climdyn_tools/utils/ds_slicing.py
def lat_lon_rolling(ds: Union[Dataset, DataArray], window_lat: int, window_lon: int) -> Union[Dataset, DataArray]:
    """
    This creates a rolling averaged version of the dataset or data-array in the spatial dimension.
    Returned data will have first `np.ceil((window_lat-1)/2)` and last `np.floor((window_lat-1)/2)`
    values as `nan` in latitude dimension.
    The averaging also does not take account of area weighting in latitude dimension.

    Args:
        ds: Dataset or DataArray to find rolling mean of.
        window_lat: Size of window for rolling average in latitude dimension [number of grid points]
        window_lon: Size of window for rolling average in longitude dimension [number of grid points].

    Returns:
        Rolling averaged dataset or DataArray.

    """
    ds_roll = ds.pad(lon=window_lon, mode='wrap')       # first pad in lon so wraps around when doing rolling mean
    ds_roll = ds_roll.rolling({'lon': window_lon, 'lat': window_lat}, center=True).mean()
    return ds_roll.isel(lon=slice(window_lon, -window_lon))     # remove the padded longitude values
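
The edge behaviour described in the docstring can be checked on a small array. In the sketch below the function body is repeated from the source above so the example is self-contained. The toy field, grid spacing and window sizes are assumptions chosen for illustration.

```python
import numpy as np
import xarray as xr


def lat_lon_rolling(ds, window_lat, window_lon):
    # Same steps as the source above, repeated so the example runs on its own
    ds_roll = ds.pad(lon=window_lon, mode="wrap")
    ds_roll = ds_roll.rolling({"lon": window_lon, "lat": window_lat}, center=True).mean()
    return ds_roll.isel(lon=slice(window_lon, -window_lon))


# Toy field (assumed): the value equals the longitude index at every latitude
n_lat, n_lon = 6, 8
da = xr.DataArray(
    np.tile(np.arange(n_lon, dtype=float), (n_lat, 1)),
    dims=("lat", "lon"),
    coords={"lat": np.linspace(-75, 75, n_lat), "lon": np.arange(n_lon) * 45.0},
)

window_lat, window_lon = 4, 3
smoothed = lat_lon_rolling(da, window_lat, window_lon)

# Latitude rows that are entirely nan, next to the documented edge counts
nan_rows = smoothed.isnull().all("lon").values
n_first = int(np.ceil((window_lat - 1) / 2))
n_last = int(np.floor((window_lat - 1) / 2))
print(nan_rows, n_first, n_last)

# Longitude wraps: at lon index 0 the window covers lon indices 7, 0 and 1
print(float(smoothed.isel(lat=n_first, lon=0)))
```

An even `window_lat` is used here on purpose, since that is the case where the number of nan rows differs between the two ends of the latitude axis.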

time_rolling(ds, window_time, wrap=True)

This creates a rolling-averaged version of the dataset or data-array in the time dimension. Useful for when you have an annual average dataset.

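Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| ds | Union[Dataset, DataArray] | Dataset or DataArray to find rolling mean of. | required |
| window_time | int | Size of window for rolling average in time dimension [number of time units e.g. days]. | required |
| wrap | bool | Whether the first time comes immediately after the last time, e.g. for annual mean data. If True, the data is padded periodically in time so the rolling window wraps around the ends. | True |

Returns:

| Type | Description |
|------|-------------|
| Union[Dataset, DataArray] | Rolling averaged dataset or DataArray. |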

Source code in climdyn_tools/utils/ds_slicing.py
def time_rolling(ds: Union[Dataset, DataArray], window_time: int, wrap: bool = True) -> Union[Dataset, DataArray]:
    """
    This creates a rolling-averaged version of the dataset or data-array in the time dimension. Useful for when
    you have an annual average dataset.

    Args:
        ds: Dataset or DataArray to find rolling mean of.
        window_time: Size of window for rolling average in time dimension [number of time units e.g. days]
        wrap: If the first time comes immediately after the last time i.e. for annual mean data

    Returns:
        Rolling averaged dataset or DataArray.
    """
    if wrap:
        ds_roll = ds.pad(time=window_time, mode='wrap')  # first pad in time so wraps around when doing rolling mean
        ds_roll = ds_roll.rolling(time=window_time, center=True).mean()
        return ds_roll.isel(time=slice(window_time, -window_time))  # remove the padded time values
    else:
        return ds.rolling(time=window_time, center=True).mean()
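
The effect of `wrap` is easiest to see on a short series. The sketch below walks through both branches of the function above on a toy six-step record. The data and window size are assumptions for illustration.

```python
import numpy as np
import xarray as xr

# Toy series (assumed): six time steps of a periodic record
da = xr.DataArray(np.arange(6, dtype=float), dims=["time"], coords={"time": np.arange(6)})
window_time = 3

# wrap=True branch: pad periodically, average, then drop the padding
padded = da.pad(time=window_time, mode="wrap")
wrapped = padded.rolling(time=window_time, center=True).mean()
wrapped = wrapped.isel(time=slice(window_time, -window_time))

# wrap=False branch: windows that run off either end give nan
plain = da.rolling(time=window_time, center=True).mean()

print(wrapped.values)  # end points average across the wrap, e.g. (5 + 0 + 1) / 3 at the start
print(plain.values)    # first and last entries are nan
```

With `wrap=True` every time step gets a value, which is only meaningful when the record really is periodic. For a non-periodic series, the nan end points from `wrap=False` are the safer result.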