src package

Subpackages

Submodules

src.constants module

This file stores the project-wide constants.

Includes source folder, the project path, etc.

Example

Import statement at top of script:

from src.constants import PROJECT_PATH, FIGURE_PATH, GWS_DIR
src.constants.atmos_input_file_path(var='ts', model='E', ending='clim60')

Atmos input file.

Parameters
  • var (str, optional) – variable. Defaults to “ts”.

  • model (str, optional) – model character. Defaults to “E”.

  • ending (str, optional) – ending. Defaults to “clim60”.

Returns

input file path.

Return type

str

src.constants.cmip6_ensemble_var(var)

CMIP6 ensemble variable path.

Parameters

var (str) – Variable. e.g. “ts”.

Returns

path to variable folder.

Return type

str

src.constants.cmip6_file(var, model, ending)

Get CMIP6 file, either multimodel mean, or implemented individual model.

Parameters
  • var (str) – Variable string. e.g. “ts”.

  • model (str) – Model string. e.g. “S”.

  • ending (str) – Canonical File ending e.g. “clim60”.

Returns

netcdf file address.

Return type

str

src.constants.ocean_input_file_path(var='ts', model='E', ending='clim', end='.nc')

Ocean input file.

Parameters
  • var (str, optional) – variable. Defaults to “ts” for surface temperature.

  • model (str, optional) – model character. Defaults to “E” for ECMWF.

  • ending (str, optional) – ending. Defaults to “clim”.

  • end (str, optional) – file extension. Defaults to “.nc”.

Returns

input file path.

Return type

str

src.constants.run_path(cfg, unit_test=False)

Returns run path to store data in.

Parameters
  • cfg (DictConfig) – The config struct.

  • unit_test (bool, optional) – Whether this is a unit test. Defaults to False.

Returns

The path to the relevant directory that exists.

Return type

str

src.main module

A file to run model runs from with hydra/wandb.

Essentially a wrapper around the bash commands that run the ocean model (Fortran/C), which also calls the atmospheric and surface flux model (Python).

Example

Usage of script:

python3 src/main.py name=test26
src.main.main(cfg)

The main function to run the model and animations.

Takes the src/configs/config.yaml file as input alongside any command line arguments.

Parameters

cfg (DictConfig) – The hydra dict config from the wrapper.

Return type

None

src.main.sub_main(cfg, unit_test=False)

Subsection of main to run from a unit test or a sensitivity search.

Parameters
  • cfg (DictConfig) – The config from whichever method.

  • unit_test (bool) – Whether or not this is run from a unit test. Defaults to False.

Return type

None

src.metrics module

Different metrics to calculate.

Currently mainly just calculates the nino indices.

src.metrics.calculate_nino3_4_from_noaa()

Calculate the default nino3.4 region from noaa data.

Returns

metric timeseries, climatology

Return type

Tuple[xr.DataArray, xr.DataArray]

Get trends in nino regions for variables other than sst.

Parameters

setup (ModelSetup) – the filespace object to find things using.

Returns

nino dict.

Return type

dict

src.metrics.load_noaa_data()

Load the data from the noaa ERSSTv4.5 file.

Returns

NOAA dataarray.

Return type

xr.DataArray

src.metrics.nino_calculate(sst, reg='nino3.4', roll_period=3)

Calculate the nino metric for a given region.

https://rabernat.github.io/research_computing_2018/assignment-8-xarray-for-enso.html

https://ncar.github.io/PySpark4Climate/tutorials/Oceanic-Ni%C3%B1o-Index/

Can work on regions nino1+2, nino3, nino3.4, nino4 (or “pac”).

“pac” is a region defined by me mainly for plotting that includes most of the tropical pacific.

Parameters
  • sst (xr.DataArray) – Sea surface temperature DataArray in standard format.

  • reg (str, optional) – The region to select for src.xr_utils.sel. Defaults to “nino3.4”.

  • roll_period (int, optional) – The rolling period defined with respect to the time axis. Defaults to 3.

Returns

metric timeseries, climatology

Return type

Tuple[xr.DataArray, xr.DataArray]
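As a rough illustration of the kind of calculation described above, the sketch below removes a monthly climatology from a plain list of monthly values and then applies a rolling mean. It is a pure-Python stand-in rather than the project's xarray implementation: the name `rolling_anomaly`, the assumption that the series starts in January, and the choice of a trailing (rather than centred) window are all assumptions made for illustration.

```python
def rolling_anomaly(monthly_values, roll_period=3):
    """Return (rolled anomalies, climatology) for a list of monthly values.

    Hypothetical helper: assumes the series starts in January and that the
    rolling window is trailing; the real function may differ.
    """
    # Climatology: the mean of all values sharing the same calendar month.
    clim = []
    for month in range(12):
        same_month = monthly_values[month::12]
        clim.append(sum(same_month) / len(same_month))

    # Anomaly: each value minus the climatology of its calendar month.
    anomalies = [v - clim[i % 12] for i, v in enumerate(monthly_values)]

    # Trailing rolling mean; the first roll_period - 1 entries are undefined.
    rolled = []
    for i in range(len(anomalies)):
        if i + 1 < roll_period:
            rolled.append(None)
        else:
            window = anomalies[i + 1 - roll_period : i + 1]
            rolled.append(sum(window) / roll_period)
    return rolled, clim


# A purely seasonal series has zero anomaly everywhere.
series = [float(m) for m in range(12)] * 2
metric, climatology = rolling_anomaly(series)
print(metric[:4], climatology[:3])
```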

src.metrics.replace_nino3_4_from_noaa()

Calculate the default nino3.4 region from noaa data.

Return type

None

src.plot_utils module

Plotting Utilities Module.

Contains generic plotting functions that are used to achieve consistent and easy to produce plots across the project.

Example

Usage with simple plots:

from src.plot_utils import (
    ps_defaults,
    label_subplots,
    get_dim,
    set_dim,
    PALETTE,
    STD_CLR_LIST,
    CAM_BLUE,
    BRICK_RED,
    OX_BLUE,
)

ps_defaults(use_tex=True)

# ---- example set of graphs ---

import numpy as np
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)

x = np.linspace(0, np.pi, num=100)
axs[0, 0].plot(x, np.sin(x), color=STD_CLR_LIST[0])
axs[0, 1].plot(x, np.cos(x), color=STD_CLR_LIST[1])
axs[1, 0].plot(x, np.sinc(x), color=STD_CLR_LIST[2])
axs[1, 1].plot(x, np.abs(x), color=STD_CLR_LIST[3])

# set size
set_dim(fig, fraction_of_line_width=1, ratio=(5 ** 0.5 - 1) / 2)

# label subplots
label_subplots(axs, start_from=0, fontsize=10)
src.plot_utils.add_units(xr_obj, x_val='X', y_val='Y')

Adding good units to make axes plottable.

Currently only for lat, lon axes, but could be improved to add degrees celsius and so on.

Fails softly.

Parameters
  • xr_obj (Union[xr.DataArray, xr.Dataset]) – Initial DataArray/Dataset (potentially with units for axes).

  • x_val (str) – Defaults to “X”

  • y_val (str) – Defaults to “Y”

Returns

DataArray/Dataset with correct units/names for plotting, assuming that you’ve given the correct x_val and y_val for the object.

Return type

Union[xr.DataArray, xr.Dataset]

src.plot_utils.axis_formatter()

Returns axis formatter for scientific notation.

Returns an object that does the equivalent of:

>>> plt.gca().ticklabel_format(
...    axis=ax_format, style="sci", scilimits=(0, 0), useMathText=True
... )

Returns

An object (matplotlib.ticker.ScalarFormatter) to pass in to a matplotlib operation.

Examples

Using with xarray:

import xarray as xr
from src.plot_utils import axis_formatter
da = xr.tutorial.open_dataset("air_temperature").air
da.isel(time=0).plot(cbar_kwargs={"format": axis_formatter()})
Return type

ScalarFormatter

src.plot_utils.cmap(variable_name)

Get cmap from a variable name string.

Ideally colormaps for variables should be consistent throughout the project, and changed in this function. The colormaps are set to be green where there are NaN values, as this has a high contrast with the colormaps used, and should ordinarily represent land, unless something has gone wrong.

Parameters

variable_name (str) – name of variable to give colormap.

Returns

sensible colormap

Return type

matplotlib.colors.LinearSegmentedColormap

Example

Usage example for sea surface temperature:

from src.plot_utils import cmap
cmap_t = cmap("sst")
src.plot_utils.get_dim(width=398.3386, fraction_of_line_width=1, ratio=0.6180339887498949)

Return figure height, width in inches to avoid scaling in latex.

Default width is src.constants.REPORT_WIDTH. Default ratio is golden ratio, with figure occupying full page width.

Parameters
  • width (float, optional) – Textwidth of the report to make fontsizes match. Defaults to src.constants.REPORT_WIDTH.

  • fraction_of_line_width (float, optional) – Fraction of the document width which you wish the figure to occupy. Defaults to 1.

  • ratio (float, optional) – Fraction of figure width that the figure height should be. Defaults to (5 ** 0.5 - 1)/2.

Returns

Dimensions of figure in inches

Return type

fig_dim (tuple)

Example

Here is an example of using this function:

>>> from src.plot_utils import get_dim
>>> dim_tuple = get_dim(fraction_of_line_width=1, ratio=(5 ** 0.5 - 1) / 2)
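The arithmetic behind this kind of sizing can be sketched as follows. This is an illustration under assumptions, not the project's code: it assumes the width is given in TeX points (72.27 points per inch), and both the name `figure_dimensions` and the (width, height) return order are hypothetical.

```python
def figure_dimensions(width_pt=398.3386, fraction_of_line_width=1.0,
                      ratio=(5 ** 0.5 - 1) / 2):
    """Return (width, height) in inches for a text width in TeX points.

    Hypothetical helper; the return order is an assumption.
    """
    inches_per_pt = 1 / 72.27  # one inch is 72.27 TeX points
    fig_width_in = width_pt * fraction_of_line_width * inches_per_pt
    fig_height_in = fig_width_in * ratio  # height as a fraction of width
    return fig_width_in, fig_height_in


width_in, height_in = figure_dimensions()
print(f"{width_in:.3f} in x {height_in:.3f} in")
```

Sizing the figure in inches up front means LaTeX does not need to rescale it, so font sizes in the figure stay matched to the report text.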
src.plot_utils.label_subplots(axs, labels=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'], start_from=0, fontsize=10, x_pos=0.02, y_pos=0.95, override=None)

Adds e.g. (a), (b), (c) at the top left of each subplot panel.

Labelling order achieved through ravelling the input list or np.array.

Parameters
  • axs (Sequence[matplotlib.axes.Axes]) – list or np.array of matplotlib.axes.Axes.

  • labels (Sequence[str]) – A sequence of labels for the subplots.

  • start_from (int, optional) – skips first start_from labels. Defaults to 0.

  • fontsize (int, optional) – Font size for labels. Defaults to 10.

  • x_pos (float, optional) – Relative x position of labels. Defaults to 0.02.

  • y_pos (float, optional) – Relative y position of labels. Defaults to 0.95.

  • override (Optional[Literal["inside", "outside", "default"]], optional) – Choose a preset x_pos, y_pos option to override choices. “Outside” is good for busy colormaps. Defaults to None.

Return type

None

Returns

void; alters the matplotlib.axes.Axes objects

Example

Here is an example of using this function:

>>> from src.plot_utils import label_subplots
>>> label_subplots(axs, start_from=0, fontsize=10)
src.plot_utils.ps_defaults(use_tex=None, dpi=None)

Apply plotting style to produce nice looking figures.

Call this at the start of a script which uses matplotlib. Can enable matplotlib LaTeX backend if it is available.

Uses serif font to fit into latex report. Uses REPORT_WIDTH from src.constants.

Parameters
  • use_tex (bool, optional) – Whether or not to use latex matplotlib backend. Defaults to False.

  • dpi (int, optional) – Which dpi to set for the figures. Defaults to 600 dpi (high quality) in terminal or 150 dpi for notebooks. Larger dpi may be needed for presentations.

Examples

Basic setting for the plotting defaults:

>>> from src.plot_utils import ps_defaults
>>> ps_defaults()
Return type

None

src.plot_utils.set_dim(fig, width=398.3386, fraction_of_line_width=1, ratio=0.6180339887498949)

Set aesthetic figure dimensions to avoid scaling in latex.

Default width is src.constants.REPORT_WIDTH. Default ratio is golden ratio, with figure occupying full page width.

Parameters
  • fig (matplotlib.figure.Figure) – Figure object to resize.

  • width (float) – Textwidth of the report to make fontsizes match. Defaults to src.constants.REPORT_WIDTH.

  • fraction_of_line_width (float, optional) – Fraction of the document width which you wish the figure to occupy. Defaults to 1.

  • ratio (float, optional) – Fraction of figure width that the figure height should be. Defaults to (5 ** 0.5 - 1)/2.

Return type

None

Returns

void; alters current figure to have the desired dimensions

Example

Here is an example of using this function:

>>> from src.plot_utils import set_dim
>>> set_dim(fig, fraction_of_line_width=1, ratio=(5 ** 0.5 - 1) / 2)
src.plot_utils.tex_uf(uf, bracket=False, force_latex=False, exponential=True)

A function to take an uncertainties.ufloat, and return a tex containing string for plotting, which has the right number of decimal places.

Parameters
  • uf (ufloat) – The uncertainties ufloat object.

  • bracket (bool, optional) – Whether or not to add latex brackets around the parameter. Defaults to False.

  • force_latex (bool, optional) – Whether to force latex output. Defaults to False. If false will check matplotlib.rcParams first.

  • exponential (bool, optional) – Whether to put in scientific notation. Defaults to True.

Returns

String ready to be added to a graph label.

Return type

str

src.plot_utils.time_title(ax, time, date_time_formatter='%Y.%m.%d')

Add time title to axes.

Used by e.g. the animation scripts. Hopefully it will consistently deal with a variety of different date formats, including the native format for the ocean model (months since 1960).

Parameters
  • ax (matplotlib.axes.Axes) – axis to add title to.

  • time (Union[np.datetime64, float, cftime.Datetime360Day]) – the time value to display.

  • date_time_formatter (str, optional) – Default is src.constants.DATE_TITLE_FORMAT.

Example

Usage with an xarray.DataArray object:

>>> from src.plot_utils import time_title
>>> time_title(ax, xr_da.time.values[index])
Return type

None
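To illustrate what handling the ocean model's native time format might involve, the sketch below converts a float number of months since 1960 into a year and calendar month. It assumes that month 0 corresponds to January 1960 and that the fractional part is a position within the month; the helper name is hypothetical and this is not the project's implementation.

```python
def months_since_1960_to_year_month(months):
    """Convert months since 1960 (float) to a (year, month) pair.

    Hypothetical helper: assumes 0.0 falls in January 1960.
    """
    whole_months = int(months // 1)  # floor, also correct for negatives
    year = 1960 + whole_months // 12
    month = whole_months % 12 + 1    # calendar month in 1..12
    return year, month


year, month = months_since_1960_to_year_month(13.5)
print(f"{year:04d}.{month:02d}")
```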

src.search module

search.py

src.search.between_two(choices=['C', 'E'], length=4)

All possible string sequences between two characters for some length.

Parameters
  • choices (List[Char], optional) – Characters to choose between. Defaults to [“C”, “E”].

  • length (int, optional) – Length of each sequence. Defaults to 4.

Returns

list of possible sequences.

Return type

(List[str])
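One way to enumerate such sequences is with `itertools.product`. The sketch below is a minimal illustration with a hypothetical name (`all_sequences`); the ordering of the real function's output is not documented here, so the order shown is an assumption.

```python
import itertools


def all_sequences(choices=("C", "E"), length=4):
    """List every string of the given length built from the choices."""
    return ["".join(combo)
            for combo in itertools.product(choices, repeat=length)]


sequences = all_sequences()
print(len(sequences), sequences[:3])
```

With two characters and a length of 4 this gives 2 ** 4 = 16 sequences.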

src.search.list_to_hydra_input(comb_list)

List to hydra.

Parameters

comb_list (List[str]) – List to go through.

Returns

string to add to terminal input.

Return type

str
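Judging by the comma-separated default value of `mem` in `terminal_call`, the conversion is presumably a simple join, which is the form Hydra accepts for multirun sweeps. The sketch below shows that assumption; `join_for_sweep` is a hypothetical name.

```python
def join_for_sweep(comb_list):
    """Join option strings with commas, e.g. for a Hydra multirun sweep."""
    return ",".join(comb_list)


print(join_for_sweep(["EEEE", "EECE", "EEEC"]))
```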

src.search.main(settings)

The main function to run the model and animations.

Takes the src/configs/config.yaml file as input alongside any command line arguments.

Parameters

settings (DictConfig) – The hydra dict config from the wrapper.

Return type

None

src.search.remainder_combinations()

Work out which combinations are still to do.

Returns

list to-do.

Return type

List[str]

src.search.terminal_call(e_frac='0.5,2', clouds='true,false', mem='6EE6,ECEE,ECEC,E6E6,CEEC,E66E,CCCC,CECC,666E,CCEC,CEEE,66E6,6EEE,6E66,6666,CECE,CCCE,E6EE,E666,ECCE,6E6E,ECCC,66EE,CCEE')

Return terminal call.

Parameters
  • e_frac (str, optional) – Defaults to “0.5,2”.

  • clouds (str, optional) – Defaults to “true,false”.

  • mem (str, optional) – Defaults to list_to_hydra_input(remainder_combinations()).

Returns

Terminal call to run the model some number of times.

Return type

str

src.search.var_clt_combinations()

Work out which combinations are still to do.

Returns

list to-do.

Return type

List[str]

src.search.var_ts_combinations()

Work out which combinations are still to do.

Returns

list to-do.

Return type

List[str]

src.search.variable_combinations(control='E', exps=['C', '6'], vary=[True, True, True, True])

Get the full set of options to try if there is one control set and multiple experiment deviations.

Parameters
  • control (Char, optional) – Character for the control input. Defaults to “E”.

  • exps (List[Char], optional) – Characters for the experiment inputs. Defaults to [“C”, “6”].

Returns

List of combinations to try.

Return type

List[str]

src.search.which_comp(mem)

Which figure to compare with.

Parameters

mem (str) – variable string.

Returns

Figure string.

Return type

str

src.utils module

General project utility functions.

exception src.utils.TimeoutException

Bases: Exception

src.utils.calculate_byte_size_recursively(obj, seen=None)

Recursively calculate size of objects in memory in bytes.

From: https://github.com/bosswissam/pysize. Meant as a helper function for get_byte_size.

Parameters
  • obj (object) – The python object to get the size of

  • seen (set, optional) – This variable is needed for the recursive function evaluations, to ensure each object is only counted once. Leave it at “None” to get the full byte size of an object. Defaults to None.

Returns

The size of the object in bytes.

Return type

int
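The role of the `seen` set can be shown with a simplified sketch. This is a reduced illustration handling only a few built-in containers, not a copy of the pysize code; the name `total_size` is hypothetical.

```python
import sys


def total_size(obj, seen=None):
    """Recursively sum sys.getsizeof over an object and its contents."""
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0  # already counted, so do not count it twice
    seen.add(obj_id)

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size


shared = "x" * 1000
# The shared string is only counted once, even though it appears twice.
print(total_size([shared, shared]), total_size([shared, "y" * 1000]))
```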

src.utils.get_byte_size(obj)

Return human readable size of a python object in bytes.

Parameters

obj (object) – The python object to analyse

Returns

Human readable string with the size of the object

Return type

str

src.utils.get_default_setup()

Return the default run setup to get the data.

Return type

ModelSetup

src.utils.hr_time(time_in)

Return human readable time as string.

I got fed up with converting the number in my head. Probably runs very quickly.

Parameters

time_in (float) – time in seconds

Returns

string to print.

Return type

str
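A conversion of this kind can be sketched with `divmod`. The function name `format_seconds` is hypothetical, and the format used for durations of an hour or more is an assumption; only the minutes-and-seconds form appears in the documented example.

```python
def format_seconds(time_in):
    """Format a duration in seconds as a short human-readable string."""
    minutes, seconds = divmod(int(time_in), 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        # Assumed format for long durations; not taken from the source.
        return f"{hours} h {minutes} min {seconds} s"
    return f"{minutes} min {seconds} s"


print(format_seconds(120))
```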

Example

120 seconds to human readable string:

>>> from src.utils import hr_time
>>> hr_time(120)
    "2 min 0 s"
src.utils.human_readable_size(num, suffix='B')

Convert a number of bytes into human readable format.

This function is meant as a helper function for get_byte_size.

Parameters
  • num (int) – The number of bytes to convert

  • suffix (str, optional) – The suffix to use for bytes. Defaults to ‘B’.

Returns

A human readable version of the number of bytes.

Return type

str
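A common way to write such a conversion is to divide by 1024 until the number is small enough to print. The sketch below assumes binary prefixes (Ki, Mi, ...) and a particular output format; the real function's exact format is not documented here, and `readable_size` is a hypothetical name.

```python
def readable_size(num, suffix="B"):
    """Format a byte count using binary prefixes (assumed format)."""
    for unit in ("", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi"):
        if abs(num) < 1024.0:
            return f"{num:3.1f} {unit}{suffix}"
        num /= 1024.0  # move up to the next prefix
    return f"{num:.1f} Yi{suffix}"


print(readable_size(1536))
```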

src.utils.in_notebook()

Check if in notebook.

Taken from this answer: https://stackoverflow.com/a/22424821

Returns

whether in notebook.

Return type

bool

src.utils.time_limit(seconds)

Time limit manager.

Function taken from:

https://stackoverflow.com/questions/366682/how-to-limit-execution-time-of-a-function-call

Parameters

seconds (int) – how many seconds to wait until timeout.

Example

Call a function which will take longer than the time limit:

import time
from src.utils import time_limit, TimeoutException

def long_function_call():
    for t in range(5):
        print("t=", t, "seconds")
        time.sleep(1)
try:
    with time_limit(3):
        long_function_call()
        assert False
except TimeoutException as e:
    print("Timed out!")
except:
    print("A different exception")
Return type

None

src.utils.time_stamp()

Return the current local time.

Returns

Time string format “%Y-%m-%d %H:%M:%S”.

Return type

str

src.utils.timeit(method)

timeit is a decorator for performance analysis.

It should return the time taken for a function to run. Alters log_time dict if fed in. Add @timeit to the function you want to time. Function needs **kwargs if you want it to be able to feed in log_time dict.

Parameters

method (Callable) – the function that it takes as an input

Examples

Here is an example with the tracking functionality and without:

>>> from src.utils import timeit
>>> @timeit
... def loop(**kwargs):
...     total = 0
...     for i in range(int(10e2)):
...         for j in range(int(10e2)):
...             total += 1
>>> tmp_log_d = {}
>>> loop(log_time=tmp_log_d)
>>> print(tmp_log_d["loop"])
>>> loop()
Return type

Callable
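The behaviour described above, where the elapsed time is written into a `log_time` dict if one is passed and reported otherwise, could be implemented along these lines. This is a sketch under assumptions (the decorator name `timeit_sketch`, the use of `time.perf_counter`, and printing as the fallback are all illustrative choices), not the project's code.

```python
import functools
import time


def timeit_sketch(method):
    """Time a function; store the result in kwargs['log_time'] if given."""
    @functools.wraps(method)
    def timed(*args, **kwargs):
        start = time.perf_counter()
        result = method(*args, **kwargs)
        elapsed = time.perf_counter() - start
        if "log_time" in kwargs:
            # Record the elapsed seconds under the function's name.
            kwargs["log_time"][method.__name__] = elapsed
        else:
            print(f"{method.__name__!r} took {elapsed:.5f} s")
        return result
    return timed


@timeit_sketch
def loop(**kwargs):
    return sum(range(1000))


log = {}
loop(log_time=log)  # stores the time in log["loop"]
loop()              # prints the time instead
```

The decorated function needs `**kwargs` because `log_time` is passed straight through to it.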

src.wandb_utils module

Sets up the weights and biases script and provides functionality to get data from wandb.

src.wandb_utils.aggregate_matches(summary_df, filter_df, results=['trend_nino3.4 [degC]', 'mean_nino3.4 [degC]', 'mean_pac [degC]'], include_std_dev=True, print_missing=False)

Aggregate the matches between two dataframes to find the mean and standard deviation of a set of results.

Parameters
  • summary_df (pd.DataFrame) – The summary df created by summary_table.

  • filter_df (pd.DataFrame) – The dataframe to filter by.

  • results (List[str], optional) – Names of the result columns to aggregate. Defaults to RESULTS.

  • include_std_dev (bool, optional) – Whether to calculate standard deviation. Defaults to True.

  • print_missing (bool, optional) – Whether to highlight missing runs from ensemble. Defaults to False.

Returns

Includes uncertainty.ufloat values if include_std_dev=True.

Return type

pd.DataFrame

src.wandb_utils.aggregate_table(project='sdat2/seager19', mem_list=['EEEE', 'EECE', 'EEEC', 'EECC'])

Make aggregate table.

Parameters
  • project (str, optional) – Which project to read. Defaults to DEFAULT_PROJECT.

  • mem_list (List[str], optional) – What list of inputs to compare. Defaults to DEFAULT_MEM_LIST.

Returns

The aggregate table.

Return type

pd.DataFrame

src.wandb_utils.archive_dir_from_config(cfg)

Get the archived folder from the names stored online.

Parameters

cfg (Union[DictConfig, dict]) – The config from the run.

Returns

The archive directory path string.

Return type

str

src.wandb_utils.cd_variation_comp(e_frac=0.5)

Vary drag coefficient and get the final metric.

Parameters

e_frac (float) – Defaults to 0.5.

Returns

mem_dict.

Return type

dict

src.wandb_utils.change_table(project='sdat2/seager19', mem_list=['EEEE', 'EECE', 'EEEC', 'EECC'])

Return a table with the differences between the ECMWF run and the different inputs.

Parameters
  • project (str, optional) – Which project to read. Defaults to DEFAULT_PROJECT.

  • mem_list (List[str], optional) – What list of inputs to compare. Defaults to DEFAULT_MEM_LIST.

Returns

The change table, and the name of the new variable column.

Return type

Tuple[DataFrame, str]

src.wandb_utils.didnt_blow_up(rn)

Test if the run blew up. True if it didn’t blow up.

Parameters

rn (wandb.apis.public.Run) – run.

Returns

True if the run did not blow up, False otherwise.

Return type

bool

src.wandb_utils.find_missing(df_list, param=['c_d', 'eps_days', 'eps_frac', 'vary_cloud_const'])

Find which runs are missing from the project and print the commands to add them in.

Parameters
  • df_list (List[pd.DataFrame]) – list of dataframes by initial aggregation.

  • param (List[str], optional) – Parameters to compare. Defaults to PARAM.

Return type

None

src.wandb_utils.finished_names(project='sdat2/seager19')

Return all the finished run names.

Returns

list of run names.

Return type

List[str]

src.wandb_utils.fix_config(config)

Turn the config dict back into a DictConfig object.

Parameters

config (Union[dict, DictConfig]) – config dictionary.

Returns

original configuration.

Return type

DictConfig

src.wandb_utils.get_full_csv(project='sdat2/seager19')

Get the full csv.

Return type

DataFrame

src.wandb_utils.get_v(inp)

Get version of compilers.

Parameters

inp (str) – input string. E.g. “gfortran -v”.

Returns

selects the fortran version (or gcc version) part of the output.

Return type

str

src.wandb_utils.get_wandb_data(save_path=None)

Get wandb data and optionally save it. No longer redownloads the data.

Parameters

save_path (Optional[str], optional) – Path to new csv file. Defaults to None. If it is None then doesn’t try to save.

Returns

The pandas dataframe of final results.

Return type

pd.DataFrame

src.wandb_utils.metric_conv_data(metric_name='mean_pac', prefix='cd_', ex_list=['cd_norm', 'nummode'], control_variable_list=[(('atm', 'k_days'), 10), (('atm', 'e_frac'), 2)], index_by=('coup', 'c_d'), project='sdat2/seager19')

Generate the data for the convergence of a particular item.

Used in src.visualisation.convergence.metric_conv_plot

Parameters

metric_name (str, optional) – Which keyword to use. Defaults to “mean_pac”.

Returns

metric_dict, setup_dict.

Return type

Tuple[dict, dict]

src.wandb_utils.output_fig_2_data(project='sdat2/seager19')

Output the figure 2 data for plotting.

Parameters

project (str, optional) – Wandb project to read. Defaults to DEFAULT_PROJECT.

Returns

The change table, and the name of the new variable column.

Return type

Tuple[List[pd.DataFrame], str]

src.wandb_utils.setup_from_config(cfg)

Gets the setup object for the archived run from the config.

Parameters

cfg (DictConfig) – Either the dictconfig or the dict.

Returns

The model setup object.

Return type

ModelSetup

src.wandb_utils.setup_from_name(name, project='sdat2/seager19')

Get the model setup from a name.

Parameters

name (str) – model name.

Returns

The model setup object.

Return type

ModelSetup

src.wandb_utils.start_wandb(cfg, unit_test=False)

Initialises wandb for the run.

Weights and biases provides the run tracking for the model runs at different parameter settings.

TODO: Need to improve the ability to initialise wandb from a unit test.

Parameters
  • cfg (DictConfig) – The config settings to pass in to the wandb syncing.

  • unit_test (bool, optional) – Whether or not this is a unit-test. Defaults to False. If this is a unit test will currently not initialise wandb, but will call related functions.

Return type

None

src.wandb_utils.summary_table(project='sdat2/seager19')

Key input parameters, key output parameters, in a simple dataframe.

index=number

parameters=mem, ${ts}${clt}${sfcwind}${rh}${pr}${ps}${tau}, c_d, eps_frac, eps,

Key indexes: trend_nino3.4, mean_nino3.4, mean_pac

Parameters

project (str, optional) – Which weights and biases project to scan. Defaults to DEFAULT_PROJECT.

Returns

A dataframe.

Return type

pd.DataFrame

src.xr_utils module

Utilities around opening and processing netcdfs from this project.

src.xr_utils.can_coords(xr_obj)

Transform an object into having the canonical coordinates if possible.

Fail hard if impossible.

Parameters

xr_obj (Union[xr.Dataset, xr.DataArray]) – The Dataset or DataArray to canonicalise.

Returns

The dataset that has been canonicalised.

Function will raise an assertion error otherwise.

Return type

Union[xr.Dataset, xr.DataArray]

src.xr_utils.clip(da, pac=True, mask_land=True)

Clip a DataArray to the Pacific using sel, and mask the land.

Parameters
  • da (xr.DataArray) – The DataArray to pass in.

  • pac (bool, optional) – Whether to focus on pacific. Defaults to True.

  • mask_land (bool, optional) – Whether to nan out land. Defaults to True.

Returns

da with those operations applied to it.

Return type

xr.DataArray

src.xr_utils.cut_and_taper(da, y_var='Y', x_var='X')

Cut and taper a field by latitude.

Since the atmosphere model dynamics are only applicable in the tropics, the computed wind stress anomaly is only applied to the ocean model between 20° S and 20° N, and is linearly tapered to zero at 25° S and 25° N.

Currently only copes if the array is two dimensional.

Parameters
  • da (xr.DataArray) – The DataArray.

  • y_var (str, optional) – The name of the Y coordinate. Defaults to “Y”.

  • x_var (str, optional) – The name of the X coordinate. Defaults to “X”.

Returns

The DataArray with the function applied.

Return type

xr.DataArray

Example

Should achieve:

if da.Y > 25 or da.Y < -25:
    da = 0.0
elif 20 <= da.Y <= 25:
    da = da - (0.2 * (da.Y - 20)) * da
elif -25 <= da.Y <= -20:
    da = da - (0.2 * (-da.Y - 20)) * da

Usage:

import xarray as xr
from src.xr_utils import open_dataset, cut_and_taper
from src.constants import OCEAN_DATA_PATH
da_new: xr.DataArray = open_dataset(OCEAN_DATA_PATH / "qflx.nc").qflx
cut_and_taper(da_new.isel(Z=0, T=0, variable=0))
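The taper described above is equivalent to multiplying the field by a latitude-dependent weight. The pure-Python sketch below computes that weight for a single latitude; the name `taper_weight` and its keyword arguments are hypothetical, and the real function operates on whole xarray objects rather than scalars.

```python
def taper_weight(lat, full_lat=20.0, zero_lat=25.0):
    """Weight in [0, 1]: 1 within 20 degrees of the equator, 0 beyond 25."""
    abs_lat = abs(lat)
    if abs_lat <= full_lat:
        return 1.0
    if abs_lat >= zero_lat:
        return 0.0
    # Linear ramp from 1 at full_lat down to 0 at zero_lat.
    return (zero_lat - abs_lat) / (zero_lat - full_lat)


for lat in (-30, -22.5, 0, 21, 25):
    print(lat, taper_weight(lat))
```

Between 20° and 25° this gives 1 - 0.2 * (|lat| - 20), matching the pseudocode in the example.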
src.xr_utils.fix_calendar(xr_in, timevar='T')

Fix and decode the calendar.

Parameters
  • xr_in (Union[xr.Dataset, xr.DataArray]) – the xarray object input

  • timevar (str, optional) – The time variable name. Defaults to “T”.

Returns

same type as xr_in with fixed calendar.

Return type

Union[xr.Dataset, xr.DataArray]

src.xr_utils.get_clim(xr_da)

Get the climatology of an xr.DataArray.

Parameters

xr_da (xr.DataArray) – The input DataArray. Assumes that the time coordinate is canonical “T”.

Returns

The climatology for the time period.

Return type

xr.DataArray

src.xr_utils.get_trend(da, min_clim_f=False, output='rise', t_var='T', make_hatch_mask=False, keep_ds=False, uncertainty=False)

Returns either the linear trend rise, or the linear trend slope, possibly with the array to hatch out where the trend is not significant.

Uses xr.polyfit order 1 to do everything.

Parameters
  • da (xr.DataArray) – the timeseries.

  • min_clim_f (bool, optional) – whether to calculate and remove the climatology. Defaults to False.

  • output (Literal, optional) – What to return: the rise or the slope of the trend. Defaults to “rise”.

  • t_var (str, optional) – The time variable name. Defaults to “T”. Could be changed to another variable that you want to fit along.

  • make_hatch_mask (bool, optional) – Whether or not to also return a DataArray of boolean values to indicate where is not significant. Defaults to False. Will only work if you’re passing in an xarray object.

  • uncertainty (bool, optional) – Whether to return a ufloat object if doing linear regression on a single timeseries. Defaults to false.

Returns

The rise/slope over the time period, possibly with the hatch array if that option is selected for a grid.

Return type

Union[float, ufloat, xr.DataArray, Tuple[xr.DataArray, xr.DataArray]]
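The distinction between slope and rise can be illustrated with an ordinary least-squares fit on plain lists. This sketch takes the rise to be the slope multiplied by the length of the time period, which is an assumption about the definition used; `linear_trend` is a hypothetical name and the real function works through xarray's polyfit.

```python
def linear_trend(times, values):
    """Return (slope, rise) of a first-order least-squares fit."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    var = sum((t - mean_t) ** 2 for t in times)
    slope = cov / var
    # Assumed definition: total change of the fitted line over the period.
    rise = slope * (times[-1] - times[0])
    return slope, rise


times = list(range(11))
values = [2.0 * t + 1.0 for t in times]
print(linear_trend(times, values))
```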

src.xr_utils.min_clim(xr_da, clim=None)

Take away the climatology from an xr.DataArray.

Parameters
  • xr_da (xr.DataArray) – The xarray input. Canonical coords.

  • clim (Optional[xr.DataArray], optional) – The climatology. Defaults to None, which will remake climatology.

Returns

The anomaly.

Return type

xr.DataArray

src.xr_utils.open_dataarray(path)

Open an xarray dataarray and format it.

Will automatically try to convert the dataset coordinates into the canonical coordinate names (using can_coords).

Will also decode the time axis.

TODO: add an option for opening DataArrays that just ensures they open, rather than changing their attributes.

Parameters

path (Union[str, pathlib.Path]) – the path to the netcdf DataArray file.

Returns

The formatted DataArray.

Return type

xr.DataArray

src.xr_utils.open_dataset(path, use_can_coords=False)

Open a dataset and formats it.

Will only work if there is only one set of each coordinate at the moment.

Parameters
  • path (Union[str, pathlib.Path]) – the path to the netcdf dataset file.

  • use_can_coords (bool) – whether or not to try and make the coordinates into the canonical names. Defaults to False.

Returns

The formatted dataset. Will have time variables decoded.

Return type

xr.Dataset

src.xr_utils.rem_var(da1)

Remove the ‘variable’ from a DataArray.

Return type

DataArray

src.xr_utils.sel(xr_obj, reg='pac')

Select a region of the Dataset or DataArray.

Assumes

reg options: “pac”, “nino1+2”, “nino3”, “nino3.4”, “nino4” https://www.ncdc.noaa.gov/teleconnections/enso/indicators/sst/

From Figure 1: Distribution of 60-year trends in the NINO3.4 SST index (SST averaged over 5° S−5° N and 170° W−120° W) for end dates from 2008–2017.

Parameters
  • xr_obj (Union[xr.Dataset, xr.DataArray]) – The xarray object. Needs to have canonical coordinates.

  • reg (str, optional) – The keyword region to select. Defaults to ‘pac’.

Returns

The downsized xarray object.

Return type

Union[xr.Dataset, xr.DataArray]

Example

Effect example:

if reg == "pac":
    return xr_obj.sel(X=slice(100, 290), Y=slice(-30, 30))
elif reg == "nino3.4":
    return xr_obj.sel(X=slice(190, 240), Y=slice(-5, 5))
elif reg == "nino4":
    return xr_obj.sel(X=slice(160, 210), Y=slice(-5, 5))
elif reg == "nino3":
    return xr_obj.sel(X=slice(210, 270), Y=slice(-5, 5))
elif reg == "nino1+2":
    return xr_obj.sel(X=slice(270, 280), Y=slice(-10, 0))
src.xr_utils.spatial_mean(da)

Average a DataArray over “X” and “Y” coordinates.

Spatially weighted.

Originally from: https://ncar.github.io/PySpark4Climate/tutorials/Oceanic-Ni%C3%B1o-Index/ (although their version is wrong as it assumes numpy input is degrees)

https://numpy.org/doc/stable/reference/generated/numpy.cos.html https://numpy.org/doc/stable/reference/generated/numpy.radians.html

The average should behave like:

\begin{equation} \bar{T}_{\text{lat}} = \frac{1}{n_{\text{Lon}}} \sum_{i=1}^{n_{\text{Lon}}} T_{\text{lon}, i} \end{equation}
\begin{equation} \bar{T}_{\text{month}} = \frac{\sum_{j=1}^{n_{\text{Lat}}} \cos\left(\text{lat}_{j}\right) \bar{T}_{\text{lat}, j}}{\sum_{j=1}^{n_{\text{Lat}}} \cos\left(\text{lat}_{j}\right)} \end{equation}
Parameters

da (xr.DataArray) – da to average.

Returns

average of da.

Return type

xr.DataArray
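The cosine-latitude weighting can be sketched in pure Python on a nested list. This illustrates the formula only and is not the xarray implementation: `cos_weighted_mean` is a hypothetical name, and the field is assumed to be indexed as `field[lat_index][lon_index]` with latitudes in degrees, converted to radians before taking the cosine.

```python
import math


def cos_weighted_mean(field, lats_deg):
    """Area-weighted mean of field[j][i] over latitude j and longitude i."""
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(field, lats_deg):
        zonal_mean = sum(row) / len(row)        # mean over longitude
        weight = math.cos(math.radians(lat))    # degrees to radians first
        weighted_sum += weight * zonal_mean
        weight_total += weight
    return weighted_sum / weight_total


# The row at 60 degrees latitude gets half the weight of the equatorial row.
print(cos_weighted_mean([[1.0, 1.0], [4.0, 4.0]], [0.0, 60.0]))
```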

Module contents