src package

Subpackages

Submodules

src.constants module

This file stores the project-wide constants.

Includes source folder, the project path, etc.

Example

Import statement at top of script:

from src.constants import PROJECT_PATH, FIGURE_PATH, GWS_DIR

src.constants.atmos_input_file_path(var='ts', model='E', ending='clim60')

Atmos input file.

Parameters

var (str, optional) – variable. Defaults to “ts”.
model (str, optional) – model character. Defaults to “E”.
ending (str, optional) – ending. Defaults to “clim60”.

Returns

input file path.

Return type

str

src.constants.cmip6_ensemble_var(var)

CMIP6 ensemble variable path.

Parameters: var (str) – Variable. e.g. “ts”.
Returns: path to variable folder.
Return type: str

src.constants.cmip6_file(var, model, ending)

Get CMIP6 file, either multimodel mean, or implemented individual model.

Parameters

var (str) – Variable string. e.g. “ts”.
model (str) – Model string. e.g. “S”.
ending (str) – Canonical File ending e.g. “clim60”.

Returns

netcdf file address.

Return type

str

src.constants.ocean_input_file_path(var='ts', model='E', ending='clim', end='.nc')

Ocean input file.

Parameters

var (str, optional) – variable. Defaults to “ts” for surface temperature.
model (str, optional) – model character. Defaults to “E” for ECMWF.
ending (str, optional) – ending. Defaults to “clim”.
end (str, optional) – file extension. Defaults to “.nc”.

Returns

input file path.

Return type

str

src.constants.run_path(cfg, unit_test=False)

Returns run path to store data in.

Parameters

cfg (DictConfig) – The config struct.
unit_test (bool, optional) – Whether this is a unit test. Defaults to False.

Returns

The path to the relevant directory that exists.

Return type

str

src.main module

A file to run model runs from with hydra/wandb.

Basically a wrapper that runs the ocean model (Fortran/C) through bash commands and calls the atmospheric and surface flux model (Python).

Example

Usage of script:

python3 src/main.py name=test26

src.main.main(cfg)

The main function to run the model and animations.

Takes the src/configs/config.yaml file as input alongside any command line arguments.

Parameters: cfg (DictConfig) – The hydra dict config from the wrapper.
Return type: None

src.main.sub_main(cfg, unit_test=False)

Subsection of main to run from a unit test or a sensitivity search.

Parameters

cfg (DictConfig) – The config from whichever method.
unit_test (bool) – Whether or not this is run from a unit test. Defaults to False.

Return type

None

src.metrics module

Different metrics to calculate.

Currently mainly just calculates the nino indices.

src.metrics.calculate_nino3_4_from_noaa()

Calculate the default nino3.4 region from noaa data.

Returns: metric timeseries, climatology
Return type: Tuple[xr.DataArray, xr.DataArray]

src.metrics.get_other_trends(setup)

Get trends in nino regions for variables other than sst.

Parameters: setup (ModelSetup) – the filespace object to find things using.
Returns: nino dict.
Return type: dict

src.metrics.load_noaa_data()

Load the data from the noaa ERSSTv4.5 file.

Returns: NOAA dataarray.
Return type: xr.DataArray

src.metrics.nino_calculate(sst, reg='nino3.4', roll_period=3)

Calculate the nino metric for a given region.

https://rabernat.github.io/research_computing_2018/assignment-8-xarray-for-enso.html

https://ncar.github.io/PySpark4Climate/tutorials/Oceanic-Ni%C3%B1o-Index/

Can work on regions nino1+2, nino3, nino3.4, nino4 (or “pac”).

“pac” is a region defined by me mainly for plotting that includes most of the tropical pacific.

Parameters

sst (xr.DataArray) – Sea surface temperature DataArray in standard format.
reg (str, optional) – The region to select for src.xr_utils.sel. Defaults to “nino3.4”.
roll_period (int, optional) – The rolling period defined with respect to the time axes. Defaults to 3.

Returns

metric timeseries, climatology

Return type

Tuple[xr.DataArray, xr.DataArray]
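
The roll_period argument can be pictured as a moving average along the time axis. The sketch below is a pure-Python illustration rather than the project's code: rolling_mean is a hypothetical helper, and it assumes a trailing window (the current value plus the roll_period - 1 values before it), with None wherever the window is incomplete. The anomaly values are made up.

```python
from typing import List, Optional


def rolling_mean(values: List[float], roll_period: int = 3) -> List[Optional[float]]:
    """Trailing moving average; None until a full window is available."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < roll_period:
            out.append(None)  # not enough points yet
        else:
            window = values[i + 1 - roll_period : i + 1]
            out.append(sum(window) / roll_period)
    return out


# Monthly SST anomalies (made-up numbers, degrees Celsius).
anomalies = [0.3, 0.6, 0.9, 1.2, 0.0]
smoothed = rolling_mean(anomalies, roll_period=3)
```

Whether nino_calculate uses a trailing or a centred window is not stated here, so treat the window alignment as an assumption.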

src.metrics.replace_nino3_4_from_noaa()

Calculate the default nino3.4 region from noaa data.

Return type: None

src.plot_utils module

Plotting Utilities Module.

Contains generic plotting functions that are used to achieve consistent and easy to produce plots across the project.

Example

Usage with simple plots:

from src.plot_utils import (
    ps_defaults,
    label_subplots,
    get_dim,
    set_dim,
    PALETTE,
    STD_CLR_LIST,
    CAM_BLUE,
    BRICK_RED,
    OX_BLUE,
)

ps_defaults(use_tex=True)

# ---- example set of graphs ---

import numpy as np
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)

x = np.linspace(0, np.pi, num=100)
axs[0, 0].plot(x, np.sin(x), color=STD_CLR_LIST[0])
axs[0, 1].plot(x, np.cos(x), color=STD_CLR_LIST[1])
axs[1, 0].plot(x, np.sinc(x), color=STD_CLR_LIST[2])
axs[1, 1].plot(x, np.abs(x), color=STD_CLR_LIST[3])

# set size
set_dim(fig, fraction_of_line_width=1, ratio=(5 ** 0.5 - 1) / 2)

# label subplots
label_subplots(axs, start_from=0, fontsize=10)

src.plot_utils.add_units(xr_obj, x_val='X', y_val='Y')

Add suitable units to make axes plottable.

Currently only for lat, lon axes, but could be improved to add degrees celsius and so on.

Fails softly.

Parameters

xr_obj (Union[xr.DataArray, xr.Dataset]) – Initial DataArray/Dataset (potentially with units for axes).
x_val (str) – Defaults to “X”
y_val (str) – Defaults to “Y”

Returns

DataArray/Dataset with correct units/names for plotting, assuming that you’ve given the correct x_val and y_val for the object.

Return type

Union[xr.DataArray, xr.Dataset]

src.plot_utils.axis_formatter()

Returns axis formatter for scientific notation.

Returns an object that does the equivalent of:

>>> plt.gca().ticklabel_format(
...     axis=ax_format, style="sci", scilimits=(0, 0), useMathText=True
... )

Returns: An object to pass in to a matplotlib operation.

Examples

Using with xarray:

import xarray as xr
from src.plot_utils import axis_formatter
da = xr.tutorial.open_dataset("air_temperature").air
da.isel(time=0).plot(cbar_kwargs={"format": axis_formatter()})

Return type: ScalarFormatter

src.plot_utils.cmap(variable_name)

Get cmap from a variable name string.

Ideally colormaps for variables should be consistent throughout the project, and changed in this function. The colormaps are set to be green where there are NaN values, as this has a high contrast with the colormaps used, and should ordinarily represent land, unless something has gone wrong.

Parameters: variable_name (str) – name of variable to give colormap.
Returns: sensible colormap
Return type: matplotlib.colors.LinearSegmentedColormap

Example

Usage example for sea surface temperature:

from src.plot_utils import cmap
cmap_t = cmap("sst")

src.plot_utils.get_dim(width=398.3386, fraction_of_line_width=1, ratio=0.6180339887498949)

Return figure height, width in inches to avoid scaling in latex.

Default width is src.constants.REPORT_WIDTH. Default ratio is golden ratio, with figure occupying full page width.

Parameters

width (float, optional) – Textwidth of the report to make fontsizes match. Defaults to src.constants.REPORT_WIDTH.
fraction_of_line_width (float, optional) – Fraction of the document width which you wish the figure to occupy. Defaults to 1.
ratio (float, optional) – Fraction of figure width that the figure height should be. Defaults to (5 ** 0.5 - 1)/2.

Returns

Dimensions of figure in inches

Return type

fig_dim (tuple)

Example

Here is an example of using this function:

>>> from src.plot_utils import get_dim
>>> dim_tuple = get_dim(fraction_of_line_width=1, ratio=(5 ** 0.5 - 1) / 2)
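
The arithmetic behind these dimensions is short enough to sketch. The version below is an illustration under stated assumptions, not the project's get_dim: figure_dimensions is a hypothetical stand-in, width is assumed to be in TeX points (72.27 pt per inch), and the return order is assumed to be (width, height).

```python
from typing import Tuple

POINTS_PER_INCH = 72.27  # TeX points per inch
GOLDEN_RATIO = (5 ** 0.5 - 1) / 2


def figure_dimensions(
    width: float = 398.3386,
    fraction_of_line_width: float = 1,
    ratio: float = GOLDEN_RATIO,
) -> Tuple[float, float]:
    """Return (width, height) in inches for a text width given in points."""
    fig_width_in = width * fraction_of_line_width / POINTS_PER_INCH
    fig_height_in = fig_width_in * ratio
    return fig_width_in, fig_height_in


full_width = figure_dimensions()
half_width = figure_dimensions(fraction_of_line_width=0.5)
```

Sizing the figure in inches up front means LaTeX does not need to rescale it, so font sizes in the figure stay consistent with the report text.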

src.plot_utils.label_subplots(axs, labels=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'], start_from=0, fontsize=10, x_pos=0.02, y_pos=0.95, override=None)

Adds e.g. (a), (b), (c) at the top left of each subplot panel.

Labelling order achieved through ravelling the input list or np.array.

Parameters

axs (Sequence[matplotlib.axes.Axes]) – list or np.array of matplotlib.axes.Axes.
labels (Sequence[str]) – A sequence of labels for the subplots.
start_from (int, optional) – skips first start_from labels. Defaults to 0.
fontsize (int, optional) – Font size for labels. Defaults to 10.
x_pos (float, optional) – Relative x position of labels. Defaults to 0.02.
y_pos (float, optional) – Relative y position of labels. Defaults to 0.95.
override (Optional[Literal["inside", "outside", "default"]], optional) – Choose a preset x_pos, y_pos option to override choices. “Outside” is good for busy colormaps. Defaults to None.

Return type

None

Returns

void; alters the matplotlib.axes.Axes objects

Example

Here is an example of using this function:

>>> from src.plot_utils import label_subplots
>>> label_subplots(axs, start_from=0, fontsize=10)
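
The labelling order described above (ravel the panels, then skip the first start_from labels) can be sketched without matplotlib. This is an assumption-laden illustration: flatten and panel_labels are hypothetical helpers, and plain strings stand in for matplotlib.axes.Axes objects.

```python
from string import ascii_lowercase
from typing import Any, List, Sequence, Tuple


def flatten(axs: Any) -> List[Any]:
    """Flatten a (possibly nested) list of panels in row-major order."""
    if isinstance(axs, (list, tuple)):
        return [leaf for item in axs for leaf in flatten(item)]
    return [axs]


def panel_labels(
    axs: Any, labels: Sequence[str] = ascii_lowercase, start_from: int = 0
) -> List[Tuple[Any, str]]:
    """Pair each panel with its "(a)", "(b)", ... label."""
    panels = flatten(axs)
    chosen = labels[start_from : start_from + len(panels)]
    return [(panel, f"({label})") for panel, label in zip(panels, chosen)]


# Strings stand in for matplotlib Axes objects in a 2 x 2 grid.
grid = [["top-left", "top-right"], ["bottom-left", "bottom-right"]]
pairs = panel_labels(grid, start_from=2)
```

With start_from=2 the four panels receive (c) to (f), which is useful when a figure continues the lettering of an earlier one.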

src.plot_utils.ps_defaults(use_tex=None, dpi=None)

Apply plotting style to produce nice looking figures.

Call this at the start of a script which uses matplotlib. Can enable matplotlib LaTeX backend if it is available.

Uses serif font to fit into latex report. Uses REPORT_WIDTH from src.constants.

Parameters

use_tex (bool, optional) – Whether or not to use latex matplotlib backend. Defaults to False.
dpi (int, optional) – Which dpi to set for the figures. Defaults to 600 dpi (high quality) in terminal or 150 dpi for notebooks. Larger dpi may be needed for presentations.

Examples

Basic setting for the plotting defaults:

>>> from src.plot_utils import ps_defaults
>>> ps_defaults()

Return type: None

src.plot_utils.set_dim(fig, width=398.3386, fraction_of_line_width=1, ratio=0.6180339887498949)

Set aesthetic figure dimensions to avoid scaling in latex.

Default width is src.constants.REPORT_WIDTH. Default ratio is golden ratio, with figure occupying full page width.

Parameters

fig (matplotlib.figure.Figure) – Figure object to resize.
width (float) – Textwidth of the report to make fontsizes match. Defaults to src.constants.REPORT_WIDTH.
fraction_of_line_width (float, optional) – Fraction of the document width which you wish the figure to occupy. Defaults to 1.
ratio (float, optional) – Fraction of figure width that the figure height should be. Defaults to (5 ** 0.5 - 1)/2.

Return type

None

Returns

void; alters current figure to have the desired dimensions

Example

Here is an example of using this function:

>>> from src.plot_utils import set_dim
>>> set_dim(fig, fraction_of_line_width=1, ratio=(5 ** 0.5 - 1) / 2)

src.plot_utils.tex_uf(uf, bracket=False, force_latex=False, exponential=True)

Take an uncertainties.ufloat and return a TeX-formatted string for plotting with the right number of decimal places.

Parameters

uf (ufloat) – The uncertainties ufloat object.
bracket (bool, optional) – Whether or not to add latex brackets around the parameter. Defaults to False.
force_latex (bool, optional) – Whether to force latex output. Defaults to False. If false will check matplotlib.rcParams first.
exponential (bool, optional) – Whether to put in scientific notation. Defaults to True.

Returns

String ready to be added to a graph label.

Return type

str

src.plot_utils.time_title(ax, time, date_time_formatter='%Y.%m.%d')

Add time title to axes.

Used by e.g. the animation scripts. Hopefully it will consistently deal with a variety of different date formats, including the native format for the ocean model (months since 1960).

Parameters

ax (matplotlib.axes.Axes) – axis to add title to.
time (Union[np.datetime64, float, cftime.Datetime360Day]) – time string.
date_time_formatter (str, optional) – Default is src.constants.DATE_TITLE_FORMAT.

Example

Usage with an xarray.DataArray object:

>>> from src.plot_utils import time_title
>>> time_title(ax, xr_da.time.values[index])

Return type: None
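
To make the "months since 1960" case concrete, here is a minimal sketch of turning such a float into a title string. It is not the project's conversion: months_since_1960_to_title is a hypothetical helper, and it assumes that 0 corresponds to 1 January 1960 and that every month has 30 days (a 360-day calendar).

```python
def months_since_1960_to_title(months: float) -> str:
    """Format 'months since 1960' as YYYY.MM.DD on a 360-day calendar."""
    whole_months = int(months // 1)  # floor, so negative values also work
    year = 1960 + whole_months // 12
    month = whole_months % 12 + 1
    day = 1 + int((months - whole_months) * 30)  # 30-day months assumed
    return f"{year:04d}.{month:02d}.{day:02d}"


start = months_since_1960_to_title(0.0)
mid_january_1961 = months_since_1960_to_title(12.5)
```

The string is built by hand rather than with datetime because a 360-day calendar contains dates, such as 30 February, that datetime cannot represent.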

src.search module

search.py

src.search.between_two(choices=['C', 'E'], length=4)

All possible string sequences between two characters for some length.

Parameters

choices (List[Char], optional) – Characters to choose between. Defaults to [“C”, “E”].
length (int, optional) – Length of each sequence. Defaults to 4.

Returns

list of possible sequences.

Return type

(List[str])
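
Enumerating every sequence of a fixed length is a Cartesian product of the choices with themselves. A minimal sketch using the standard library follows; all_sequences is a hypothetical name, and the output order of between_two itself is not documented here.

```python
from itertools import product
from typing import List, Sequence


def all_sequences(choices: Sequence[str] = ("C", "E"), length: int = 4) -> List[str]:
    """Every string of the given length drawn from the given characters."""
    return ["".join(combo) for combo in product(choices, repeat=length)]


sequences = all_sequences()
```

Two choices and a length of four give 2 ** 4 = 16 sequences.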

src.search.list_to_hydra_input(comb_list)

List to hydra.

Parameters: comb_list (List[str]) – List to go through.
Returns: string to add to terminal input.
Return type: str
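
Judging by the comma-separated default of the mem argument of terminal_call, the conversion is probably a simple join, since hydra sweeps take comma-separated values. The sketch below rests on that assumption; comma_join is a hypothetical name.

```python
from typing import List


def comma_join(comb_list: List[str]) -> str:
    """Join the combinations into one comma-separated sweep value."""
    return ",".join(comb_list)


mem_argument = comma_join(["EEEE", "EECE", "EEEC"])
```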

src.search.main(settings)

The main function to run the model and animations.

Takes the src/configs/config.yaml file as input alongside any command line arguments.

Parameters: settings (DictConfig) – The hydra dict config from the wrapper.
Return type: None

src.search.remainder_combinations()

Work out which combinations are still to do.

Returns: list to-do.
Return type: List[str]

src.search.terminal_call(e_frac='0.5,2', clouds='true,false', mem='6EE6,ECEE,ECEC,E6E6,CEEC,E66E,CCCC,CECC,666E,CCEC,CEEE,66E6,6EEE,6E66,6666,CECE,CCCE,E6EE,E666,ECCE,6E6E,ECCC,66EE,CCEE')

Return terminal call.

Parameters

e_frac (str, optional) – Defaults to “0.5,2”.
clouds (str, optional) – Defaults to “true,false”.
mem (str, optional) – Defaults to list_to_hydra_input(remainder_combinations()).

Returns

Terminal call to run the model some number of times.

Return type

str

src.search.var_clt_combinations()

Work out which combinations are still to do.

Returns: list to-do.
Return type: List[str]

src.search.var_ts_combinations()

Work out which combinations are still to do.

Returns: list to-do.
Return type: List[str]

src.search.variable_combinations(control='E', exps=['C', '6'], vary=[True, True, True, True])

Get the full set of options to try if there is one control set and multiple experimental deviations.

Parameters

control (Char, optional) – Character for the control input. Defaults to “E”.
exps (List[Char], optional) – Characters for the experimental inputs. Defaults to [“C”, “6”].

Returns

List of combinations to try.

Return type

List[str]

src.search.which_comp(mem)

Which figure to compare with.

Parameters: mem (str) – variable string.
Returns: Figure string.
Return type: str

src.utils module

General project utility functions.

exception src.utils.TimeoutException: Bases: Exception

src.utils.calculate_byte_size_recursively(obj, seen=None)

Recursively calculate size of objects in memory in bytes.

From: https://github.com/bosswissam/pysize. Meant as a helper function for get_byte_size.

Parameters

obj (object) – The python object to get the size of
seen (set, optional) – This variable is needed for the recursive function evaluations, to ensure each object only gets counted once. Leave it at “None” to get the full byte size of an object. Defaults to None.

Returns

The size of the object in bytes.

Return type

int
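
The role of seen is easiest to see in a small sketch. The version below follows the general pysize approach but is simplified, so it should be read as an illustration rather than the project's function; deep_size is a hypothetical name.

```python
import sys
from typing import Optional, Set


def deep_size(obj: object, seen: Optional[Set[int]] = None) -> int:
    """Approximate the memory footprint of obj and everything it refers to."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # already counted via another reference
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_size(item, seen) for item in obj)
    elif hasattr(obj, "__dict__"):
        size += deep_size(vars(obj), seen)
    return size


shared = "x" * 1000
pair = [shared, shared]  # the same string object referenced twice
total = deep_size(pair)
```

Because both list entries point at one string, the string contributes to the total only once.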

src.utils.get_byte_size(obj)

Return human readable size of a python object in bytes.

Parameters: obj (object) – The python object to analyse
Returns: Human readable string with the size of the object
Return type: str

src.utils.get_default_setup()

Return the default run setup to get the data.

Return type: ModelSetup

src.utils.hr_time(time_in)

Return human readable time as string.

I got fed up with converting the number in my head. Probably runs very quickly.

Parameters: time_in (float) – time in seconds
Returns: string to print.
Return type: str
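
The conversion can be sketched with divmod. This is a hedged illustration that only reproduces the "2 min 0 s" form used in this function's example; readable_time is a hypothetical name, and how hr_time treats hours or fractional seconds is not documented here (the sketch truncates to whole seconds).

```python
def readable_time(time_in: float) -> str:
    """Split a duration in seconds into whole minutes and seconds."""
    minutes, seconds = divmod(int(time_in), 60)
    return f"{minutes} min {seconds} s"


two_minutes = readable_time(120)
```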

Example

120 seconds to human readable string:

>>> from src.utils import hr_time
>>> hr_time(120)
    "2 min 0 s"

src.utils.human_readable_size(num, suffix='B')

Convert a number of bytes into human readable format.

This function is meant as a helper function for get_byte_size.

Parameters

num (int) – The number of bytes to convert
suffix (str, optional) – The suffix to use for bytes. Defaults to ‘B’.

Returns

A human readable version of the number of bytes.

Return type

str
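
A common way to write such a conversion is to divide by 1024 until the number fits the current unit. The sketch below assumes binary prefixes (Ki, Mi, ...) and one decimal place; readable_size is a hypothetical name and the exact output format of human_readable_size may differ.

```python
def readable_size(num: float, suffix: str = "B") -> str:
    """Express a byte count with binary prefixes (Ki, Mi, Gi, ...)."""
    for unit in ("", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi"):
        if abs(num) < 1024.0:
            return f"{num:.1f} {unit}{suffix}"
        num /= 1024.0
    return f"{num:.1f} Yi{suffix}"


one_and_a_half_kib = readable_size(1536)
```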

src.utils.in_notebook()

Check if in notebook.

Taken from this answer: https://stackoverflow.com/a/22424821

Returns: whether in notebook.
Return type: bool

src.utils.time_limit(seconds)

Time limit manager.

Function taken from:

https://stackoverflow.com/questions/366682/how-to-limit-execution-time-of-a-function-call

Parameters: seconds (int) – how many seconds to wait until timeout.

Example

Call a function which will take longer than the time limit:

import time
from src.utils import time_limit, TimeoutException

def long_function_call():
    for t in range(5):
        print("t=", t, "seconds")
        time.sleep(1)
try:
    with time_limit(3):
        long_function_call()
        assert False
except TimeoutException as e:
    print("Timed out!")
except:
    print("A different exception")

Return type: None
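
For readers wondering how such a manager can interrupt a running block, here is a sketch of the signal-based pattern from the linked answer. It assumes a Unix system and the main thread (signal.SIGALRM is unavailable on Windows), and it may differ in detail from the project's implementation.

```python
import signal
import time
from contextlib import contextmanager


class TimeoutException(Exception):
    """Raised when the guarded block runs past its time limit."""


@contextmanager
def time_limit(seconds: int):
    """Raise TimeoutException if the with-block takes longer than `seconds`."""

    def handler(signum, frame):
        raise TimeoutException("Timed out!")

    signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)  # ask the OS to deliver SIGALRM later
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm


timed_out = False
try:
    with time_limit(1):
        time.sleep(5)  # interrupted after about one second
except TimeoutException:
    timed_out = True
```

Cancelling the alarm in the finally clause matters: without it, a block that finishes early would still be interrupted later by the pending signal.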

src.utils.time_stamp()

Return the current local time.

Returns: Time string format “%Y-%m-%d %H:%M:%S”.
Return type: str
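
The documented format string maps directly onto the standard library. A minimal sketch, with current_time_stamp as a hypothetical name:

```python
import time


def current_time_stamp() -> str:
    """Current local time formatted as YYYY-MM-DD HH:MM:SS."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())


stamp = current_time_stamp()
```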

src.utils.timeit(method)

timeit is a decorator for performance analysis.

It should report the time taken for a function to run, and it alters the log_time dict if one is passed in. Add @timeit to the function you want to time. The function needs **kwargs if you want to be able to pass in a log_time dict.

Parameters: method (Callable) – the function that it takes as an input

Examples

Here is an example with the tracking functionality and without:

>>> from src.utils import timeit
>>> @timeit
... def loop(**kwargs):
...     total = 0
...     for i in range(int(10e2)):
...         for j in range(int(10e2)):
...             total += 1
>>> tmp_log_d = {}
>>> loop(log_time=tmp_log_d)
>>> print(tmp_log_d["loop"])
>>> loop()

Return type: Callable
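
The log_time mechanism can be sketched as follows. This is an assumed reconstruction of the pattern, not the project's decorator: timeit_sketch is a hypothetical name, and printing when no log_time dict is given is a guess at the fallback behaviour.

```python
import functools
import time
from typing import Callable


def timeit_sketch(method: Callable) -> Callable:
    """Time each call; store the result in log_time if one is passed."""

    @functools.wraps(method)
    def timed(*args, **kwargs):
        start = time.perf_counter()
        result = method(*args, **kwargs)
        elapsed = time.perf_counter() - start
        if "log_time" in kwargs:
            # Record the elapsed seconds under the function's name.
            kwargs["log_time"][method.__name__] = elapsed
        else:
            print(f"{method.__name__} took {elapsed:.6f} s")
        return result

    return timed


@timeit_sketch
def loop(**kwargs) -> int:
    return sum(range(1000))


log: dict = {}
answer = loop(log_time=log)
```

The decorated function must accept **kwargs because log_time is passed through to it unchanged.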

src.wandb_utils module

Sets up the weights and biases script and provides functionality to get data from wandb.

src.wandb_utils.aggregate_matches(summary_df, filter_df, results=['trend_nino3.4 [degC]', 'mean_nino3.4 [degC]', 'mean_pac [degC]'], include_std_dev=True, print_missing=False)

Aggregate the matches between two dataframes to find the mean and standard deviation of a set of results.

Parameters

summary_df (pd.DataFrame) – The summary df created by summary_table.
filter_df (pd.DataFrame) – The dataframe to filter by.
results (List[str], optional) – Which result columns to aggregate. Defaults to RESULTS.
include_std_dev (bool, optional) – Whether to calculate standard deviation. Defaults to True.
print_missing (bool, optional) – Whether to highlight missing runs from ensemble. Defaults to False.

Returns

Includes uncertainty.ufloat values if include_std_dev=True.

Return type

pd.DataFrame

src.wandb_utils.aggregate_table(project='sdat2/seager19', mem_list=['EEEE', 'EECE', 'EEEC', 'EECC'])

Make aggregate table.

Parameters

project (str, optional) – Which project to read. Defaults to DEFAULT_PROJECT.
mem_list (List[str], optional) – What list of inputs to compare. Defaults to DEFAULT_MEM_LIST.

Returns

The aggregate table.

Return type

pd.DataFrame

src.wandb_utils.archive_dir_from_config(cfg)

Get the archived folder from the names stored online.

Parameters: cfg (Union[DictConfig, dict]) – The config from the run.
Returns: The archive directory path string.
Return type: str

src.wandb_utils.cd_variation_comp(e_frac=0.5)

Vary drag coefficient and get the final metric.

Parameters: e_frac (float) – Defaults to 0.5.
Returns: mem_dict.
Return type: dict

src.wandb_utils.change_table(project='sdat2/seager19', mem_list=['EEEE', 'EECE', 'EEEC', 'EECC'])

Return a table with the differences between the ECMWF run and the different inputs.

Parameters

project (str, optional) – Which project to read. Defaults to DEFAULT_PROJECT.
mem_list (List[str], optional) – What list of inputs to compare. Defaults to DEFAULT_MEM_LIST.

Returns

The change table, and the name of the new variable column.

Return type

Tuple[pd.DataFrame, str]

src.wandb_utils.didnt_blow_up(rn)

Test if the run blew up. True if it didn’t blow up.

Parameters: rn (wandb.apis.public.Run) – run.
Returns: whether the run completed without blowing up.
Return type: bool

src.wandb_utils.find_missing(df_list, param=['c_d', 'eps_days', 'eps_frac', 'vary_cloud_const'])

Find which runs are missing from the project and print the commands to add them in.

Parameters

df_list (List[pd.DataFrame]) – list of dataframes by initial aggregation.
param (List[str], optional) – Parameters to compare. Defaults to PARAM.

Return type

None

src.wandb_utils.finished_names(project='sdat2/seager19')

Return all the finished run names.

Returns: list of run names.
Return type: List[str]

src.wandb_utils.fix_config(config)

Turn the config dict back into a DictConfig object.

Parameters: config (Union[dict, DictConfig]) – config dictionary.
Returns: original configuration.
Return type: DictConfig

src.wandb_utils.get_full_csv(project='sdat2/seager19')

Get the full csv.

Return type: DataFrame

src.wandb_utils.get_v(inp)

Get version of compilers.

Parameters

inp (str) – input string. E.g. “gfortran -v”.

Returns

selects the fortran version (or gcc version): part of the output.

Return type

str

src.wandb_utils.get_wandb_data(save_path=None)

Get the wandb data, optionally saving it. No longer redownloads the data.

Parameters: save_path (Optional[str], optional) – Path to new csv file. Defaults to None. If it is None then doesn’t try to save.
Returns: The pandas dataframe of final results.
Return type: pd.DataFrame

src.wandb_utils.metric_conv_data(metric_name='mean_pac', prefix='cd_', ex_list=['cd_norm', 'nummode'], control_variable_list=[(('atm', 'k_days'), 10), (('atm', 'e_frac'), 2)], index_by=('coup', 'c_d'), project='sdat2/seager19')

Generate the data for the convergence of a particular item.

Used in src.visualisation.convergence.metric_conv_plot

Parameters: metric_name (str, optional) – Which keyword to use. Defaults to “mean_pac”.
Returns: metric_dict, setup_dict.
Return type: Tuple[dict, dict]

src.wandb_utils.output_fig_2_data(project='sdat2/seager19')

Output the figure 2 data for plotting.

Parameters: project (str, optional) – Wandb project to read. Defaults to DEFAULT_PROJECT.
Returns: The change table, and the name of the new variable column.
Return type: Tuple[List[pd.DataFrame], str]

src.wandb_utils.setup_from_config(cfg)

Gets the setup object for the archived run from the config.

Parameters: cfg (DictConfig) – Either the dictconfig or the dict.
Returns: The model setup object.
Return type: ModelSetup

src.wandb_utils.setup_from_name(name, project='sdat2/seager19')

Get the model setup from a name.

Parameters: name (str) – model name.
Returns: The model setup object.
Return type: ModelSetup

src.wandb_utils.start_wandb(cfg, unit_test=False)

Initialises wandb for the run.

Weights and biases provides the run tracking for the model runs at different parameter settings.

TODO: Need to improve the ability to initialise wandb from a unit test.

Parameters

cfg (DictConfig) – The config settings to pass in to the wandb syncing.
unit_test (bool, optional) – Whether or not this is a unit-test. Defaults to False. If this is a unit test will currently not initialise wandb, but will call related functions.

Return type

None

src.wandb_utils.summary_table(project='sdat2/seager19')

Key input parameters, key output parameters, in a simple dataframe.

index=number

parameters=mem, ${ts}${clt}${sfcwind}${rh}${pr}${ps}${tau}, c_d, eps_frac, eps

Key indexes: trend_nino3.4, mean_nino3.4, mean_pac

Parameters: project (str, optional) – Which weights and biases project to scan. Defaults to DEFAULT_PROJECT.
Returns: A dataframe.
Return type: pd.DataFrame

src.xr_utils module

Utilities around opening and processing netcdfs from this project.

src.xr_utils.can_coords(xr_obj)

Transform an object into having the canonical coordinates if possible.

Fail hard if impossible.

Parameters

xr_obj (Union[xr.Dataset, xr.DataArray]) – The Dataset or DataArray to canonicalise.

Returns

The object with canonical coordinates. The function will raise an assertion error otherwise.

Return type

Union[xr.Dataset, xr.DataArray]

src.xr_utils.clip(da, pac=True, mask_land=True)

Clip a DataArray to the Pacific using sel, and mask the land.

Parameters

da (xr.DataArray) – The DataArray to pass in.
pac (bool, optional) – Whether to focus on pacific. Defaults to True.
mask_land (bool, optional) – Whether to nan out land. Defaults to True.

Returns

da with those operations applied to it.

Return type

xr.DataArray

src.xr_utils.cut_and_taper(da, y_var='Y', x_var='X')

Cut and taper a field by latitude.

Since the atmosphere model dynamics are only applicable in the tropics, the computed wind stress anomaly is only applied to the ocean model between 20° S and 20° N, and is linearly tapered to zero at 25° S and 25° N.

Currently only copes if the array is two dimensional.

Parameters

da (xr.DataArray) – The DataArray.
y_var (str, optional) – The name of the Y coordinate. Defaults to “Y”.
x_var (str, optional) – The name of the X coordinate. Defaults to “X”.

Returns

The DataArray with the function applied.

Return type

xr.DataArray

Example

Should achieve:

if da.Y > 25 or da.Y < -25:
    da = 0.0
elif 20 <= da.Y <= 25:
    da = da - (0.2 * (da.Y - 20)) * da
elif -25 <= da.Y <= -20:
    da = da - (0.2 * (-da.Y - 20)) * da

Usage:

import xarray as xr
from src.xr_utils import open_dataset, cut_and_taper
from src.constants import OCEAN_DATA_PATH
da_new: xr.DataArray = open_dataset(OCEAN_DATA_PATH / "qflx.nc").qflx
cut_and_taper(da_new.isel(Z=0, T=0, variable=0))
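
The taper described for this function amounts to multiplying the field by a latitude-dependent weight. The sketch below computes that weight for a single latitude; taper_weight is a hypothetical helper, written from the 20° and 25° limits quoted in the description rather than from the project's code.

```python
def taper_weight(lat: float) -> float:
    """Weight applied at a latitude given in degrees north."""
    distance = abs(lat)
    if distance <= 20.0:
        return 1.0  # field kept in full within 20 degrees of the equator
    if distance >= 25.0:
        return 0.0  # field removed poleward of 25 degrees
    return (25.0 - distance) / 5.0  # linear ramp from 1 down to 0


weights = [taper_weight(lat) for lat in (-30.0, -22.5, 0.0, 20.0, 24.0, 25.0)]
```

The ramp (25 - |lat|) / 5 is the same expression as 1 - 0.2 * (|lat| - 20), so it is continuous with the untouched region at 20° and with zero at 25°.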

src.xr_utils.fix_calendar(xr_in, timevar='T')

Fix and decode the calendar.

Parameters

xr_in (Union[xr.Dataset, xr.DataArray]) – the xarray object input
timevar (str, optional) – The time variable name. Defaults to “T”.

Returns

Same type as xr_in, with fixed calendar.

Return type

Union[xr.Dataset, xr.DataArray]

src.xr_utils.get_clim(xr_da)

Get the climatology of an xr.DataArray.

Parameters: xr_da (xr.DataArray) – The input DataArray. Assumes that the time coordinate is canonical “T”.
Returns: The climatology for the time period.
Return type: xr.DataArray

src.xr_utils.get_trend(da, min_clim_f=False, output='rise', t_var='T', make_hatch_mask=False, keep_ds=False, uncertainty=False)

Returns either the linear trend rise, or the linear trend slope, possibly with the array to hatch out where the trend is not significant.

Uses xr.polyfit order 1 to do everything.

Parameters

da (xr.DataArray) – the timeseries.
min_clim_f (bool, optional) – whether to calculate and remove the climatology. Defaults to False.
output (Literal["slope", "rise"], optional) – What to return. Defaults to “rise”.
t_var (str, optional) – The time variable name. Defaults to “T”. Could be changed to another variable that you want to fit along.
make_hatch_mask (bool, optional) – Whether or not to also return a DataArray of boolean values to indicate where is not significant. Defaults to False. Will only work if you’re passing in an xarray object.
uncertainty (bool, optional) – Whether to return a ufloat object if doing linear regression on a single timeseries. Defaults to false.

Returns

The rise/slope over the time period, possibly with the hatch array if that option is selected for a grid.

Return type

Union[float, ufloat, xr.DataArray, Tuple[xr.DataArray, xr.DataArray]]
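
The difference between the slope and the rise can be shown with an ordinary least-squares fit. The sketch below is a pure-Python illustration, not the xr.polyfit-based code: linear_trend is a hypothetical helper, and it assumes that the rise is the slope multiplied by the length of the record. The data are made up.

```python
from typing import Sequence, Tuple


def linear_trend(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope, and the rise of the fitted line over the record."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    covariance = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    variance = sum((t - t_mean) ** 2 for t in times)
    slope = covariance / variance
    rise = slope * (times[-1] - times[0])  # change along the fitted line
    return slope, rise


years = [0.0, 1.0, 2.0, 3.0, 4.0]
sst = [20.0, 20.1, 20.2, 20.3, 20.4]  # made-up values warming by 0.1 per year
slope, rise = linear_trend(years, sst)
```

Here the slope is about 0.1 per year and the rise about 0.4 over the four-year span.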

src.xr_utils.min_clim(xr_da, clim=None)

Take away the climatology from an xr.DataArray.

Parameters

xr_da (xr.DataArray) – The xarray input. Canonical coords.
clim (Optional[xr.DataArray], optional) – The climatology. Defaults to None, which will remake climatology.

Returns

The anomaly.

Return type

xr.DataArray
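
Removing a climatology means subtracting, from each value, the long-term mean for its calendar month. A minimal pure-Python sketch of that idea follows; it assumes a monthly series that starts in January, and monthly_climatology and remove_climatology are hypothetical names rather than the project's functions.

```python
from typing import List


def monthly_climatology(values: List[float]) -> List[float]:
    """Mean of each calendar month; values[0] is taken to be a January."""
    return [sum(values[m::12]) / len(values[m::12]) for m in range(12)]


def remove_climatology(values: List[float]) -> List[float]:
    """Subtract the matching calendar-month mean from every value."""
    clim = monthly_climatology(values)
    return [value - clim[i % 12] for i, value in enumerate(values)]


# Two years of made-up monthly data: a seasonal cycle, plus 1.0 in year two.
year_one = [float(m) for m in range(12)]
year_two = [float(m) + 1.0 for m in range(12)]
anomaly = remove_climatology(year_one + year_two)
```

The seasonal cycle cancels, leaving anomalies of -0.5 in the first year and +0.5 in the second.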

src.xr_utils.open_dataarray(path)

Open an xarray dataarray and format it.

Will automatically try to convert the dataset coordinates into the canonical coordinate names (using can_coords).

Will also decode the time axis.

TODO: add option for opening of DataArrays that just ensures they open, rather than changing their attributes.

Parameters: path (Union[str, pathlib.Path]) – the path to the netcdf DataArray file.
Returns: The formatted DataArray.
Return type: xr.DataArray

src.xr_utils.open_dataset(path, use_can_coords=False)

Open a dataset and format it.

Will only work if there is only one set of each coordinate at the moment.

Parameters

path (Union[str, pathlib.Path]) – the path to the netcdf dataset file.
use_can_coords (bool, optional) – whether or not to try to make the coordinates into the canonical names. Defaults to False.

Returns

The formatted dataset. Will have time variables decoded.

Return type

xr.Dataset

src.xr_utils.rem_var(da1)

Remove the ‘variable’ from a DataArray.

Return type: DataArray

src.xr_utils.sel(xr_obj, reg='pac')

Select a region of the Dataset or DataArray.

Assumes that the object has canonical coordinates.

reg options: “pac”, “nino1+2”, “nino3”, “nino3.4”, “nino4”. See https://www.ncdc.noaa.gov/teleconnections/enso/indicators/sst/

From Figure 1: Distribution of 60-year trends in the NINO3.4 SST index (SST averaged over 5° S−5° N and 170° W−120° W) for end dates from 2008–2017.

Parameters

xr_obj (Union[xr.Dataset, xr.DataArray]) – The xarray object. Needs to have canonical coordinates.
reg (str, optional) – The keyword region to select. Defaults to ‘pac’.

Returns

The downsized xarray object.

Return type

Union[xr.Dataset, xr.DataArray]

Example

Effect example:

if reg == "pac":
    return xr_obj.sel(X=slice(100, 290), Y=slice(-30, 30))
elif reg == "nino3.4":
    return xr_obj.sel(X=slice(190, 240), Y=slice(-5, 5))
elif reg == "nino4":
    return xr_obj.sel(X=slice(160, 210), Y=slice(-5, 5))
elif reg == "nino3":
    return xr_obj.sel(X=slice(210, 270), Y=slice(-5, 5))
elif reg == "nino1+2":
    return xr_obj.sel(X=slice(270, 280), Y=slice(-10, 0))
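
The X slices in the example appear to use degrees East on a 0 to 360 grid, while the NINO3.4 definition quoted earlier is given in degrees West. The conversion between the two is sketched below; west_to_east is a hypothetical helper introduced only for this illustration.

```python
def west_to_east(degrees_west: float) -> float:
    """Convert a longitude in degrees West to degrees East in [0, 360)."""
    return (360.0 - degrees_west) % 360.0


# NINO3.4 spans 170 W to 120 W, which matches X=slice(190, 240).
nino34_x_bounds = (west_to_east(170.0), west_to_east(120.0))
```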

src.xr_utils.spatial_mean(da)

Average a DataArray over “X” and “Y” coordinates.

Spatially weighted.

Originally from: https://ncar.github.io/PySpark4Climate/tutorials/Oceanic-Ni%C3%B1o-Index/ (although their version is wrong as it assumes numpy input is degrees)

https://numpy.org/doc/stable/reference/generated/numpy.cos.html https://numpy.org/doc/stable/reference/generated/numpy.radians.html

The average should behave like:

\begin{equation} \bar{T}_{\text{lat}} = \frac{1}{n_{\text{Lon}}} \sum_{i=1}^{n_{\text{Lon}}} T_{\text{lon}, i} \end{equation}

\begin{equation} \bar{T}_{\text{month}} = \frac{\sum_{j=1}^{n_{\text{Lat}}} \cos\left(\text{lat}_{j}\right) \bar{T}_{\text{lat}, j}}{\sum_{j=1}^{n_{\text{Lat}}} \cos\left(\text{lat}_{j}\right)} \end{equation}

Parameters: da (xr.DataArray) – da to average.
Returns: average of da.
Return type: xr.DataArray
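
The weighting in the equations for this function can be checked on a tiny grid. The sketch below is a pure-Python illustration, not the xarray implementation; weighted_spatial_mean is a hypothetical helper and the field values are made up. Latitudes are converted to radians before taking the cosine, which is the point the note about degrees makes.

```python
import math
from typing import List


def weighted_spatial_mean(field: List[List[float]], lats: List[float]) -> float:
    """Mean over longitude, then cos(latitude)-weighted mean over latitude.

    field[j][i] is the value at latitude lats[j] (degrees) and longitude i.
    """
    weights = [math.cos(math.radians(lat)) for lat in lats]
    row_means = [sum(row) / len(row) for row in field]
    return sum(w * m for w, m in zip(weights, row_means)) / sum(weights)


lats = [0.0, 60.0]  # cosine weights of 1 and 0.5
field = [[1.0, 3.0], [5.0, 5.0]]  # row means of 2 and 5
mean = weighted_spatial_mean(field, lats)
```

The result is (1 × 2 + 0.5 × 5) / 1.5 = 3, lower than the unweighted mean of 3.5 because the 60° row covers less area.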
