"""
SESAME is an open-source Python tool designed to make spatial data analysis, visualization, and exploration accessible to all.  
Whether you’re a researcher, student, or enthusiast, SESAME helps you unlock insights from geospatial data with just a few lines of code.

---

**What can you do with the SESAME toolbox?**

- Conveniently process and analyze both spatial datasets (e.g. GeoTIFFs) and tabular jurisdictional data (e.g. CSV files by country) through a unified set of tools.
- Generate standardized netCDF files from a wide range of spatial input types (e.g. lines, points, polygons).
- Create publication-ready maps and plots.
- Explore spatial and temporal patterns among hundreds of variables in the Human-Earth Atlas.

**Getting Started with the Human-Earth Atlas:**

1. Install SESAME*
2. Download the Human-Earth Atlas ([Figshare Link](https://doi.org/10.6084/m9.figshare.28432499))
3. Load your spatial data (e.g., land cover, population, climate)
4. Use SESAME’s plotting tools to visualize and compare datasets
5. Explore spatial and temporal patterns among hundreds of variables in the Human-Earth Atlas.

*Note: SESAME may take up to 2 minutes to load when used for the first time. This will not recur with further use.

**Navigating the Atlas:**
1. List the netCDF files in the Human-Earth Atlas
```python
import sesame as ssm
ssm.atlas(directory="atlas")
```
<img src="../images/atlas.png" alt="Human-Earth Atlas" width="600"/>

2. View dataset metadata
```python
ssm.list_variables("atlas/B.land.cover.2001-2023.a.nc")
```
<img src="../images/info.png" alt="NetCDF Info" width="600"/>

3. Visualize data on the map
```python
# Load data
netcdf_file = "atlas/T.transportation.roads.nc"
ssm.plot_map(dataset=netcdf_file, variable="roads_gross", color='magma_r', title='Gross Road Mass', label='g m-2', vmin=0, vmax=1e4, extend_max=True)
```
<img src="../images/gross_road.png" alt="Gross Road Mass Map" width="600"/>

4. Quick mathematical operation
```python
# Load data
netcdf_file = "atlas/T.transportation.roads.nc"
# Perform the operation
ssm.divide_variables(dataset=netcdf_file, variable1="road_length", variable2="grid_area", new_variable_name="road_density")
```
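Conceptually, such an operation is an elementwise ratio of two gridded variables. A minimal numpy sketch of the idea (the arrays below are invented for illustration and are not values from the Atlas):

```python
import numpy as np

# Two hypothetical 2x2 gridded variables
road_length = np.array([[2000.0, 0.0], [500.0, 1000.0]])  # meters per grid cell
grid_area = np.array([[1.0e6, 1.0e6], [2.0e6, 2.0e6]])    # m2 per grid cell

# Elementwise ratio, one value per grid cell
road_density = road_length / grid_area                    # m per m2
print(road_density)
```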

Ready to get started? Dive into the function docs below or read [The SESAME Human-Earth Atlas](https://www.nature.com/articles/s41597-025-05087-5) paper for inspiration!

---
"""

import os
import geopandas as gpd
import pandas as pd
import numpy as np
import xarray as xr
import json

from . import create
from . import utils
from . import calculate
from . import plot
from . import get

# import create
# import utils
# import calculate
# import plot
# import get

def point_2_grid(point_data, variable_name='variable', long_name='variable', units="value/grid-cell", source=None, time=None, resolution=1, agg_column=None, agg_function="sum", attr_field=None, output_directory=None, output_filename=None, normalize_by_area=False, zero_is_value=False, verbose=False):

    """
    Converts point data from a shapefile or GeoDataFrame into a gridded netCDF dataset.

    Parameters
    ----------
    - point_data : GeoDataFrame or str. Input point data to be gridded. Can be either a GeoDataFrame or a path to a point shapefile (.shp).
    - variable_name : str, optional. Name of the variable to include in the netCDF attributes metadata. Defaults to:
        - The unique entries in the `attr_field` column if specified.
        - The input filename without extension if `attr_field` and `variable_name` are not specified.
    - long_name : str, optional. A descriptive name for the variable, added to the netCDF metadata. Behaves the same as `variable_name` if
        `attr_field` is specified. Defaults to the input filename without extension if unspecified.
    - units : str, optional. Units of the data variable to include in the netCDF metadata. Default is "value/grid-cell".
    - source : str, optional. String describing the original source of the input data. This will be added to the netCDF metadata.
    - time : str, optional. Time dimension for the output netCDF. If specified, the output will include a time dimension with the
        value provided. Default is None (spatial, 2D netCDF output).
    - resolution : float, optional. Desired resolution for the grid cells in the output dataset. Default is 1 degree.
    - agg_column : str, optional. Column name in the shapefile or GeoDataFrame specifying the values to aggregate in each grid cell.
        Defaults to counting the number of points per grid cell.
    - agg_function : str, optional. Aggregation method for combining values in each grid cell. Options include:
        - 'sum' (default): Sums all point values.
        - 'max': Takes the maximum value.
        - 'min': Takes the minimum value.
        - 'std': Computes the standard deviation.
    - attr_field : str, optional. Column name in the shapefile or GeoDataFrame specifying the variable names for multiple data types.
    - output_directory : str, optional. Directory where the output netCDF file will be saved. If None but `output_filename` is given, the file will be saved in the current working directory.
    - output_filename : str, optional. Name of the output netCDF file (without the `.nc` extension). If not provided:
        - Uses the input shapefile name if a shapefile path is given.
        - Saves as `"gridded_points.nc"` if a GeoDataFrame is provided as input.
    - normalize_by_area : bool, optional. If True, normalizes the grid values by area (e.g., converts to value per square meter). Default is False.
    - zero_is_value : bool, optional. If True, treats zero values as valid data rather than as no-data. Default is False.
    - verbose : bool, optional. If True, prints information about the process, such as the global sum of values before and after gridding. Default is False.

    Returns
    -------
    - xarray.Dataset. Transformed dataset with gridded data derived from the input point data.

    Notes
    -----
    - The function supports input in the form of a shapefile or GeoDataFrame containing point data.
    - If points lie exactly on a grid boundary, they are shifted by 0.0001 degrees in both latitude and longitude to ensure assignment to a grid cell.
    - The function creates a netCDF file, where data variables are aggregated based on the `agg_column` and `agg_function`.

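    A toy sketch of the cell-assignment and boundary-shift logic described in the
    Notes (illustrative only, not the library's actual implementation; the helper
    name `assign_cell` and the index convention are hypothetical):

    ```python
    import math

    def assign_cell(lon, lat, resolution=1.0, eps=1e-4):
        # Points exactly on a cell edge are nudged by eps degrees so they
        # fall unambiguously into one grid cell.
        if lon % resolution == 0:
            lon += eps
        if lat % resolution == 0:
            lat += eps
        # Column/row indices counted from the (-180, -90) corner
        col = math.floor((lon + 180.0) / resolution)
        row = math.floor((lat + 90.0) / resolution)
        return col, row

    print(assign_cell(10.0, 45.5))  # lon sits on an edge, nudged east into cell 190
    ```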
    Example
    -------
    >>> point_2_grid(point_data=shapefile_path,
    ...             variable_name="airplanes",
    ...             long_name="Airplanes Count",
    ...             units="airport/grid-cell",
    ...             source="CIA",
    ...             resolution=1,
    ...             verbose=True
    ... )

    """

    # Determine if input is a path (string or Path) or a GeoDataFrame
    if isinstance(point_data, (str, bytes, os.PathLike)):
        if verbose:
            print("Reading shapefile from path...")
        points_gdf = gpd.read_file(point_data)
    elif isinstance(point_data, gpd.GeoDataFrame):
        points_gdf = point_data
    else:
        raise TypeError("Input must be a GeoDataFrame or a shapefile path (string or Path).")

    # create gridded polygon
    polygons_gdf = create.create_gridded_polygon(resolution=resolution, out_polygon_path=None, grid_area=False)

    if attr_field is not None:
        unique_rows = points_gdf[attr_field].unique().tolist()
        dataset_list = []

        for filter_var in unique_rows:
            # Filter the GeoDataFrame
            filtered_gdf = points_gdf[points_gdf[attr_field] == filter_var].copy()
            joined_gdf = utils.point_spatial_join(polygons_gdf, filtered_gdf, agg_column=agg_column, agg_function=agg_function)

            # Determine agg_column, long_name, and units for the current iteration
            current_agg_column = agg_column or "count"
            current_long_name = utils.reverse_replace_special_characters(filter_var)
            current_units = utils.determine_units_point(units, normalize_by_area)

            # Convert joined GeoDataFrame to xarray dataset
            ds_var = utils.gridded_poly_2_xarray(
                polygon_gdf=joined_gdf,
                grid_value=current_agg_column,
                long_name=current_long_name,
                units=current_units,
                source=source,
                time=time,
                resolution=resolution,
                variable_name=filter_var,
                normalize_by_area=normalize_by_area,
                zero_is_value=zero_is_value
            )

            # Print verbose information
            if verbose:
                global_summary_stats = utils.dataframe_stats_point(dataframe=filtered_gdf, agg_column=current_agg_column, agg_function=agg_function)
                print(f"Global stats of {filter_var} before gridding: {global_summary_stats:.2f}")
                var_name = utils.replace_special_characters(filter_var)
                global_gridded_stats = utils.xarray_dataset_stats(dataset=ds_var, variable_name=var_name, normalize_by_area=normalize_by_area, resolution=resolution)
                print(f"Global stats of {filter_var} after gridding: {global_gridded_stats:.2f}")
                print("\n")

            dataset_list.append(ds_var)

        # Merge all datasets from different filtered GeoDataFrames
        ds = xr.merge(dataset_list)

    else:
        joined_gdf = utils.point_spatial_join(polygons_gdf, points_gdf, agg_column=agg_column, agg_function=agg_function)

        # Determine agg_column, long_name, and units
        agg_column = agg_column or "count"
        long_name = utils.determine_long_name_point(agg_column, variable_name, long_name, agg_function)
        units = utils.determine_units_point(units, normalize_by_area)

        ds = utils.gridded_poly_2_xarray(
            polygon_gdf=joined_gdf,
            grid_value=agg_column,
            long_name=long_name,
            units=units,
            source=source,
            time=time,
            resolution=resolution,
            variable_name=variable_name,
            normalize_by_area=normalize_by_area,
            zero_is_value=zero_is_value
        )

        if verbose:
            global_summary_stats = utils.dataframe_stats_point(dataframe=points_gdf, agg_column=agg_column, agg_function=agg_function)
            print(f"Global stats before gridding: {global_summary_stats:.2f}")
            global_gridded_stats = utils.xarray_dataset_stats(dataset=ds, variable_name=variable_name, normalize_by_area=normalize_by_area, resolution=resolution)
            print(f"Global stats after gridding: {global_gridded_stats:.2f}")

    if output_directory or output_filename:
        # Set output directory
        output_directory = (output_directory or os.getcwd()).rstrip(os.sep) + os.sep
        # Set base filename
        base_filename = os.path.splitext(os.path.basename(point_data))[0] if isinstance(point_data, (str, bytes, os.PathLike)) else "gridded_points"
        # Set output filename
        output_filename = output_filename or base_filename
        # save the xarray dataset
        utils.save_to_nc(ds, output_directory=output_directory, output_filename=output_filename, base_filename=base_filename)
    return ds

def line_2_grid(line_data, variable_name='variable', long_name='variable', units="meter/grid-cell", source=None, time=None, resolution=1, agg_column=None, agg_function="sum", attr_field=None, output_directory=None, output_filename=None, normalize_by_area=False, zero_is_value=False, verbose=False):

    """
    Converts line data from a shapefile or GeoDataFrame into a gridded netCDF dataset.

    Parameters
    ----------
    - line_data : GeoDataFrame or str. Input line data to be gridded. Can be either a GeoDataFrame or a path to a line/polyline shapefile (.shp).
    - variable_name : str, optional. Name of the variable to include in the netCDF attributes metadata. Defaults to:
        - The unique entries in the `attr_field` column if specified.
        - The input filename without extension if `attr_field` and `variable_name` are not specified.
    - long_name : str, optional. A descriptive name for the variable, added to the netCDF metadata. Behaves the same as `variable_name` if
        `attr_field` is specified. Defaults to the input filename without extension if unspecified.
    - units : str, optional. Units of the data variable to include in the netCDF metadata. Default is "meter/grid-cell".
    - source : str, optional. String describing the original source of the input data. This will be added to the netCDF metadata.
    - time : str, optional. Time dimension for the output netCDF. If specified, the output will include a time dimension with the
        value provided. Default is None (spatial, 2D netCDF output).
    - resolution : float, optional. Desired resolution for the grid cells in the output dataset. Default is 1 degree.
    - agg_column : str, optional. Column name in the shapefile or GeoDataFrame specifying the values to aggregate in each grid cell.
        Defaults to summing the lengths of intersected lines per grid cell.
    - agg_function : str, optional. Aggregation method for combining values in each grid cell. Options include:
        - 'sum' (default): Sums all line values.
        - 'max': Takes the maximum value.
        - 'min': Takes the minimum value.
        - 'std': Computes the standard deviation.
    - attr_field : str, optional. Column name in the shapefile or GeoDataFrame specifying the variable names for multiple data types.
    - output_directory : str, optional. Directory where the output netCDF file will be saved. If None but `output_filename` is given, the file will be saved in the current working directory.
    - output_filename : str, optional. Name of the output netCDF file (without the `.nc` extension). If not provided:
        - Uses the input shapefile name if a shapefile path is given.
        - Saves as `"gridded_lines.nc"` if a GeoDataFrame is provided as input.
    - normalize_by_area : bool, optional. If True, normalizes the variable in each grid cell by the area of the grid cell (e.g., converts to value per square meter). Default is False.
    - zero_is_value : bool, optional. If True, treats zero values as valid data rather than as no-data. Default is False.
    - verbose : bool, optional. If True, prints information about the process, such as the global sum of values before and after gridding. Default is False.

    Returns
    -------
    - xarray.Dataset. Transformed dataset with gridded data derived from the input line data.

    Notes
    -----
    - The function supports input in the form of a shapefile or GeoDataFrame containing line data.
    - Line lengths are calculated and aggregated based on the specified `agg_column` and `agg_function`.
    - If lines intersect a grid boundary, their contributions are divided proportionally among the intersected grid cells.
    - The function creates a netCDF file, where data variables are aggregated and stored with metadata.

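    The proportional split described in the Notes can be illustrated in one
    dimension (a simplified sketch, not the actual intersection code; the helper
    name `split_length` is hypothetical):

    ```python
    import math

    def split_length(x0, x1, resolution=1.0):
        # Distribute the length of segment [x0, x1] across grid cells of
        # width `resolution`; returns {cell_index: length inside that cell}.
        contributions = {}
        x = x0
        while x < x1:
            cell = math.floor(x / resolution)
            seg_end = min(x1, (cell + 1) * resolution)
            contributions[cell] = contributions.get(cell, 0.0) + (seg_end - x)
            x = seg_end
        return contributions

    # A segment from x=0.7 to x=2.2 spans three 1-degree cells:
    # cell 0 gets ~0.3, cell 1 gets 1.0, cell 2 gets ~0.2
    print(split_length(0.7, 2.2))
    ```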
    Example
    -------
    >>> line_2_grid(line_data=shapefile_path,
    ...             variable_name="roads",
    ...             long_name="Roads Length",
    ...             units="meter/grid-cell",
    ...             source="OpenStreetMap",
    ...             resolution=1,
    ...             agg_function="sum",
    ...             verbose=True
    ... )

    """

    # Determine if input is a path (string or Path) or a GeoDataFrame
    if isinstance(line_data, (str, bytes, os.PathLike)):
        if verbose:
            print("Reading shapefile from path...")
        lines_gdf = gpd.read_file(line_data)
    elif isinstance(line_data, gpd.GeoDataFrame):
        lines_gdf = line_data
    else:
        raise TypeError("Input must be a GeoDataFrame or a shapefile path (string or Path).")

    # create gridded polygon
    polygons_gdf = create.create_gridded_polygon(resolution=resolution, out_polygon_path=None, grid_area=False)

    if attr_field is not None:
        unique_rows = lines_gdf[attr_field].unique().tolist()
        dataset_list = []

        for filter_var in unique_rows:
            # Filter the GeoDataFrame
            filtered_gdf = lines_gdf[lines_gdf[attr_field] == filter_var].copy()
            joined_gdf = utils.line_intersect(polygons_gdf, filtered_gdf, agg_column=agg_column, agg_function=agg_function)

            # Determine agg_column, long_name, and units for the current iteration
            current_agg_column = agg_column or f"length_{agg_function.lower()}"
            current_long_name = utils.reverse_replace_special_characters(filter_var)
            current_units = utils.determine_units_line(units, normalize_by_area)

            # Convert joined GeoDataFrame to xarray dataset
            ds_var = utils.gridded_poly_2_xarray(
                polygon_gdf=joined_gdf,
                grid_value=current_agg_column,
                long_name=current_long_name,
                units=current_units,
                source=source,
                time=time,
                resolution=resolution,
                variable_name=filter_var,
                normalize_by_area=normalize_by_area,
                zero_is_value=zero_is_value
            )

            # Print verbose information
            if verbose:
                global_summary_stats = utils.dataframe_stats_line(dataframe=filtered_gdf, agg_column=agg_column, agg_function=agg_function)
                print(f"Global stats of {filter_var} before gridding: {global_summary_stats:.2f} km.")
                var_name = utils.replace_special_characters(filter_var)
                global_gridded_stats = utils.xarray_dataset_stats(dataset=ds_var, variable_name=var_name, normalize_by_area=normalize_by_area, resolution=resolution) * 1e-3
                print(f"Global stats of {filter_var} after gridding: {global_gridded_stats:.2f} km.")
                print("\n")

            dataset_list.append(ds_var)

        # Merge all datasets from different filtered GeoDataFrames
        ds = xr.merge(dataset_list)

    else:
        joined_gdf = utils.line_intersect(polygons_gdf, lines_gdf, agg_column=agg_column, agg_function=agg_function)

        # Determine agg_column, long_name, and units
        agg_column = agg_column or "length_m"
        long_name = utils.determine_long_name_line(long_name, agg_column, variable_name)
        units = utils.determine_units_line(units, normalize_by_area)
        ds = utils.gridded_poly_2_xarray(
            polygon_gdf=joined_gdf,
            grid_value=agg_column,
            long_name=long_name,
            units=units,
            source=source,
            time=time,
            resolution=resolution,
            variable_name=variable_name,
            normalize_by_area=normalize_by_area,
            zero_is_value=zero_is_value
        )

        if verbose:
            if agg_column == "length_m":
                global_summary_stats = utils.dataframe_stats_line(dataframe=lines_gdf, agg_column=agg_column, agg_function=agg_function)
                print(f"Global stats before gridding: {global_summary_stats:.2f} km.")
                global_gridded_stats = utils.xarray_dataset_stats(dataset=ds, variable_name=variable_name, agg_column=agg_column, normalize_by_area=normalize_by_area, resolution=resolution) * 1e-3
                print(f"Global stats after gridding: {global_gridded_stats:.2f} km.")
            else:
                global_summary_stats = utils.dataframe_stats_line(dataframe=lines_gdf, agg_column=agg_column, agg_function=agg_function)
                print(f"Global stats before gridding: {global_summary_stats:.2f}.")
                global_gridded_stats = utils.xarray_dataset_stats(dataset=ds, variable_name=variable_name, agg_column=agg_column, normalize_by_area=normalize_by_area, resolution=resolution)
                print(f"Global stats after gridding: {global_gridded_stats:.2f}.")

    if output_directory or output_filename:
        # Set output directory
        output_directory = (output_directory or os.getcwd()).rstrip(os.sep) + os.sep
        # Set base filename
        base_filename = os.path.splitext(os.path.basename(line_data))[0] if isinstance(line_data, (str, bytes, os.PathLike)) else "gridded_lines"
        # Set output filename
        output_filename = output_filename or base_filename
        # save the xarray dataset
        utils.save_to_nc(ds, output_directory=output_directory, output_filename=output_filename, base_filename=base_filename)
    return ds

def poly_2_grid(polygon_data, variable_name='variable', long_name='variable', units="m2/grid-cell", source=None, time=None, resolution=1, attr_field=None, fraction=False, agg_function="sum", output_directory=None, output_filename=None, normalize_by_area=False, zero_is_value=False, verbose=False):

    """
    Converts polygon data from a shapefile or GeoDataFrame into a gridded netCDF dataset.

    Parameters
    ----------
    - polygon_data : GeoDataFrame or str. Input polygon data to be gridded. Can be either a GeoDataFrame or a path to a polygon shapefile (.shp).
    - variable_name : str, optional. Name of the variable to include in the netCDF attributes metadata. Defaults to:
        - The unique entries in the `attr_field` column if specified.
        - The input filename without extension if `attr_field` and `variable_name` are not specified.
    - long_name : str, optional. A descriptive name for the variable, added to the netCDF metadata. Behaves the same as `variable_name` if
        `attr_field` is specified. Defaults to the input filename without extension if unspecified.
    - units : str, optional. Units of the data variable to include in the netCDF metadata. Default is "m2/grid-cell".
    - source : str, optional. String describing the original source of the input data. This will be added to the netCDF metadata.
    - time : str, optional. Time dimension for the output netCDF. If specified, the output will include a time dimension with the
        value provided. Default is None (spatial, 2D netCDF output).
    - resolution : float, optional. Desired resolution for the grid cells in the output dataset. Default is 1 degree.
    - attr_field : str, optional. Column name in the shapefile or GeoDataFrame specifying the variable names for multiple data types.
    - fraction : bool, optional. If True, calculates the fraction of each polygon within each grid cell. The output values will range from 0 to 1. Default is False.
    - agg_function : str, optional. Aggregation method for combining values in each grid cell. Default is 'sum'. Options include:
        - 'sum': Sum of values.
        - 'max': Maximum value.
        - 'min': Minimum value.
        - 'std': Standard deviation.
    - output_directory : str, optional. Directory where the output netCDF file will be saved. If None but `output_filename` is given, the file will be saved in the current working directory.
    - output_filename : str, optional. Name of the output netCDF file (without the `.nc` extension). If not provided:
        - Uses the input shapefile name if a shapefile path is given.
        - Saves as `"gridded_polygons.nc"` if a GeoDataFrame is provided as input.
    - normalize_by_area : bool, optional. If True, normalizes the grid values by area (e.g., converts to value per square meter). Default is False.
    - zero_is_value : bool, optional. If True, treats zero values as valid data rather than as no-data. Default is False.
    - verbose : bool, optional. If True, prints information about the process, such as the global sum of values before and after gridding. Default is False.

    Returns
    -------
    - xarray.Dataset. Transformed dataset with gridded data derived from the input polygon data.

    Notes
    -----
    - The function supports input in the form of a shapefile or GeoDataFrame containing polygon data.
    - Polygon areas are calculated and aggregated based on the specified `attr_field` and `agg_function`.
    - If the `fraction` parameter is True, the fraction of each polygon in each grid cell will be computed, with values ranging from 0 to 1.
    - The function creates a netCDF file, where data variables are aggregated and stored with metadata.

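    For an axis-aligned rectangle, the per-cell fraction described in the Notes
    reduces to clipping intervals (a toy sketch that ignores spherical geometry;
    the helper name `cell_fraction` is hypothetical):

    ```python
    def cell_fraction(rect, cell, resolution=1.0):
        # rect and cell are (min_lon, min_lat, max_lon, max_lat) tuples.
        # Overlap width/height come from clipping the two intervals per axis.
        overlap_w = max(0.0, min(rect[2], cell[2]) - max(rect[0], cell[0]))
        overlap_h = max(0.0, min(rect[3], cell[3]) - max(rect[1], cell[1]))
        return (overlap_w * overlap_h) / (resolution * resolution)

    # A rectangle covering the western half of the cell (0, 0)-(1, 1)
    print(cell_fraction((0.0, 0.0, 0.5, 1.0), (0.0, 0.0, 1.0, 1.0)))  # 0.5
    ```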
    Example
    -------
    >>> poly_2_grid(polygon_data=shapefile_path,
    ...             units="fraction",
    ...             source="The new global lithological map database GLiM",
    ...             resolution=1,
    ...             attr_field="Short_Name",
    ...             fraction=True,
    ...             verbose=True
    ... )

    """

    # Determine if input is a path (string or Path) or a GeoDataFrame
    if isinstance(polygon_data, (str, bytes, os.PathLike)):
        if verbose:
            print("Reading shapefile from path...")
        poly_gdf = gpd.read_file(polygon_data)
    elif isinstance(polygon_data, gpd.GeoDataFrame):
        poly_gdf = polygon_data
    else:
        raise TypeError("Input must be a GeoDataFrame or a shapefile path (string or Path).")

    # create gridded polygon
    polygons_gdf = create.create_gridded_polygon(resolution=resolution, out_polygon_path=None, grid_area=False)

    if attr_field is not None:
        unique_rows = poly_gdf[attr_field].unique().tolist()
        dataset_list = []

        for filter_var in unique_rows:

            # Filter the GeoDataFrame
            filtered_gdf = poly_gdf[poly_gdf[attr_field] == filter_var].copy()
            # Reset the index to ensure sequential indexing
            filtered_gdf.reset_index(drop=True, inplace=True)

            # Determine grid_value, long_name, and units for the current iteration
            grid_value = "frac" if fraction else "in_area"
            current_long_name = utils.reverse_replace_special_characters(filter_var)
            current_units = utils.determine_units_poly(units, normalize_by_area, fraction)

            # Convert GeoDataFrame to xarray dataset
            ds_var = utils.poly_intersect(poly_gdf=filtered_gdf,
                                          polygons_gdf=polygons_gdf,
                                          variable_name=filter_var,
                                          long_name=current_long_name,
                                          units=current_units,
                                          source=source,
                                          time=time,
                                          resolution=resolution,
                                          agg_function=agg_function,
                                          fraction=fraction,
                                          normalize_by_area=normalize_by_area,
                                          zero_is_value=zero_is_value)

            # Print verbose information
            if verbose:
                global_summary_stats = utils.dataframe_stats_poly(dataframe=filtered_gdf, agg_function=agg_function)
                print(f"Global stats of {filter_var} before gridding: {global_summary_stats:.2f} km2.")
                filter_var = utils.replace_special_characters(filter_var)
                global_gridded_stats = utils.xarray_dataset_stats(dataset=ds_var, variable_name=filter_var, agg_column=grid_value,
                                                                  normalize_by_area=True, resolution=resolution) * 1e-6
                print(f"Global stats of {filter_var} after gridding: {global_gridded_stats:.2f} km2.")
                print("\n")

            dataset_list.append(ds_var)

        # Merge all datasets from different filtered GeoDataFrames
        ds = xr.merge(dataset_list)

    else:

        # Determine grid_value, long_name, and units
        grid_value = "frac" if fraction else "in_area"
        long_name = utils.determine_long_name_poly(variable_name, long_name, agg_function)
        units = utils.determine_units_poly(units, normalize_by_area, fraction)

        # Convert GeoDataFrame to xarray dataset
        ds = utils.poly_intersect(poly_gdf=poly_gdf,
                                  polygons_gdf=polygons_gdf,
                                  variable_name=variable_name,
                                  long_name=long_name,
                                  units=units,
                                  source=source,
                                  time=time,
                                  resolution=resolution,
                                  agg_function=agg_function,
                                  fraction=fraction,
                                  normalize_by_area=normalize_by_area,
                                  zero_is_value=zero_is_value)

        if verbose:
            global_summary_stats = utils.dataframe_stats_poly(dataframe=poly_gdf, agg_function=agg_function)
            print(f"Global stats before gridding: {global_summary_stats:.2f} km2.")
            variable_name = utils.replace_special_characters(variable_name)
            if fraction:
                normalize_by_area = True
            global_gridded_stats = utils.xarray_dataset_stats(dataset=ds, variable_name=variable_name, agg_column=grid_value,
                                                              normalize_by_area=normalize_by_area, resolution=resolution) * 1e-6
            print(f"Global stats after gridding: {global_gridded_stats:.2f} km2.")

    if output_directory or output_filename:
        # Set output directory
        output_directory = (output_directory or os.getcwd()).rstrip(os.sep) + os.sep
 536        # Set base filename
 537        base_filename = os.path.splitext(os.path.basename(polygon_data))[0] if isinstance(polygon_data, (str, bytes, os.PathLike)) else "gridded_polygons"
 538        # Set output filename
 539        output_filename = output_filename or base_filename
 540        # save the xarray dataset
 541        utils.save_to_nc(ds, output_directory=output_directory, output_filename=output_filename, base_filename=base_filename)
 542    return ds  
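# The save block above resolves the output path with a chain of "or" fallbacks.
# A minimal, self-contained sketch of that naming logic (resolve_output is a
# hypothetical helper for illustration, not part of the SESAME API):

```python
import os

def resolve_output(polygon_data=None, output_directory=None, output_filename=None):
    """Mirror the directory/base-filename/filename fallback chain used above."""
    # Fall back to the current working directory, with exactly one trailing separator
    directory = (output_directory or os.getcwd()).rstrip(os.sep) + os.sep
    # Derive the base name from the input path when one was given
    if isinstance(polygon_data, (str, bytes, os.PathLike)):
        base = os.path.splitext(os.path.basename(polygon_data))[0]
    else:
        base = "gridded_polygons"
    return directory, (output_filename or base)
```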

def grid_2_grid(raster_data, agg_function, variable_name, long_name, units="value/grid-cell", source=None, time=None, resolution=1, netcdf_variable=None, output_directory=None, output_filename=None, padding="symmetric", zero_is_value=False, normalize_by_area=False, verbose=False):

    """
    Converts raster data (TIFF or netCDF) into a re-gridded xarray dataset.

    Parameters
    ----------
    - raster_data : str or xarray.Dataset. The input raster data: a string path to a TIFF (.tif) file, a string path to a NetCDF (.nc or .nc4) file, or an already loaded xarray.Dataset object.
        - If `raster_data` is a NetCDF file or an xarray.Dataset, the `netcdf_variable` parameter must also be provided to specify which variable to extract.
    - agg_function : str. Aggregation method to apply when re-gridding. Supported values are 'SUM', 'MEAN', or 'MAX'.
    - variable_name : str. Name of the variable to include in the output dataset.
    - long_name : str. Descriptive name for the variable.
    - units : str, optional. Units for the variable. Default is "value/grid-cell".
    - source : str, optional. Source information for the dataset. Default is None.
    - time : str or None, optional. Time stamp or identifier for the data. Default is None.
    - resolution : int or float, optional. Desired resolution of the grid cells in degrees in the output dataset. Default is 1.
    - netcdf_variable : str, optional. Name of the variable to extract from the netCDF file, if applicable. Required for netCDF inputs.
    - output_directory : str, optional. Directory where the output NetCDF file will be saved. If None but `output_filename` is provided, the file is saved in the current working directory.
    - output_filename : str, optional. Name of the output NetCDF file (without the `.nc` extension). If not provided:
        - Uses `variable_name` if it is specified.
        - Defaults to `regridded.nc` otherwise.
    - padding : str, optional. Padding strategy ('symmetric' or 'end'). Default is 'symmetric'.
    - zero_is_value : bool, optional. Whether to treat zero values as valid data rather than as no-data. Default is False.
    - normalize_by_area : bool, optional. Whether to normalize grid values by area (e.g., convert to value per square meter). Default is False.
    - verbose : bool, optional. If True, prints the global sum of values before and after re-gridding. Default is False.

    Returns
    -------
    - xarray.Dataset. Re-gridded xarray dataset containing the processed raster data.

    Notes
    -----
    This function supports raster data in TIFF or netCDF format and performs re-gridding based on 
    the specified `agg_function`. The output dataset will include metadata such as the variable name, 
    long name, units, and optional source and time information.
    
    Example
    -------
    >>> grid_2_grid(raster_data=pop_path, 
    ...             agg_function="sum", 
    ...             variable_name="population_count", 
    ...             long_name="Total Population", 
    ...             units="people per grid", 
    ...             source="WorldPop", 
    ...             resolution=1, 
    ...             time="2020-01-01", 
    ...             verbose=True
    ... )
    """

    # Determine the file extension
    if isinstance(raster_data, (str, bytes, os.PathLike)):
        file_extension = os.path.splitext(raster_data)[1].lower()
    elif isinstance(raster_data, xr.Dataset):
        file_extension = ".nc"
    else:
        raise TypeError("`raster_data` must be a path to a TIFF/netCDF file or an xarray.Dataset.")

    if file_extension == ".tif":
        if verbose:
            print("Reading the tif file.")
        # Convert TIFF data to a re-gridded dataset
        ds = utils.tif_2_ds(input_raster=raster_data, agg_function=agg_function, variable_name=variable_name,
                            long_name=long_name, units=units, source=source, resolution=resolution, time=time, padding=padding,
                            zero_is_value=zero_is_value, normalize_by_area=normalize_by_area, verbose=verbose)

    elif file_extension in (".nc", ".nc4"):
        if verbose:
            print("Reading the nc file.")
        # Convert netCDF to TIFF
        netcdf_tif_path, temp_path = utils.netcdf_2_tif(raster_data=raster_data, netcdf_variable=netcdf_variable, time=time)
        # Convert netCDF data to a re-gridded dataset
        ds = utils.tif_2_ds(input_raster=netcdf_tif_path, agg_function=agg_function, variable_name=variable_name,
                            long_name=long_name, units=units, source=source, resolution=resolution, time=time, padding=padding,
                            zero_is_value=zero_is_value, normalize_by_area=normalize_by_area, verbose=verbose)
        # delete temp folder
        utils.delete_temporary_folder(temp_path)
    else:
        # An unrecognized file type would otherwise leave `ds` undefined below
        raise ValueError("File type is not recognized. File type should be either TIFF or netCDF.")

    if output_directory or output_filename:
        # Set output directory
        output_directory = (output_directory or os.getcwd()).rstrip(os.sep) + os.sep
        # Set base filename
        base_filename = variable_name or "regridded"
        # Set output filename
        output_filename = output_filename or base_filename
        # save the xarray dataset
        utils.save_to_nc(ds, output_directory=output_directory, output_filename=output_filename, base_filename=base_filename)

    if verbose:
        print("Re-gridding completed!")
    return ds
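# The branching above keys on the lowercased file extension. A standalone sketch
# of that check (classify_raster is a hypothetical helper, not part of SESAME):

```python
import os

def classify_raster(path):
    """Return 'tif' or 'nc' from a path's extension, mirroring the dispatch above."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".tif":
        return "tif"
    if ext in (".nc", ".nc4"):
        return "nc"
    raise ValueError(f"File type {ext!r} is not recognized.")
```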

def table_2_grid(surrogate_data, surrogate_variable, tabular_data, tabular_column, variable_name=None, long_name=None, units="value/grid-cell", source=None, time=None, output_directory=None, output_filename=None, zero_is_value=False, normalize_by_area=False, eez=False, verbose=False):
    """
    Convert tabular data to a gridded dataset by spatially distributing values based on a NetCDF variable and a tabular column.

    Parameters:
    -----------
    - surrogate_data : xarray.Dataset or str. xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded 
        into an xarray.Dataset. The dataset must include the variable specified in `surrogate_variable`.
    - surrogate_variable : str. Variable name in the NetCDF or xarray dataset used for spatial distribution.
    - tabular_data : pandas.DataFrame or str. Tabular dataset as a pandas DataFrame or a path to a CSV file. If a file path is provided, it will be 
        automatically loaded into a DataFrame. The data must include a column named "ISO3" representing country codes. 
        If not present, use the `add_iso3_column` utility function to convert country names to ISO3 codes.
    - tabular_column : str. Column name in the tabular dataset with values to be spatially distributed.
    - variable_name : str, optional. Name of the variable. Default is None.
    - long_name : str, optional. A long name for the variable. Default is None.
    - units : str, optional. Units of the variable. Default is 'value/grid-cell'.
    - source : str, optional. Source information, if available. Default is None.
    - time : str, optional. Time information for the dataset.
    - output_directory : str, optional. Directory where the output NetCDF file will be saved. If None but `output_filename` is provided, the file is saved in the current working directory.
    - output_filename : str, optional. Name of the output NetCDF file (without the `.nc` extension). If not provided:
        - Uses `variable_name` if it is specified.
        - Falls back to `long_name` or `tabular_column` if `variable_name` is not given.
        - Defaults to `gridded_table.nc` if none of the above are provided.
    - zero_is_value : bool, optional. If True, zero is treated as a valid value, and zeros are included when computing the mean and standard deviation.
    - normalize_by_area : bool, optional. Whether to normalize grid values by area (e.g., convert to value per square meter). Default is False.
    - eez : bool, optional. If True, the function converts jurisdictional Exclusive Economic Zone (EEZ) values to a spatial grid.
    - verbose : bool, optional. If True, prints the global sum before and after gridding. For any jurisdiction where the surrogate variable is missing, the tabular value is distributed evenly over that jurisdiction, and the ISO3 codes of those jurisdictions are also printed.

    Returns:
    --------
    - xarray.Dataset. Resulting gridded dataset after spatial distribution of tabular values.

    Example
    -------
    >>> table_2_grid(surrogate_data=netcdf_file_path, 
    ...             surrogate_variable="railway_length", 
    ...             tabular_data=csv_file_path, 
    ...             tabular_column="steel", 
    ...             variable_name="railtract_steel", 
    ...             long_name="Railtrack Steel Mass", 
    ...             units="g m-2", 
    ...             source="Matitia (2022)", 
    ...             normalize_by_area=True, 
    ...             verbose=True
    ... )
    """

    # Load surrogate_data (either path or xarray.Dataset)
    if isinstance(surrogate_data, (str, bytes, os.PathLike)):
        input_ds = xr.open_dataset(surrogate_data)
    elif isinstance(surrogate_data, xr.Dataset):
        input_ds = surrogate_data
    else:
        raise TypeError("`surrogate_data` must be an xarray.Dataset or a path to a NetCDF file.")

    # Load tabular_data (either path or pandas.DataFrame)
    if isinstance(tabular_data, (str, bytes, os.PathLike)):
        input_df = pd.read_csv(tabular_data)
    elif isinstance(tabular_data, pd.DataFrame):
        input_df = tabular_data
    else:
        raise TypeError("`tabular_data` must be a pandas.DataFrame or a path to a CSV file.")

    if variable_name is None:
        variable_name = long_name if long_name is not None else tabular_column

    if long_name is None:
        long_name = variable_name if variable_name is not None else tabular_column

    # check the netcdf resolution
    resolution = abs(float(input_ds['lat'].diff('lat').values[0]))
    resolution_str = str(resolution)

    if time:
        # check and convert ISO3 based on occupation or previous control, given a specific year
        input_df = utils.convert_iso3_by_year(df=input_df, year=time)

    base_directory = os.path.dirname(os.path.abspath(__file__))
    data_dir = os.path.join(base_directory, "data")
    if eez:
        country_ds = xr.open_dataset(os.path.join(data_dir, "eezs.1deg.nc"))
    else:
        # check and print dataframe's iso3 with country fraction dataset
        utils.check_iso3_with_country_ds(input_df, resolution_str)

        # Pick the country-fraction file matching the detected resolution
        fraction_files = {
            "1": "country_fraction.1deg.2000-2023.a.nc",
            "1.0": "country_fraction.1deg.2000-2023.a.nc",
            "0.5": "country_fraction.0_5deg.2000-2023.a.nc",
            "0.25": "country_fraction.0_25deg.2000-2023.a.nc",
        }
        if resolution_str not in fraction_files:
            raise ValueError("Please re-grid the netcdf file to 1, 0.5 or 0.25 degree.")
        country_ds = xr.open_dataset(os.path.join(data_dir, fraction_files[resolution_str]))

    # Treat missing surrogate values as zero
    input_ds = input_ds.copy()
    input_ds[surrogate_variable] = input_ds[surrogate_variable].fillna(0)

    input_ds, country_ds, a = utils.adjust_datasets(input_ds, country_ds, time)
    print(f"Distributing {variable_name} onto {surrogate_variable}.")

    new_ds = create.create_new_ds(input_ds, tabular_column, country_ds, surrogate_variable, input_df, verbose)

    for var_name in new_ds.data_vars:
        a += np.nan_to_num(new_ds[var_name].to_numpy())

    da = xr.DataArray(a, coords={'lat': input_ds['lat'], 'lon': input_ds['lon']}, dims=['lat', 'lon'])

    if units == 'value/grid-cell':
        units = 'value m-2'

    ds = utils.da_to_ds(da, variable_name, long_name, units, source=source, time=time, resolution=resolution,
                        zero_is_value=zero_is_value, normalize_by_area=normalize_by_area)

    if verbose:
        print(f"Global sum of jurisdictional dataset: {input_df[[tabular_column]].sum().item()}")
        global_gridded_stats = utils.xarray_dataset_stats(dataset=ds, variable_name=variable_name, agg_column=None, normalize_by_area=normalize_by_area, resolution=resolution)
        print(f"Global stats after gridding: {global_gridded_stats:.2f}")

    if output_directory or output_filename:
        # Set output directory
        output_directory = (output_directory or os.getcwd()).rstrip(os.sep) + os.sep
        # Set base filename
        base_filename = variable_name or "gridded_table"
        # Set output filename
        output_filename = output_filename or base_filename
        # save the xarray dataset
        utils.save_to_nc(ds, output_directory=output_directory, output_filename=output_filename, base_filename=base_filename)

    return ds
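# The resolution check above takes the first latitude step of the grid. A
# standalone sketch of that inference with numpy (infer_resolution is a
# hypothetical helper, not part of the SESAME API):

```python
import numpy as np

def infer_resolution(lat_values):
    """Grid resolution in degrees, from the first difference of the latitude axis."""
    # abs() makes the result independent of whether latitudes ascend or descend
    return abs(float(np.diff(np.asarray(lat_values, dtype=float))[0]))
```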

def grid_2_table(grid_data, variables=None, time=None, grid_area=None, resolution=1, aggregation=None, agg_function='sum', verbose=False):
    """
    Process gridded data from an xarray Dataset to generate tabular data for different jurisdictions.

    Parameters:
    -----------
    - grid_data : xarray.Dataset or str. xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
    - variables : str or list of str, optional. Variable name(s) to process: a single variable or a list of variables. If None, all variables in the dataset (excluding predefined ones) will be considered.
    - time : str, optional. Time slice for data processing. If provided, the nearest time slice is selected. If None, a default time slice is used.
    - grid_area : str, optional. Indicator to consider grid area during processing. If 'YES', the variable is multiplied by grid area.
    - resolution : float, optional. Resolution of gridded data in degrees. Default is 1 degree.
    - aggregation : str, optional. Aggregation level for tabular data. If 'region_1', 'region_2', or 'region_3', the data will be aggregated at the corresponding regional level.
    - agg_function : str, optional, default 'sum'. Aggregation method. Options: 'sum', 'mean', 'max', 'min', 'std'.
    - verbose : bool, optional. If True, the function will print the global sum of values before and after aggregation.

    Returns:
    --------
    - df : pandas.DataFrame. Tabular data for different jurisdictions, including ISO3 codes, variable values, and an optional 'Year' column.
    """

    df = utils.grid_2_table(grid_data=grid_data, variables=variables, time=time, grid_area=grid_area, resolution=resolution, aggregation=aggregation, method=agg_function, verbose=verbose)
    return df

def add_iso3_column(df, column):
    """
    Convert country names in a DataFrame column to their corresponding ISO3 country codes.

    This function reads a JSON file containing country names and their corresponding ISO3 codes, then 
    maps the values from the specified column in the DataFrame to their ISO3 codes based on the JSON data. 
    The resulting ISO3 codes are added as a new column named 'ISO3'.

    Parameters
    ----------
    - df (pandas.DataFrame): The DataFrame containing a column with country names.
    - column (str): The name of the column in the DataFrame that contains country names.

    Returns
    -------
    - pandas.DataFrame: The original DataFrame with an additional 'ISO3' column containing the ISO3 country codes.

    Raises
    ------
    - FileNotFoundError: If the JSON file containing country mappings cannot be found.
    - KeyError: If the specified column is not present in the DataFrame.

    Example
    -------
    >>> add_iso3_column(df=dataframe, 
    ...                column="Country"
    ... )
    """

    # Load the country-name-to-ISO3 mapping
    base_directory = os.path.dirname(os.path.abspath(__file__))
    data_dir = os.path.join(base_directory, "data")
    json_path = os.path.join(data_dir, "Names.json")
    with open(json_path, 'r') as file:
        country_iso3_data = json.load(file)
    # Map the specified column to the new "ISO3" column
    df['ISO3'] = df[column].map(country_iso3_data)
    # Collect country names that could not be mapped
    nan_iso3 = df[df["ISO3"].isna()]
    iso3_not_found = nan_iso3[column].unique().tolist()
    # Check if the list is not empty before printing
    if iso3_not_found:
        print(f"Country Not Found: {iso3_not_found}")
    return df

def plot_histogram(dataset, variable, time=None, bin_size=30, color='blue', plot_title=None, x_label=None, remove_outliers=False, log_transform=None, output_dir=None, filename=None):

    """
    Create a histogram for an array variable in an xarray dataset.
    Optionally remove outliers and apply log transformations.

    Parameters:
    - dataset : xarray.Dataset or str, xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
    - variable : str, the name of the variable to plot.
    - time : str, optional, the time slice to plot.
    - bin_size : int, optional, the number of bins in the histogram.
    - color : str, optional, the color of the histogram bars.
    - plot_title : str, optional, the title for the plot.
    - x_label : str, optional, the label for the x-axis.
    - remove_outliers : bool, optional, whether to remove outliers.
    - log_transform : str, optional, the type of log transformation ('log10', 'log', 'log2').
    - output_dir : str, optional, directory path to save the output figure. If not provided, the figure is saved in the current working directory.
    - filename : str, optional, filename (with extension) for saving the figure. If not provided, the plot is saved as "output_histogram.png".

    Returns:
    - None, displays the plot and optionally saves it to a file.

    Example
    -------
    >>> plot_histogram(dataset=dataset, 
    ...                variable="railway_length", 
    ...                bin_size=30, 
    ...                color='blue', 
    ...                plot_title="Histogram of Railway Length"
    ... )
    """
    plot.plot_histogram(dataset, variable, time, bin_size, color, plot_title, x_label, remove_outliers, log_transform, output_dir, filename)

def plot_scatter(dataset, variable1, variable2, dataset2=None, time=None, color='blue', x_label=None, y_label=None, plot_title=None, remove_outliers=False, log_transform_1=None, log_transform_2=None, equation=False, output_dir=None, filename=None):
    """
    Create a scatter plot for two variables in an xarray dataset.
    Optionally remove outliers and apply log transformations.

    Parameters:
    - dataset : xarray.Dataset or str, the primary dataset or a path to a NetCDF file. This dataset must contain the variable specified by `variable1`, which will be used for the x-axis.
    - variable1 : str, name of the variable to be plotted on the x-axis. Must be present in `dataset`.
    - variable2 : str, name of the variable to be plotted on the y-axis. If `dataset2` is provided, this variable will be extracted from `dataset2`; otherwise, it must exist in `dataset`.
    - dataset2 : xarray.Dataset or str, optional, a second dataset or a path to a NetCDF file containing the variable specified by `variable2` (for the y-axis). If not provided, `dataset` will be used for both variables.
    - time : str, optional, the time slice to plot.
    - color : str, optional, the color map of the scatter plot.
    - x_label : str, optional, the label for the x-axis.
    - y_label : str, optional, the label for the y-axis.
    - plot_title : str, optional, the title for the plot.
    - remove_outliers : bool, optional, whether to remove outliers from the data.
    - log_transform_1 : str, optional, the type of log transformation for variable1 ('log10', 'log', 'log2').
    - log_transform_2 : str, optional, the type of log transformation for variable2 ('log10', 'log', 'log2').
    - equation : bool, optional, if True, fits and displays a linear regression equation.
    - output_dir : str, optional, directory path to save the output figure. If not provided, the figure is saved in the current working directory.
    - filename : str, optional, filename (with extension) for saving the figure. If not provided, the plot is saved as "output_scatter.png".

    Returns:
    - None, displays the plot and optionally saves it to a file.

    Example
    -------
    >>> plot_scatter(dataset=ds_road, 
    ...             variable1="roads_gross", 
    ...             variable2="buildings_gross", 
    ...             dataset2=ds_build, 
    ...             color='blue',
    ...             plot_title="Building vs Road", 
    ...             remove_outliers=True, 
    ...             log_transform_1="log10", 
    ...             log_transform_2="log10"
    ... )
    """
    plot.plot_scatter(dataset, variable1, variable2, dataset2, time, color, x_label, y_label, plot_title, remove_outliers, log_transform_1, log_transform_2, equation, output_dir, filename)

def plot_time_series(dataset, variable, agg_function='sum', plot_type='both', color='blue', plot_label='Area Plot', x_label='Year', y_label='Value', plot_title='Time Series Plot', smoothing_window=None, output_dir=None, filename=None):
    """
    Create a line plot and/or area plot for a time series data variable.

    Parameters:
    - dataset : xarray.Dataset or str, xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
    - variable : str, the name of the variable to plot.
    - agg_function : str, the operation to apply ('sum', 'mean', 'max', 'std').
    - plot_type : str, optional, the type of plot ('line', 'area', 'both'). Default is 'both'.
    - color : str, optional, the color of the plot. Default is 'blue'.
    - plot_label : str, optional, the label for the plot. Default is 'Area Plot'.
    - x_label : str, optional, the label for the x-axis. Default is 'Year'.
    - y_label : str, optional, the label for the y-axis. Default is 'Value'.
    - plot_title : str, optional, the title of the plot. Default is 'Time Series Plot'.
    - smoothing_window : int, optional, the window size for rolling mean smoothing.
    - output_dir : str, optional, directory path to save the output figure. If not provided, the figure is saved in the current working directory.
    - filename : str, optional, filename (with extension) for saving the figure. If not provided, the plot is saved as "output_time_series.png".

    Returns:
    - None, displays the plot and optionally saves it to a file.

    Example
    -------
    >>> plot_time_series(dataset=ds_build, 
    ...                variable="buildings_gross", 
    ...                agg_function='sum', 
    ...                plot_type='both', 
    ...                color='blue', 
    ...                x_label='Year', 
    ...                y_label='Value', 
    ...                plot_title='Time Series Plot'
    ... )    
    """

    plot.plot_time_series(dataset, variable, agg_function, plot_type, color, plot_label, x_label, y_label, plot_title, smoothing_window, output_dir, filename)

def plot_hexbin(dataset, variable1, variable2, dataset2=None, time=None, color='pink_r', grid_size=30, x_label=None, y_label=None, plot_title=None, remove_outliers=False, log_transform_1=None, log_transform_2=None, output_dir=None, filename=None):

    """
    Create a hexbin plot for two variables in an xarray dataset.

    Parameters:
    - dataset : xarray.Dataset or str, the primary dataset or a path to a NetCDF file. This dataset must contain the variable specified by `variable1`, which will be used for the x-axis.
    - variable1 : str, name of the variable to be plotted on the x-axis. Must be present in `dataset`.
    - variable2 : str, name of the variable to be plotted on the y-axis. If `dataset2` is provided, this variable will be extracted from `dataset2`; otherwise, it must exist in `dataset`.
    - dataset2 : xarray.Dataset or str, optional, a second dataset or a path to a NetCDF file containing the variable specified by `variable2` (for the y-axis). If not provided, `dataset` will be used for both variables.
    - time : str, optional, the time slice to plot.
    - color : str, optional, the color map of the hexbin plot.
    - grid_size : int, optional, the number of hexagons in the x-direction.
    - x_label : str, optional, the label for the x-axis.
    - y_label : str, optional, the label for the y-axis.
    - plot_title : str, optional, the title for the plot.
    - remove_outliers : bool, optional, whether to remove outliers from the data.
    - log_transform_1 : str, optional, the type of log transformation for variable1 ('log10', 'log', 'log2').
    - log_transform_2 : str, optional, the type of log transformation for variable2 ('log10', 'log', 'log2').
    - output_dir : str, optional, directory path to save the output figure. If not provided, the figure is saved in the current working directory.
    - filename : str, optional, filename (with extension) for saving the figure. If not provided, the plot is saved as "output_hexbin.png".

    Returns:
    - None, displays the plot and optionally saves it to a file.

    Example
    -------
    >>> plot_hexbin(dataset=ds_road, 
    ...             variable1="roads_gross", 
    ...             variable2="buildings_gross", 
    ...             dataset2=ds_build, 
    ...             color='pink_r', 
    ...             plot_title="Building vs Road"
    ... )
    """

    plot.plot_hexbin(dataset, variable1, variable2, dataset2, time, color, grid_size, x_label, y_label, plot_title, remove_outliers, log_transform_1, log_transform_2, output_dir, filename)

 988def plot_map(dataset, variable, time=None, depth=None, color='hot_r', title='', label='', vmin=None, vmax=None, extend_min=None, extend_max=None, levels=10, out_bound=True, remove_ata=False, output_dir=None, filename=None, show=True):
 989    
 990    """
 991    Plots a 2D map of a variable from an xarray Dataset or NetCDF file with customizable colorbar, projection, and map appearance.
 992
 993    Parameters
 994    ----------
 995    - dataset : xarray.Dataset. or str. xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
 996    - variable : str. Name of the variable in the xarray Dataset to plot.
 997    - color : str, default 'hot_r'. Matplotlib colormap name for the plot (discrete color scale).
 998    - title : str, default ''. Title of the map.
 999    - label : str, default ''. Label for the colorbar.
1000    - time: str, optional, the time slice to plot.
1001    - depth: str, optional, the depth slice to plot.
1002    - vmin : float, optional. Minimum data value for the colorbar range. If not provided, the minimum of the variable is used.
1003    - vmax : float, optional. Maximum data value for the colorbar range. If not provided, the maximum of the variable is used.
1004    - extend_min : bool or None, default None. If True, includes values below `vmin` in the first color class and shows a left arrow on the colorbar.
1005    - extend_max : bool or None, default None. If True, includes values above `vmax` in the last color class and shows a right arrow on the colorbar.
1006    - levels : int or list of float, default 10. Either the number of color intervals or a list of explicit interval boundaries.
1007    - out_bound : bool, default True. Whether to display the outer boundary (spine) of the map projection.
1008    - remove_ata : bool, default False. If True, removes Antarctica from the map by excluding data below 60°S latitude.
1009    - output_dir : str, optional. Directory path to save the output figure. If not provided, the figure is saved in the current working directory.
1010    - filename : str, optional. Filename (with extension) for saving the figure. If not provided, the plot is saved as "output_plot.png".
 1011    - show : bool, default True. Whether or not to show the map.
1012
1013    Notes
1014    -----
1015    - If both `extend_min` and `extend_max` are False, the dataset is clipped strictly within [vmin, vmax].
1016    - The colorbar will use arrows to indicate out-of-bound values only if `extend_min` or `extend_max` is True.
 1017    - Colorbar tick labels use two decimal places when (vmax - vmin) <= 10.
1019    - If `remove_ata` is True, the colorbar is placed slightly higher to avoid overlap with the map.
1020        
1021    Returns:
 1022    - matplotlib.axes.Axes of the map; the figure is optionally displayed and saved to a file.
 1023    - The discrete colorbar labels every bin boundary, switching to scientific notation for very large or small values while avoiding an unnecessary ×10⁰ offset.
1024
1025    Example
1026    -------
1027    >>> plot_map(
1028    ...     dataset=ds.isel(time=-1),
1029    ...     variable='npp',
1030    ...     vmin=0,
1031    ...     vmax=1200,
1032    ...     extend_max=True,
1033    ...     color='Greens',
1034    ...     levels=10,
1035    ...     remove_ata=True,
1036    ...     title='Net Primary Productivity',
1037    ...     label='gC/m²/year',
1038    ...     filename='npp_map.png'
1039    ... )
1040    """
1041    
1042    ax = plot.plot_map(dataset=dataset, variable=variable, time=time, depth=depth, color=color, title=title, label=label,
1043             vmin=vmin, vmax=vmax, extend_min=extend_min, extend_max=extend_max, levels=levels, 
1044             out_bound=out_bound, remove_ata=remove_ata, output_dir=output_dir, filename=filename, show=show)
1045    return ax
1046
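Since `levels` accepts either an integer count or an explicit list of boundaries, the discrete color intervals can be pictured with a small sketch. This is plain illustrative Python (not SESAME's internal code; `level_boundaries` is a hypothetical helper name):

```python
def level_boundaries(vmin, vmax, levels):
    """Illustrative only: expand an integer `levels` into evenly spaced
    bin edges over [vmin, vmax]; pass an explicit list through unchanged."""
    if isinstance(levels, int):
        step = (vmax - vmin) / levels
        return [vmin + i * step for i in range(levels + 1)]
    return list(levels)

# e.g. level_boundaries(0, 1e4, 4) -> [0.0, 2500.0, 5000.0, 7500.0, 10000.0]
```

With `extend_min`/`extend_max` set, values falling outside the first or last edge are folded into the end bins and the colorbar gains the corresponding arrow.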
1047def plot_country(tabular_data, column, title="", label="", color='viridis', levels=10, output_dir=None, filename=None, remove_ata=False, out_bound=True, vmin=None, vmax=None, extend_min=None, extend_max=None, show=True):
1048    """
1049    Plots a choropleth map of countries using a specified data column and a world shapefile.
1050
1051    Parameters:
1052    -----------
1053    - tabular_data : pandas.DataFrame or str. Input table containing country-level data. Can be either:
1054        - A pandas DataFrame with the required `column`
1055        - A string path to a CSV file, which will be automatically read into a DataFrame
1056    - column : str. Name of the column in the dataframe to visualize.
1057    - title : str, optional. Title of the map. Default is an empty string.
1058    - label : str, optional. Label for the colorbar. Default is an empty string.
1059    - color : str, optional. Name of the matplotlib colormap to use. Default is 'viridis'.
1060    - levels : int or list of float, optional. Number of color levels (if int) or list of bin edges (if list). Default is 10.
1061    - remove_ata : bool, optional. Whether to remove Antarctica ('ATA') from the data. Default is False.
1062    - out_bound : bool, optional. Whether to display map boundaries (spines). Default is True.
1063    - vmin : float or None, optional. Minimum value for the colormap. If None, calculated from the data.
1064    - vmax : float or None, optional. Maximum value for the colormap. If None, calculated from the data.
1065    - extend_min : bool or None, default None. If True, includes values below `vmin` in the first color class and shows a left arrow on the colorbar.
1066    - extend_max : bool or None, default None. If True, includes values above `vmax` in the last color class and shows a right arrow on the colorbar.
1067    - output_dir : str, optional. Directory path to save the output figure. If not provided, the figure is saved in the current working directory.
1068    - filename : str, optional. Filename (with extension) for saving the figure. If not provided, the plot is saved as "output_country_plot.png".
 1069    - show : bool, default True. Whether or not to show the map.
1070
1071    Returns:
1072    --------
 1073    - matplotlib.axes.Axes of the map; the figure is optionally displayed and saved to a file.
 1074    - The discrete colorbar labels every bin boundary, switching to scientific notation for very large or small values while avoiding an unnecessary ×10⁰ offset.
1075
1076    Example
1077    -------
1078    >>> plot_country(tabular_data="country_data.csv", 
1079    ...             column="population", 
1080    ...             title="Population of Countries", 
1081    ...             label="Population", 
1082    ...             color='viridis'
1083    ... )
1084    """
1085
1086    ax = plot.plot_country(tabular_data=tabular_data, column=column, title=title, label=label, color=color, levels=levels, output_dir=output_dir, filename=filename, remove_ata=remove_ata, out_bound=out_bound, vmin=vmin, vmax=vmax, extend_min=extend_min, extend_max=extend_max, show=show)
1087    
1088    return ax
1089            
1090def sum_variables(dataset, variables=None, new_variable_name=None, time=None):
1091
1092    """
1093    Sum specified variables in the xarray dataset. If no variables are specified, sum all variables
 1094    except those starting with 'grid_area' or 'land_frac'. Fill NaNs with zero before summing, and convert resulting
1095    zeros back to NaNs.
1096    
1097    Parameters:
1098    -----------
 1099    - dataset: xarray.Dataset or str, an xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
1100    - variables: list of str, the names of the variables to sum. If None, sum all variables except those starting with 'grid_area' and 'land_frac'.
1101    - new_variable_name: str, optional, the name of the new variable to store the sum.
1102    - time: optional, a specific time slice to select from the dataset.
1103    
1104    Returns:
1105    --------
 1106    - xarray.Dataset with the summed variable.
1107
1108    Example
1109    -------
1110    >>> sum_variables(dataset=ds, 
1111    ...              variables=["roads_gross", "buildings_gross"], 
1112    ...              new_variable_name="gross_mass"
1113    ... )
1114    """
1115    
1116    ds = calculate.sum_variables(dataset, variables, new_variable_name, time)
1117    return ds
1118    
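The NaN policy described above (fill NaNs with zero before the operation, then turn cells that end up exactly zero back into NaN) can be sketched element-wise in plain Python. This is an illustration of the documented behavior, not SESAME's internal implementation:

```python
import math

def sum_with_nan_policy(a, b):
    """Illustrative: NaNs count as 0 during the sum; cells whose result
    is exactly 0 are converted back to NaN afterwards."""
    out = []
    for x, y in zip(a, b):
        total = (0.0 if math.isnan(x) else x) + (0.0 if math.isnan(y) else y)
        out.append(float('nan') if total == 0.0 else total)
    return out
```

Note that under this policy a cell where both inputs are genuinely zero also becomes NaN, which is why the gridding functions expose a `zero_is_value` flag to preserve real zeros at creation time.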
1119def subtract_variables(dataset, variable1, variable2, new_variable_name=None, time=None):
1120    
1121    """
1122    Subtract one variable from another in the xarray dataset.
1123    Fill NaNs with zero before subtracting, and convert resulting zeros back to NaNs.
1124    
1125    Parameters:
1126    -----------
 1127    - dataset: xarray.Dataset or str, an xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
1128    - variable1: str, the name of the variable to subtract from.
1129    - variable2: str, the name of the variable to subtract.
1130    - new_variable_name: str, optional, the name of the new variable to store the result.
1131    - time: optional, a specific time slice to select from the dataset.
1132    
1133    Returns:
1134    --------
 1135    - xarray.Dataset with the resulting variable.
1136
1137    Example
1138    -------
1139    >>> subtract_variables(dataset=ds,
1140    ...                   variable1="precipitation", 
1141    ...                   variable2="evaporation", 
1142    ...                   new_variable_name="net_water_gain"
1143    ... )
1144    """
1145    ds = calculate.subtract_variables(variable1, variable2, dataset, new_variable_name, time)
1146    return ds
1147    
1148def divide_variables(dataset,variable1, variable2, new_variable_name=None, time=None):
1149    """
1150    Divide one variable by another in the xarray dataset.
1151    Fill NaNs with zero before dividing, and convert resulting zeros back to NaNs.
1152    
1153    Parameters:
1154    -----------
 1155    - dataset: xarray.Dataset or str, an xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
1156    - variable1: str, the name of the variable to be divided (numerator).
1157    - variable2: str, the name of the variable to divide by (denominator).
1158    - new_variable_name: str, optional, the name of the new variable to store the result.
1159    - time: optional, a specific time slice to select from the dataset.
1160    
1161    Returns:
1162    --------
 1163    - xarray.Dataset with the resulting variable.
1164
1165    Example
1166    -------
1167    >>> divide_variables(dataset=ds,
1168    ...                  variable1="road_length", 
1169    ...                  variable2="grid_area", 
1170    ...                  new_variable_name="road_density"
1171    ... )
1172    """
1173    ds = calculate.divide_variables(variable1, variable2, dataset, new_variable_name, time)
1174    return ds
1175    
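Division follows the same fill-then-restore NaN policy; the sketch below additionally assumes (as an illustration, not a statement about SESAME's internals) that a zero denominator yields NaN rather than raising:

```python
import math

def divide_with_nan_policy(num, den):
    """Illustrative: NaNs count as 0 before dividing; a zero denominator
    yields NaN instead of raising, and zero results become NaN."""
    out = []
    for n, d in zip(num, den):
        n0 = 0.0 if math.isnan(n) else n
        d0 = 0.0 if math.isnan(d) else d
        q = float('nan') if d0 == 0.0 else n0 / d0
        out.append(float('nan') if q == 0.0 else q)
    return out
```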
1176def multiply_variables(dataset, variables=None, new_variable_name=None, time=None):
1177    """
1178    Multiply specified variables in the xarray dataset. If no variables are specified, multiply all variables.
1179    Fill NaNs with one before multiplying, and convert resulting ones back to NaNs.
1180    
1181    Parameters:
1182    -----------
 1183    - dataset: xarray.Dataset or str, an xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
1184    - variables: list of str, the names of the variables to multiply. If None, multiply all variables, excluding the "grid_area" and "land_frac" variables included in the dataset.
1185    - new_variable_name: str, optional, the name of the new variable to store the product.
1186    - time: optional, a specific time slice to select from the dataset.
1187    
1188    Returns:
1189    --------
 1190    - xarray.Dataset with the resulting variable.
1191
1192    Example
1193    -------
1194    >>> multiply_variables(
1195    ...     dataset=ds,
1196    ...     variables=["crop_area", "yield_per_hectare"],
1197    ...     new_variable_name="total_crop_yield"
1198    ... )
1199    """
1200    
1201    ds = calculate.multiply_variables(dataset, variables, new_variable_name, time)
1202    
1203    return ds
1204    
1205def average_variables(dataset, variables=None, new_variable_name=None, time=None):
1206    """
1207    Average specified variables in the xarray dataset. If no variables are specified, average all variables
 1208    except those starting with 'grid_area' or 'land_frac'. Fill NaNs with zero before averaging, and convert resulting
1209    zeros back to NaNs.
1210    
1211    Parameters:
1212    -----------
 1213    - dataset: xarray.Dataset or str, an xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
1214    - variables: list of str, the names of the variables to average. If None, average all variables except those starting with 'grid_area' and 'land_frac'.
1215    - new_variable_name: str, optional, the name of the new variable to store the average.
1216    - time: optional, a specific time slice to select from the dataset.
1217    
1218    Returns:
1219    --------
 1220    - xarray.Dataset with the averaged variable.
1221
1222    Example
1223    -------
1224    >>> average_variables(dataset=ds, 
1225    ...                  variables=["roads_gross", "buildings_gross"], 
1226    ...                  new_variable_name="average_gross"
1227    ... )
1228    """
1229    ds = calculate.average_variables(dataset, variables, new_variable_name, time)
1230    return ds
1231
1232def get_netcdf_info(netcdf_file, variable_name=None):
1233    """
1234    Extract information about variables and dimensions from a NetCDF dataset.
1235
1236    Parameters
1237    ----------
 1238    - netcdf_file : xarray.Dataset or str. An xarray Dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
1239    - variable_name : str, optional. The prefix or complete name of the variable to filter. If not provided, all variables are included.
1240
1241    Returns
1242    -------
 1243    - tuple. A tuple containing lists of dimensions, short names, long names, units, and time values (if 'time' exists).
1244        
1245    Example
1246    -------
1247    >>> get_netcdf_info(netcdf_file=netcdf_file_path, 
1248    ...                 variable_name="railway_length"
1249    ... )
1250    """
1251
1252    netcdf_info = get.get_netcdf_info(netcdf_file=netcdf_file, variable_name=variable_name)
1253    return netcdf_info
1254
1255def atlas(directory):
1256    """
1257    List all NetCDF files in a directory and count the number of variables in each.
1258
1259    Parameters
1260    ----------
1261    directory : str. Path to the directory containing NetCDF files.
1262
1263    Returns
1264    -------
1265    pd.DataFrame. A DataFrame with file names and the number of variables in each file.
1266    
1267    Example
1268    -------
1269    >>> atlas(directory)
1270    """
1271    records = []
1272    for file in os.listdir(directory):
1273        if file.endswith(".nc"):
1274            filepath = os.path.join(directory, file)
1275            ds = xr.open_dataset(filepath)
1276            num_vars = len(ds.data_vars)
1277            ds.close()
1278            records.append({
1279                'file_name': file,
1280                'num_variables': num_vars
1281            })
1282    return pd.DataFrame(records)
1283
1284def info(data):
1285    """
1286    Extract metadata for each variable in a NetCDF dataset.
1287
1288    Parameters
1289    ----------
1290    - data : str, os.PathLike, or xarray.Dataset. Path to a NetCDF file or an xarray.Dataset object.
1291
1292    Returns
1293    -------
1294    - pd.DataFrame. A DataFrame containing variable names, long names, units, sources, time range (start and end), time resolution (step), and depth values (if present as a variable).
1295    
1296    Example
1297    -------
1298    >>> info(netcdf_path)
1299    """
1300    # Load netcdf_file (either path or xarray.Dataset)
1301    if isinstance(data, (str, bytes, os.PathLike)):
1302        ds = xr.open_dataset(data)
1303    elif isinstance(data, xr.Dataset):
1304        ds = data
1305    else:
 1306        raise TypeError("`data` must be an xarray.Dataset or a path to a NetCDF file.")
1307
1308    records = []
1309    for var_name, da in ds.data_vars.items():
1310        var_attrs = da.attrs
1311        # Handle time and depth dimensions if they exist
1312        time_summary = depth_summary = None
1313
1314        if 'time' in da.dims:
1315            if np.issubdtype(da['time'].dtype, np.datetime64):
1316                time_values = pd.to_datetime(da['time'].values.flatten())
1317                unique_times = np.unique(time_values)
1318                time_diffs = np.diff(unique_times)
1319                time_step = utils.detect_time_step(time_diffs) if len(unique_times) > 1 else None
1320                time_summary = {
1321                    'min': pd.to_datetime(unique_times.min()).strftime('%Y-%m-%d'),
1322                    'max': pd.to_datetime(unique_times.max()).strftime('%Y-%m-%d'),
1323                    'step': time_step
1324                }
1325            else:
1326                unique_times = np.unique(da['time'].values.flatten())
1327                time_summary = {
1328                    'min': int(unique_times.min()),
1329                    'max': int(unique_times.max()),
1330                    'step': 'Monthly' if set(unique_times).issubset(set(range(1, 13))) else 'Unknown'
1331                }
1332
1333        if 'depth' in da.dims and 'depth' in ds.variables:
1334            depth_values = ds['depth'].values.flatten()
1335            unique_depths = np.unique(depth_values)
1336            depth_summary = {
1337                'values': unique_depths.tolist()
1338            }
1339
1340        records.append({
1341            'variable': var_name,
1342            'long_name': var_attrs.get('long_name', 'N/A'),
1343            'units': var_attrs.get('units', 'N/A'),
1344            'source': var_attrs.get('source', 'N/A'),
1345            'time_min': time_summary['min'] if time_summary else None,
1346            'time_max': time_summary['max'] if time_summary else None,
1347            'time_step': time_summary['step'] if time_summary else None,
1348            'depth': depth_summary['values'] if depth_summary else None
1349        })
1350
1351    ds.close()
1352    return pd.DataFrame(records)
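The heuristic `info()` applies to non-datetime time axes can be isolated in a few lines; `classify_integer_times` is a hypothetical name for this sketch of the same logic:

```python
def classify_integer_times(times):
    """Mirror of info()'s heuristic for integer time axes: values all in
    1..12 are treated as calendar months, anything else is 'Unknown'."""
    unique = sorted(set(times))
    step = 'Monthly' if set(unique).issubset(set(range(1, 13))) else 'Unknown'
    return {'min': int(unique[0]), 'max': int(unique[-1]), 'step': step}
```

So a time axis of `[1, 2, ..., 12]` reads as monthly climatology, while `[1990, 2000]` is reported with an 'Unknown' step.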
 80def point_2_grid(point_data, variable_name='variable', long_name='variable', units="value/grid-cell", source=None, time=None, resolution=1, agg_column=None, agg_function="sum", attr_field=None, output_directory=None, output_filename=None, normalize_by_area=False, zero_is_value=False, verbose=False):
 81    
 82    """
 83    Converts point data from a shapefile or GeoDataFrame into a gridded netCDF dataset.
 84
 85    Parameters
 86    ----------
 87    - point_data : GeoDataFrame or str. Input point data to be gridded. Can be either a GeoDataFrame or a path to a point shapefile (.shp).
 88    - variable_name : str, optional. Name of the variable to include in the netCDF attributes metadata. Defaults to:
 89        - The unique entries in the `attr_field` column if specified.
 90        - The input filename without extension if `attr_field` and `variable_name` are not specified.
 91    - long_name : str, optional. A descriptive name for the variable, added to the netCDF metadata. Behaves the same as `variable_name` if
 92        `attr_field` is specified. Defaults to the input filename without extension if unspecified.
 93    - units : str, optional. Units of the data variable to include in the netCDF metadata. Default is "value/grid-cell".
 94    - source : str, optional. String describing the original source of the input data. This will be added to the netCDF metadata.
 95    - time : str, optional. Time dimension for the output netCDF. If specified, the output will include a time dimension with the
 96        value provided. Default is None (spatial, 2D netCDF output).
 97    - resolution : float, optional. Desired resolution for the grid cells in the output dataset. Default is 1 degree.
 98    - agg_column : str, optional. Column name in the shapefile or GeoDataFrame specifying the values to aggregate in each grid cell.
 99        Defaults to counting the number of points per grid cell.
100    - agg_function : str, optional. Aggregation method for combining values in each grid cell. Options include:
101        - 'sum' (default): Sums all point values.
102        - 'max': Takes the maximum value.
103        - 'min': Takes the minimum value.
104        - 'std': Computes the standard deviation.
105    - attr_field : str, optional. Column name in the shapefile or GeoDataFrame specifying the variable names for multiple data types.
106    - output_directory : str, optional. Directory where the output NetCDF file will be saved. If None, but output_filename is True, the file will be saved in the current working directory.
107    - output_filename : str, optional. Name of the output NetCDF file (without the `.nc` extension). If not provided:
108        - Uses the input shapefile name if a shapefile path is given.
109        - Saves as `"gridded_points.nc"` if a GeoDataFrame is provided as input.
110    - normalize_by_area : bool, optional. If True, normalizes the grid values by area (e.g., converts to value per square meter). Default is False.
111    - zero_is_value : bool, optional. If True, treats zero values as valid data rather than as no-data. Default is False.
112    - verbose : bool, optional. If True, prints information about the process, such as global sum of values before and after gridding. Default is False.
113
114    Returns
115    -------
116    - xarray.Dataset. Transformed dataset with gridded data derived from the input point data.
117
118    Notes
119    -----
120    - The function supports input in the form of a shapefile or GeoDataFrame containing point data.
121    - If points lie exactly on a grid boundary, they are shifted by 0.0001 degrees in both latitude and longitude to ensure assignment to a grid cell.
122    - The function creates a netCDF file, where data variables are aggregated based on the `agg_column` and `agg_function`.
123    
124    Example
125    -------
126    >>> point_2_grid(point_data=shapefile_path, 
127    ...             variable_name="airplanes", 
128    ...             long_name="Airplanes Count", 
129    ...             units="airport/grid-cell", 
130    ...             source="CIA", 
131    ...             resolution=1,
132    ...             verbose=True
133    ... )
134    
135    """
136
137    # Determine if input is a path (string or Path) or a GeoDataFrame
138    if isinstance(point_data, (str, bytes, os.PathLike)):
139        if verbose:
140            print("Reading shapefile from path...")
141        points_gdf = gpd.read_file(point_data)
142    elif isinstance(point_data, gpd.GeoDataFrame):
143        points_gdf = point_data
144    else:
145        raise TypeError("Input must be a GeoDataFrame or a shapefile path (string or Path).")
146
147    # create gridded polygon
148    polygons_gdf = create.create_gridded_polygon(resolution=resolution, out_polygon_path=None, grid_area=False)
149    
150    if attr_field is not None:
151        unique_rows = points_gdf[attr_field].unique().tolist()
152        dataset_list = []
153        
154        for filter_var in unique_rows:
155            # Filter the GeoDataFrame
156            filtered_gdf = points_gdf[points_gdf[attr_field] == filter_var].copy()
157            joined_gdf = utils.point_spatial_join(polygons_gdf, filtered_gdf, agg_column=agg_column, agg_function=agg_function)
158
159            # Determine agg_column, long_name, and units for the current iteration
160            current_agg_column = agg_column or "count"
161            current_long_name = utils.reverse_replace_special_characters(filter_var)
162            current_units = utils.determine_units_point(units, normalize_by_area)
163
164            # Convert joined GeoDataFrame to xarray dataset
165            ds_var = utils.gridded_poly_2_xarray(
166                polygon_gdf=joined_gdf,
167                grid_value=current_agg_column,
168                long_name=current_long_name,
169                units=current_units,
170                source=source,
171                time=time,
172                resolution=resolution,
173                variable_name=filter_var,
174                normalize_by_area=normalize_by_area,
175                zero_is_value=zero_is_value
176            )
177
178            # Print or process verbose information
179            if verbose:
180                global_summary_stats = utils.dataframe_stats_point(dataframe=filtered_gdf, agg_column=current_agg_column, agg_function=agg_function)
181                print(f"Global stats of {filter_var} before gridding : {global_summary_stats:.2f}")
182                var_name = utils.replace_special_characters(filter_var)
183                global_gridded_stats = utils.xarray_dataset_stats(dataset=ds_var, variable_name=var_name, normalize_by_area=normalize_by_area, resolution=resolution)
184                print(f"Global stats of {filter_var} after gridding: {global_gridded_stats:.2f}")
185
 186                print("\n")
187            dataset_list.append(ds_var)
188        
189        # Merge all datasets from different filtered GeoDataFrames
190        ds = xr.merge(dataset_list)
191        
192    else:
193        joined_gdf = utils.point_spatial_join(polygons_gdf, points_gdf, agg_column=agg_column, agg_function=agg_function)
194
195        # Determine agg_column, long_name, and units
196        agg_column = agg_column or "count"
197        long_name = utils.determine_long_name_point(agg_column, variable_name, long_name, agg_function)
198        units = utils.determine_units_point(units, normalize_by_area)
199        
200        ds = utils.gridded_poly_2_xarray(
201            polygon_gdf=joined_gdf,
202            grid_value=agg_column,
203            long_name=long_name,
204            units=units,
205            source=source,
206            time=time,
207            resolution=resolution,
208            variable_name=variable_name,
209            normalize_by_area=normalize_by_area,
210            zero_is_value=zero_is_value
211        )
212
213        if verbose:
214            global_summary_stats = utils.dataframe_stats_point(dataframe=points_gdf, agg_column=agg_column, agg_function=agg_function)
215            print(f"Global stats before gridding : {global_summary_stats:.2f}")
216            global_gridded_stats = utils.xarray_dataset_stats(dataset=ds, variable_name=variable_name, normalize_by_area=normalize_by_area, resolution=resolution)
217            print(f"Global stats after gridding: {global_gridded_stats:.2f}")
218    
219    if output_directory or output_filename:
220        # Set output directory
221        output_directory = (output_directory or os.getcwd()).rstrip(os.sep) + os.sep
222        # Set base filename
223        base_filename = os.path.splitext(os.path.basename(point_data))[0] if isinstance(point_data, (str, bytes, os.PathLike)) else "gridded_points"
224        # Set output filename
225        output_filename = output_filename or base_filename
226        # save the xarray dataset
227        utils.save_to_nc(ds, output_directory=output_directory, output_filename=output_filename, base_filename=base_filename)
228    return ds
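The boundary rule in the Notes (points sitting exactly on a grid line are shifted by 0.0001 degrees so they fall into exactly one cell) can be sketched as follows. This simplified illustration assumes a 1-degree grid; `nudge_off_gridline` is a hypothetical helper, not SESAME's internal function:

```python
def nudge_off_gridline(lon, lat, eps=1e-4):
    """Illustrative: coordinates lying exactly on an integer grid line
    (1-degree grid assumed) are shifted by `eps` degrees so the point
    falls unambiguously inside a single grid cell."""
    if lon == int(lon):
        lon += eps
    if lat == int(lat):
        lat += eps
    return lon, lat
```

A point at (10.0, 45.5) sits on a meridian gridline, so only its longitude is nudged.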

230def line_2_grid(line_data, variable_name='variable', long_name='variable', units="meter/grid-cell", source=None, time=None, resolution=1, agg_column=None, agg_function="sum", attr_field=None, output_directory=None, output_filename=None, normalize_by_area=False, zero_is_value=False, verbose=False):
231    
232    """
233    Converts line data from a shapefile or GeoDataFrame into a gridded netCDF dataset.
234
235    Parameters
236    ----------
237    - line_data : GeoDataFrame or str. Input lines data to be gridded. Can be either a GeoDataFrame or a path to a line/polyline shapefile (.shp).
238    - variable_name : str, optional. Name of the variable to include in the netCDF attributes metadata. Defaults to:
239        - The unique entries in the `attr_field` column if specified.
240        - The input filename without extension if `attr_field` and `variable_name` are not specified.
241    - long_name : str, optional. A descriptive name for the variable, added to the netCDF metadata. Behaves the same as `variable_name` if
242        `attr_field` is specified. Defaults to the input filename without extension if unspecified.
243    - units : str, optional. Units of the data variable to include in the netCDF metadata. Default is "meter/grid-cell".
244    - source : str, optional. String describing the original source of the input data. This will be added to the netCDF metadata.
245    - time : str, optional. Time dimension for the output netCDF. If specified, the output will include a time dimension with the
246        value provided. Default is None (spatial, 2D netCDF output).
247    - resolution : float, optional. Desired resolution for the grid cells in the output dataset. Default is 1 degree.
248    - agg_column : str, optional. Column name in the shapefile or GeoDataFrame specifying the values to aggregate in each grid cell.
249        Defaults to summing the lengths of intersected lines per grid cell.
250    - agg_function : str, optional. Aggregation method for combining values in each grid cell. Options include:
251        - 'sum' (default): Sums all line values.
252        - 'max': Takes the maximum value.
253        - 'min': Takes the minimum value.
254        - 'std': Computes the standard deviation.
255    - attr_field : str, optional. Column name in the shapefile or GeoDataFrame specifying the variable names for multiple data types.
256    - output_directory : str, optional. Directory where the output NetCDF file will be saved. If None, but output_filename is True, the file will be saved in the current working directory.
257    - output_filename : str, optional. Name of the output NetCDF file (without the `.nc` extension). If not provided:
258        - Uses the input shapefile name if a shapefile path is given.
259        - Saves as `"gridded_lines.nc"` if a GeoDataFrame is provided as input.
260    - normalize_by_area : bool, optional. If True, normalizes the variable in each grid cell by the area of the grid cell (e.g., converts to value per square meter). Default is False.
261    - zero_is_value : bool, optional.   If True, treats zero values as valid data rather than as no-data. Default is False.
262        If True, treats zero values as valid data rather than as no-data. Default is False.
263    - verbose : bool, optional. If True, prints information about the process, such as global sum of values before and after gridding. Default is False.
264
265    Returns
266    -------
267    - xarray.Dataset. Transformed dataset with gridded data derived from the input line data.
268
269    Notes
270    -----
271    - The function supports input in the form of a shapefile or GeoDataFrame containing line data.
272    - Line lengths are calculated and aggregated based on the specified `agg_column` and `agg_function`.
273    - If lines intersect a grid boundary, their contributions are divided proportionally among the intersected grid cells.
274    - The function creates a netCDF file, where data variables are aggregated and stored with metadata.
275    
276    Example
277    -------
278    >>> line_2_grid(line_data=shapefile_path, 
279    ...             variable_name="roads", 
280    ...             long_name="Roads Length", 
281    ...             units="meter/grid-cell", 
282    ...             source="OpenStreetMap",  
283    ...             resolution=1,
284    ...             agg_function="sum", 
285    ...             verbose=True)
286    ... )
287        
288    """
289
290    # Determine if input is a path (string or Path) or a GeoDataFrame
291    if isinstance(line_data, (str, bytes, os.PathLike)):
292        if verbose:
293            print("Reading shapefile from path...")
294        lines_gdf = gpd.read_file(line_data)
295    elif isinstance(line_data, gpd.GeoDataFrame):
296        lines_gdf = line_data
297    else:
298        raise TypeError("Input must be a GeoDataFrame or a shapefile path (string or Path).")
299
300    # create gridded polygon
301    polygons_gdf = create.create_gridded_polygon(resolution=resolution, out_polygon_path=None, grid_area=False)
302    
303    if attr_field is not None:
304        unique_rows = lines_gdf[attr_field].unique().tolist()
305        dataset_list = []
306        
307        for filter_var in unique_rows:
308            # Filter the GeoDataFrame
309            filtered_gdf = lines_gdf[lines_gdf[attr_field] == filter_var].copy()
310            joined_gdf = utils.line_intersect(polygons_gdf, filtered_gdf, agg_column=agg_column, agg_function=agg_function)
311
312            # Determine agg_column, long_name, and units for the current iteration
313            current_agg_column = agg_column or f"length_{agg_function.lower()}"
314            current_long_name = utils.reverse_replace_special_characters(filter_var)
315            current_units = utils.determine_units_line(units, normalize_by_area)
316
317            # Convert joined GeoDataFrame to xarray dataset
318            ds_var = utils.gridded_poly_2_xarray(
319                polygon_gdf=joined_gdf,
320                grid_value=current_agg_column,
321                long_name=current_long_name,
322                units=current_units,
323                source=source,
324                time=time,
325                resolution=resolution,
326                variable_name=filter_var,
327                normalize_by_area=normalize_by_area,
328                zero_is_value=zero_is_value
329            )
330
331            # Print or process verbose information
332            if verbose:
333                global_summary_stats = utils.dataframe_stats_line(dataframe=filtered_gdf, agg_column=agg_column, agg_function=agg_function)
334                print(f"Global stats of {filter_var} before gridding : {global_summary_stats:.2f} km.")
335                var_name = utils.replace_special_characters(filter_var)
336                global_gridded_stats = utils.xarray_dataset_stats(dataset=ds_var, variable_name=var_name, normalize_by_area=normalize_by_area, resolution=resolution) * 1e-3
337                print(f"Global stats of {filter_var} after gridding: {global_gridded_stats:.2f} km.")
338
339            print("\n")
340            dataset_list.append(ds_var)
341        
342        # Merge all datasets from different filtered GeoDataFrames
343        ds = xr.merge(dataset_list)
344        
345    else:
346        joined_gdf = utils.line_intersect(polygons_gdf, lines_gdf, agg_column=agg_column, agg_function=agg_function)
347
348        # Determine agg_column, long_name, and units
349        agg_column = agg_column or "length_m"
350        long_name = utils.determine_long_name_line(long_name, agg_column, variable_name)
351        units = utils.determine_units_line(units, normalize_by_area)
352        ds = utils.gridded_poly_2_xarray(
353            polygon_gdf=joined_gdf,
354            grid_value=agg_column,
355            long_name=long_name,
356            units=units,
357            source=source,
358            time=time,
359            resolution=resolution,
360            variable_name=variable_name,
361            normalize_by_area=normalize_by_area,
362            zero_is_value=zero_is_value
363        )
364        
365        if verbose:
366            if agg_column == "length_m":
367                global_summary_stats = utils.dataframe_stats_line(dataframe=lines_gdf, agg_column=agg_column, agg_function=agg_function)
368                print(f"Global stats before gridding : {global_summary_stats:.2f} km.")
369                global_gridded_stats = utils.xarray_dataset_stats(dataset=ds, variable_name=variable_name, agg_column=agg_column, normalize_by_area=normalize_by_area, resolution=resolution) * 1e-3
370                print(f"Global stats after gridding: {global_gridded_stats:.2f} km.")
371            else:
372                global_summary_stats = utils.dataframe_stats_line(dataframe=lines_gdf, agg_column=agg_column, agg_function=agg_function)
373                print(f"Global stats before gridding : {global_summary_stats:.2f}.")
374                global_gridded_stats = utils.xarray_dataset_stats(dataset=ds, variable_name=variable_name, agg_column=agg_column, normalize_by_area=normalize_by_area, resolution=resolution)
375                print(f"Global stats after gridding: {global_gridded_stats:.2f}.")
376    
377    if output_directory or output_filename:
378        # Set output directory
379        output_directory = (output_directory or os.getcwd()).rstrip(os.sep) + os.sep
380        # Set base filename
381        base_filename = os.path.splitext(os.path.basename(line_data))[0] if isinstance(line_data, (str, bytes, os.PathLike)) else "gridded_lines"
382        # Set output filename
383        output_filename = output_filename or base_filename
384        # save the xarray dataset
385        utils.save_to_nc(ds, output_directory=output_directory, output_filename=output_filename, base_filename=base_filename)
386    return ds

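The last Note above says that a line crossing a grid boundary contributes proportionally to every cell it intersects; in SESAME this bookkeeping happens inside `utils.line_intersect` with real geometries. As a rough illustration only, here is a self-contained sketch (the helper name and the pure-Python clipping are ours, not part of the package) that splits a single straight segment's length among square cells:

```python
import math

def split_length_by_cells(x0, y0, x1, y1, resolution=1.0):
    """Toy stand-in for the proportional splitting in line_2_grid:
    clip one segment at every cell boundary and credit each piece's
    length to the cell that contains it."""
    total = math.hypot(x1 - x0, y1 - y0)
    # Parameter values t in [0, 1] where the segment crosses a cell edge.
    ts = {0.0, 1.0}
    for a, b, d in ((x0, x1, x1 - x0), (y0, y1, y1 - y0)):
        if d == 0:
            continue
        first = math.floor(min(a, b) / resolution) + 1
        last = math.ceil(max(a, b) / resolution)
        for k in range(first, last):
            ts.add((k * resolution - a) / d)
    ts = sorted(t for t in ts if 0.0 <= t <= 1.0)
    out = {}
    for t0, t1 in zip(ts, ts[1:]):
        m = (t0 + t1) / 2  # midpoint identifies the containing cell
        cell = (math.floor((x0 + m * (x1 - x0)) / resolution),
                math.floor((y0 + m * (y1 - y0)) / resolution))
        out[cell] = out.get(cell, 0.0) + (t1 - t0) * total
    return out
```

For example, a horizontal 2-degree segment starting half a degree into a cell contributes 0.5, 1.0, and 0.5 degrees of length to the three cells it crosses, and the per-cell contributions always sum to the original length, which is exactly the conservation property the `verbose` before/after statistics are there to check.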
def poly_2_grid(polygon_data, variable_name='variable', long_name='variable', units="m2/grid-cell", source=None, time=None, resolution=1, attr_field=None, fraction=False, agg_function="sum", output_directory=None, output_filename=None, normalize_by_area=False, zero_is_value=False, verbose=False):

    """
    Converts polygon data from a shapefile or GeoDataFrame into a gridded netCDF dataset.

    Parameters
    ----------
    - polygon_data : GeoDataFrame or str. Input polygon data to be gridded. Can be either a GeoDataFrame or a path to a polygon shapefile (.shp).
    - variable_name : str, optional. Name of the variable to include in the netCDF attributes metadata. Defaults to:
        - The unique entries in the `attr_field` column if specified.
        - The input filename without extension if `attr_field` and `variable_name` are not specified.
    - long_name : str, optional. A descriptive name for the variable, added to the netCDF metadata. Behaves the same as `variable_name` if
        `attr_field` is specified. Defaults to the input filename without extension if unspecified.
    - units : str, optional. Units of the data variable to include in the netCDF metadata. Default is "m2/grid-cell".
    - source : str, optional. String describing the original source of the input data. This will be added to the netCDF metadata.
    - time : str, optional. Time dimension for the output netCDF. If specified, the output will include a time dimension with the
        value provided. Default is None (spatial, 2D netCDF output).
    - resolution : float, optional. Desired resolution for the grid cells in the output dataset. Default is 1 degree.
    - attr_field : str, optional. Column name in the shapefile or GeoDataFrame specifying the variable names for multiple data types.
    - fraction : bool, optional. If True, calculates the fraction of each polygon within each grid cell. The output values will range from 0 to 1. Default is False.
    - agg_function : str, optional. Aggregation method for combining values in each grid cell. Default is 'sum'. Options include:
        - 'sum': Sum of values.
        - 'max': Maximum value.
        - 'min': Minimum value.
        - 'std': Standard deviation.
    - output_directory : str, optional. Directory where the output netCDF file will be saved. If None but `output_filename` is provided, the file will be saved in the current working directory.
    - output_filename : str, optional. Name of the output netCDF file (without the `.nc` extension). If not provided:
        - Uses the input shapefile name if a shapefile path is given.
        - Saves as `"gridded_polygons.nc"` if a GeoDataFrame is provided as input.
    - normalize_by_area : bool, optional. If True, normalizes the grid values by area (e.g., converts to value per square meter). Default is False.
    - zero_is_value : bool, optional. If True, treats zero values as valid data rather than as no-data. Default is False.
    - verbose : bool, optional. If True, prints information about the process, such as the global sum of values before and after gridding. Default is False.

    Returns
    -------
    - xarray.Dataset. Transformed dataset with gridded data derived from the input polygon data.

    Notes
    -----
    - The function supports input in the form of a shapefile or GeoDataFrame containing polygon data.
    - Polygon areas are calculated and aggregated based on the specified `attr_field` and `agg_function`.
    - If the `fraction` parameter is True, the fraction of each polygon in each grid cell will be computed, with values ranging from 0 to 1.
    - The function creates a netCDF file, where data variables are aggregated and stored with metadata.

    Example
    -------
    >>> poly_2_grid(polygon_data=shapefile_path,
    ...             units="fraction",
    ...             source="The new global lithological map database GLiM",
    ...             resolution=1,
    ...             attr_field="Short_Name",
    ...             fraction=True,
    ...             verbose=True)

    """

    # Determine if input is a path (string or Path) or a GeoDataFrame
    if isinstance(polygon_data, (str, bytes, os.PathLike)):
        if verbose:
            print("Reading shapefile from path...")
        poly_gdf = gpd.read_file(polygon_data)
    elif isinstance(polygon_data, gpd.GeoDataFrame):
        poly_gdf = polygon_data
    else:
        raise TypeError("Input must be a GeoDataFrame or a shapefile path (string or Path).")

    # Create gridded polygon
    polygons_gdf = create.create_gridded_polygon(resolution=resolution, out_polygon_path=None, grid_area=False)

    if attr_field is not None:
        unique_rows = poly_gdf[attr_field].unique().tolist()
        dataset_list = []

        for filter_var in unique_rows:

            # Filter the GeoDataFrame
            filtered_gdf = poly_gdf[poly_gdf[attr_field] == filter_var].copy()
            # Reset the index to ensure sequential indexing
            filtered_gdf.reset_index(drop=True, inplace=True)

            # Determine grid_value, long_name, and units for the current iteration
            grid_value = "frac" if fraction else "in_area"
            current_long_name = utils.reverse_replace_special_characters(filter_var)
            current_units = utils.determine_units_poly(units, normalize_by_area, fraction)

            # Convert GeoDataFrame to xarray dataset
            ds_var = utils.poly_intersect(poly_gdf=filtered_gdf,
                                          polygons_gdf=polygons_gdf,
                                          variable_name=filter_var,
                                          long_name=current_long_name,
                                          units=current_units,
                                          source=source,
                                          time=time,
                                          resolution=resolution,
                                          agg_function=agg_function,
                                          fraction=fraction,
                                          normalize_by_area=normalize_by_area,
                                          zero_is_value=zero_is_value)

            # Print verbose information
            if verbose:
                global_summary_stats = utils.dataframe_stats_poly(dataframe=filtered_gdf, agg_function=agg_function)
                print(f"Global stats of {filter_var} before gridding: {global_summary_stats:.2f} km2.")
                var_name = utils.replace_special_characters(filter_var)
                global_gridded_stats = utils.xarray_dataset_stats(dataset=ds_var, variable_name=var_name, agg_column=grid_value,
                                                                  normalize_by_area=True, resolution=resolution) * 1e-6
                print(f"Global stats of {filter_var} after gridding: {global_gridded_stats:.2f} km2.")
                print("\n")

            dataset_list.append(ds_var)

        # Merge all datasets from different filtered GeoDataFrames
        ds = xr.merge(dataset_list)

    else:

        # Determine grid_value, long_name, and units
        grid_value = "frac" if fraction else "in_area"
        long_name = utils.determine_long_name_poly(variable_name, long_name, agg_function)
        units = utils.determine_units_poly(units, normalize_by_area, fraction)

        # Convert GeoDataFrame to xarray dataset
        ds = utils.poly_intersect(poly_gdf=poly_gdf,
                                  polygons_gdf=polygons_gdf,
                                  variable_name=variable_name,
                                  long_name=long_name,
                                  units=units,
                                  source=source,
                                  time=time,
                                  resolution=resolution,
                                  agg_function=agg_function,
                                  fraction=fraction,
                                  normalize_by_area=normalize_by_area,
                                  zero_is_value=zero_is_value)

        if verbose:
            global_summary_stats = utils.dataframe_stats_poly(dataframe=poly_gdf, agg_function=agg_function)
            print(f"Global stats before gridding: {global_summary_stats:.2f} km2.")
            var_name = utils.replace_special_characters(variable_name)
            if fraction:
                normalize_by_area = True
            global_gridded_stats = utils.xarray_dataset_stats(dataset=ds, variable_name=var_name, agg_column=grid_value,
                                                              normalize_by_area=normalize_by_area, resolution=resolution) * 1e-6
            print(f"Global stats after gridding: {global_gridded_stats:.2f} km2.")

    if output_directory or output_filename:
        # Set output directory
        output_directory = (output_directory or os.getcwd()).rstrip(os.sep) + os.sep
        # Set base filename
        base_filename = os.path.splitext(os.path.basename(polygon_data))[0] if isinstance(polygon_data, (str, bytes, os.PathLike)) else "gridded_polygons"
        # Set output filename
        output_filename = output_filename or base_filename
        # Save the xarray dataset
        utils.save_to_nc(ds, output_directory=output_directory, output_filename=output_filename, base_filename=base_filename)
    return ds

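To make the `fraction=True` behaviour concrete, here is a toy sketch of the per-cell arithmetic for the simplest possible shape, an axis-aligned rectangle. The real work happens in `utils.poly_intersect` with arbitrary polygon geometry; the helper below is a hypothetical illustration only, and it reads "fraction" as the share of each grid cell's area covered by the shape (a value between 0 and 1, as the docstring describes):

```python
import math

def rect_fraction_per_cell(xmin, ymin, xmax, ymax, resolution=1.0):
    """Toy stand-in for poly_2_grid(..., fraction=True): intersect a
    rectangle with every grid cell it touches and divide the overlap
    area by the cell area."""
    cell_area = resolution * resolution
    fractions = {}
    for i in range(math.floor(xmin / resolution), math.ceil(xmax / resolution)):
        for j in range(math.floor(ymin / resolution), math.ceil(ymax / resolution)):
            cx0, cy0 = i * resolution, j * resolution
            # Width and height of the overlap between rectangle and cell.
            w = min(xmax, cx0 + resolution) - max(xmin, cx0)
            h = min(ymax, cy0 + resolution) - max(ymin, cy0)
            if w > 0 and h > 0:
                fractions[(i, j)] = (w * h) / cell_area
    return fractions
```

A unit square centered on a 1-degree grid corner covers a quarter of each of the four surrounding cells, and the fractions times the cell area always sum back to the polygon's area, the same conservation check that the `verbose` output reports in km2.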
def grid_2_grid(raster_data, agg_function, variable_name, long_name, units="value/grid-cell", source=None, time=None, resolution=1, netcdf_variable=None, output_directory=None, output_filename=None, padding="symmetric", zero_is_value=False, normalize_by_area=False, verbose=False):

    """
    Converts raster data (TIFF or netCDF) into a re-gridded xarray dataset.

    Parameters
    ----------
    - raster_data : str or xarray.Dataset. Input raster data. This can be a string path to a TIFF (.tif) file, a string path to a netCDF (.nc or .nc4) file, or an already loaded xarray.Dataset object.
        - If `raster_data` is a netCDF file or an xarray.Dataset, the `netcdf_variable` parameter must also be provided to specify which variable to extract.
    - agg_function : str. Aggregation method to apply when re-gridding. Supported values are 'sum', 'mean', or 'max'.
    - variable_name : str. Name of the variable to include in the output dataset.
    - long_name : str. Descriptive name for the variable.
    - units : str, optional. Units for the variable. Default is "value/grid-cell".
    - source : str, optional. Source information for the dataset. Default is None.
    - time : str or None, optional. Time stamp or identifier for the data. Default is None.
    - resolution : int or float, optional. Desired resolution of the grid cells, in degrees, in the output dataset. Default is 1.
    - netcdf_variable : str, optional. Name of the variable to extract from the netCDF file, if applicable. Required for netCDF inputs.
    - output_directory : str, optional. Directory where the output netCDF file will be saved. If None but `output_filename` is provided, the file will be saved in the current working directory.
    - output_filename : str, optional. Name of the output netCDF file (without the `.nc` extension). If not provided:
        - Uses `variable_name` if it is specified.
        - Defaults to `regridded.nc` otherwise.
    - padding : str, optional. Padding strategy ('symmetric' or 'end'). Default is 'symmetric'.
    - zero_is_value : bool, optional. Whether to treat zero values as valid data rather than as no-data. Default is False.
    - normalize_by_area : bool, optional. Whether to normalize grid values by area (e.g., convert to value per square meter). Default is False.
    - verbose : bool, optional. If True, prints the global sum of values before and after re-gridding. Default is False.

    Returns
    -------
    - xarray.Dataset. Re-gridded xarray dataset containing the processed raster data.

    Notes
    -----
    This function supports raster data in TIFF or netCDF format and performs re-gridding based on
    the specified `agg_function`. The output dataset will include metadata such as the variable name,
    long name, units, and optional source and time information.

    Example
    -------
    >>> grid_2_grid(raster_data=pop_path,
    ...             agg_function="sum",
    ...             variable_name="population_count",
    ...             long_name="Total Population",
    ...             units="people per grid",
    ...             source="WorldPop",
    ...             resolution=1,
    ...             time="2020-01-01",
    ...             verbose=True)
    """

    # Determine the file extension
    if isinstance(raster_data, (str, bytes, os.PathLike)):
        file_extension = os.path.splitext(raster_data)[1].lower()
    elif isinstance(raster_data, xr.Dataset):
        file_extension = ".nc"
    else:
        raise TypeError("Input must be an xarray.Dataset or a path to a TIFF or netCDF file.")

    if file_extension == ".tif":
        if verbose:
            print("Reading the tif file.")
        # Convert TIFF data to a re-gridded dataset
        ds = utils.tif_2_ds(input_raster=raster_data, agg_function=agg_function, variable_name=variable_name,
                            long_name=long_name, units=units, source=source, resolution=resolution, time=time, padding=padding,
                            zero_is_value=zero_is_value, normalize_by_area=normalize_by_area, verbose=verbose)

    elif file_extension in (".nc", ".nc4"):
        if verbose:
            print("Reading the nc file.")
        # Convert netCDF to TIFF
        netcdf_tif_path, temp_path = utils.netcdf_2_tif(raster_data=raster_data, netcdf_variable=netcdf_variable, time=time)
        # Convert netCDF data to a re-gridded dataset
        ds = utils.tif_2_ds(input_raster=netcdf_tif_path, agg_function=agg_function, variable_name=variable_name,
                            long_name=long_name, units=units, source=source, resolution=resolution, time=time, padding=padding,
                            zero_is_value=zero_is_value, normalize_by_area=normalize_by_area, verbose=verbose)
        # Delete temp folder
        utils.delete_temporary_folder(temp_path)
    else:
        # Raise an error for unrecognized file types
        raise ValueError("File type is not recognized. File type should be either a TIFF or a netCDF file.")

    if output_directory or output_filename:
        # Set output directory
        output_directory = (output_directory or os.getcwd()).rstrip(os.sep) + os.sep
        # Set base filename
        base_filename = variable_name or "regridded"
        # Set output filename
        output_filename = output_filename or base_filename
        # Save the xarray dataset
        utils.save_to_nc(ds, output_directory=output_directory, output_filename=output_filename, base_filename=base_filename)

    if verbose:
        print("Re-gridding completed!")
    return ds

Converts raster data (TIFF or netCDF) into a re-gridded xarray dataset.

Parameters

  • raster_data : str. Path to the input raster data file. This can be a string path to a TIFF (.tif) file, a string path to a NetCDF (.nc or .nc4) file or An already loaded xarray.Dataset object.
    • If raster_data is a NetCDF file or an xarray.Dataset, the netcdf_variable parameter must also be provided to specify which variable to extract.
  • agg_function : str. Aggregation method to apply when re-gridding. Supported values are 'SUM', 'MEAN', or 'MAX'.
  • variable_name : str. Name of the variable to include in the output dataset.
  • long_name : str. Descriptive name for the variable.
  • units : str, optional. Units for the variable. Default is "value/grid-cell".
  • source : str, optional. Source information for the dataset. Default is None.
  • time : str or None, optional. Time stamp or identifier for the data. Default is None.
  • resolution : int or float, optional. Desired resolution of the output grid cells, in degrees. Default is 1.
  • netcdf_variable : str, optional. Name of the variable to extract from the netCDF file, if applicable. Required for netCDF inputs.
  • output_directory : str, optional. Directory where the output NetCDF file will be saved. If None but output_filename is provided, the file will be saved in the current working directory.
  • output_filename : str, optional. Name of the output NetCDF file (without the .nc extension). If not provided:
    • Uses variable_name if it is specified.
    • Defaults to regridded.nc if variable_name is not provided either.
  • padding : str, optional. Padding strategy ('symmetric' or 'end').
  • zero_is_value : bool, optional. Whether to treat zero values as valid data rather than as no-data. Default is False.
  • normalize_by_area : bool, optional. Whether to normalize grid values by area (e.g., convert to value per square meter). Default is False.
  • verbose : bool, optional. If True, prints the global sum of values before and after re-gridding. Default is False.

Returns

  • xarray.Dataset. Re-gridded xarray dataset containing the processed raster data.

Notes

This function supports raster data in TIFF or netCDF format and performs re-gridding based on the specified agg_function. The output dataset will include metadata such as the variable name, long name, units, and optional source and time information.

Example

>>> grid_2_grid(raster_data=pop_path, 
...             agg_function="SUM", 
...             variable_name="population_count", 
...             long_name="Total Population", 
...             units="people per grid", 
...             source="WorldPop", 
...             resolution=1, 
...             time="2020-01-01", 
...             verbose=True
... )
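The 'SUM' aggregation can be pictured as block-summing a finer grid into coarser cells. The sketch below is a hypothetical illustration in plain numpy, not SESAME's implementation, coarsening a 4×4 array of 0.5° cells into a 2×2 array of 1° cells:

```python
import numpy as np

# Hypothetical illustration of 'SUM' re-gridding (not SESAME's implementation):
# aggregate a 4x4 grid of 0.5-degree cells into a 2x2 grid of 1-degree cells.
fine = np.arange(16, dtype=float).reshape(4, 4)
factor = 2  # 0.5 degree -> 1 degree

# Split each axis into (coarse cell, position within cell) and sum within blocks.
coarse = fine.reshape(4 // factor, factor, 4 // factor, factor).sum(axis=(1, 3))

assert coarse.tolist() == [[10.0, 18.0], [42.0, 50.0]]
assert coarse.sum() == fine.sum()  # 'SUM' aggregation conserves the global total
```

With 'MEAN' or 'MAX' the same block decomposition applies, only the reduction changes, and the global total is no longer conserved.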
def table_2_grid(surrogate_data, surrogate_variable, tabular_data, tabular_column, variable_name=None, long_name=None, units='value/grid-cell', source=None, time=None, output_directory=None, output_filename=None, zero_is_value=False, normalize_by_area=False, eez=False, verbose=False):
def table_2_grid(surrogate_data, surrogate_variable, tabular_data, tabular_column, variable_name=None, long_name=None, units="value/grid-cell", source=None, time=None, output_directory=None, output_filename=None, zero_is_value=False, normalize_by_area=False, eez=False, verbose=False):
    """
    Convert tabular data to a gridded dataset by spatially distributing values based on a NetCDF variable and a tabular column.

    Parameters:
    -----------
    - surrogate_data : xarray.Dataset or str. xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded 
        into an xarray.Dataset. The dataset must include the variable specified in `surrogate_variable`.
    - surrogate_variable : str. Variable name in the NetCDF or xarray dataset used for spatial distribution.
    - tabular_data : pandas.DataFrame or str. Tabular dataset as a pandas DataFrame or a path to a CSV file. If a file path is provided, it will be 
        automatically loaded into a DataFrame. The data must include a column named "ISO3" representing country codes. 
        If not present, use the `add_iso3_column` utility function to convert country names to ISO3 codes.     
    - tabular_column : str. Column name in the tabular dataset with values to be spatially distributed.
    - variable_name : str, optional. Name of the variable. Default is None.
    - long_name : str, optional. A long name for the variable. Default is None.
    - units : str, optional. Units of the variable. Default is 'value/grid-cell'.
    - source : str, optional. Source information, if available. Default is None.
    - time : str, optional. Time information for the dataset.
    - output_directory : str, optional. Directory where the output NetCDF file will be saved. If None but output_filename is provided, the file will be saved in the current working directory.
    - output_filename : str, optional. Name of the output NetCDF file (without the `.nc` extension). If not provided:
        - Uses `variable_name` if it is specified.
        - Falls back to `long_name` or `tabular_column` if `variable_name` is not given.
        - Defaults to `gridded_table.nc` if none of the above are provided.
    - zero_is_value : bool, optional. If True, zero is treated as valid data, and zero values are included when calculating the mean and standard deviation.
    - normalize_by_area : bool, optional. Whether to normalize grid values by area (e.g., convert to value per square meter). Default is False.
    - eez : bool, optional. If True, the function converts jurisdictional Exclusive Economic Zone (EEZ) values to a spatial grid.
    - verbose : bool, optional. If True, prints the global sum before and after the gridding operation. For any jurisdiction where the surrogate variable is missing, the tabular value is distributed evenly over that jurisdiction and its ISO3 code is printed.

    Returns:
    --------
    - xarray.Dataset. Resulting gridded dataset after spatial distribution of tabular values.

    Example
    -------
    >>> table_2_grid(surrogate_data=netcdf_file_path, 
    ...             surrogate_variable="railway_length", 
    ...             tabular_data=csv_file_path, 
    ...             tabular_column="steel", 
    ...             variable_name="railtract_steel", 
    ...             long_name="Railtrack Steel Mass", 
    ...             units="g m-2", 
    ...             source="Matitia (2022)", 
    ...             normalize_by_area=True, 
    ...             verbose=True
    ... )
    """

    # Load surrogate_data (either path or xarray.Dataset)
    if isinstance(surrogate_data, (str, bytes, os.PathLike)):
        input_ds = xr.open_dataset(surrogate_data)
    elif isinstance(surrogate_data, xr.Dataset):
        input_ds = surrogate_data
    else:
        raise TypeError("`surrogate_data` must be an xarray.Dataset or a path to a NetCDF file.")

    # Load tabular_data (either path or pandas.DataFrame)
    if isinstance(tabular_data, (str, bytes, os.PathLike)):
        input_df = pd.read_csv(tabular_data)
    elif isinstance(tabular_data, pd.DataFrame):
        input_df = tabular_data
    else:
        raise TypeError("`tabular_data` must be a pandas.DataFrame or a path to a CSV file.")
    
    if variable_name is None:
        variable_name = long_name if long_name is not None else tabular_column

    if long_name is None:
        long_name = variable_name if variable_name is not None else tabular_column

    # Check the netCDF resolution
    resolution = abs(float(input_ds['lat'].diff('lat').values[0]))
    resolution_str = str(resolution)

    if time:
        # Check and convert ISO3 based on occupation or previous control, given a specific year
        input_df = utils.convert_iso3_by_year(df=input_df, year=time)
    
    base_directory = os.path.dirname(os.path.abspath(__file__))
    data_dir = os.path.join(base_directory, "data")
    if eez:
        country_ds = xr.open_dataset(os.path.join(data_dir, "eezs.1deg.nc"))
        input_ds = input_ds.copy()
        input_ds[surrogate_variable] = input_ds[surrogate_variable].fillna(0)
    else:
        # Check the dataframe's ISO3 codes against the country fraction dataset
        utils.check_iso3_with_country_ds(input_df, resolution_str)
        
        if resolution_str == "1" or resolution_str == "1.0":
            country_ds = xr.open_dataset(os.path.join(data_dir, "country_fraction.1deg.2000-2023.a.nc"))
            input_ds = input_ds.copy()
            input_ds[surrogate_variable] = input_ds[surrogate_variable].fillna(0)

        elif resolution_str == "0.5":
            country_ds = xr.open_dataset(os.path.join(data_dir, "country_fraction.0_5deg.2000-2023.a.nc"))
            input_ds = input_ds.copy()
            input_ds[surrogate_variable] = input_ds[surrogate_variable].fillna(0)
            
        elif resolution_str == "0.25":
            country_ds = xr.open_dataset(os.path.join(data_dir, "country_fraction.0_25deg.2000-2023.a.nc"))
            input_ds = input_ds.copy()
            input_ds[surrogate_variable] = input_ds[surrogate_variable].fillna(0)
        else:
            raise ValueError("Please re-grid the netCDF file to 1, 0.5 or 0.25 degree resolution.")

    input_ds, country_ds, a = utils.adjust_datasets(input_ds, country_ds, time)
    print(f"Distributing {variable_name} onto {surrogate_variable}.")

    new_ds = create.create_new_ds(input_ds, tabular_column, country_ds, surrogate_variable, input_df, verbose)

    for var_name in new_ds.data_vars:
        a += np.nan_to_num(new_ds[var_name].to_numpy())

    da = xr.DataArray(a, coords={'lat': input_ds['lat'], 'lon': input_ds['lon']}, dims=['lat', 'lon'])

    if units == 'value/grid-cell':
        units = 'value m-2'

    ds = utils.da_to_ds(da, variable_name, long_name, units, source=source, time=time, resolution=resolution,
                        zero_is_value=zero_is_value, normalize_by_area=normalize_by_area)
    
    if verbose:
        print(f"Global sum of jurisdictional dataset : {input_df[[tabular_column]].sum().item()}")
        global_gridded_stats = utils.xarray_dataset_stats(dataset=ds, variable_name=variable_name, agg_column=None, normalize_by_area=normalize_by_area, resolution=resolution)
        print(f"Global stats after gridding: {global_gridded_stats:.2f}")

    if output_directory or output_filename:
        # Set output directory
        output_directory = (output_directory or os.getcwd()).rstrip(os.sep) + os.sep
        # Set base filename
        base_filename = variable_name or "gridded_table"
        # Set output filename
        output_filename = output_filename or base_filename
        # Save the xarray dataset
        utils.save_to_nc(ds, output_directory=output_directory, output_filename=output_filename, base_filename=base_filename)

    return ds

Convert tabular data to a gridded dataset by spatially distributing values based on a NetCDF variable and a tabular column.

Parameters:

  • surrogate_data : xarray.Dataset or str. xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset. The dataset must include the variable specified in surrogate_variable.
  • surrogate_variable : str. Variable name in the NetCDF or xarray dataset used for spatial distribution.
  • tabular_data : pandas.DataFrame or str. Tabular dataset as a pandas DataFrame or a path to a CSV file. If a file path is provided, it will be automatically loaded into a DataFrame. The data must include a column named "ISO3" representing country codes. If not present, use the add_iso3_column utility function to convert country names to ISO3 codes.
  • tabular_column : str. Column name in the tabular dataset with values to be spatially distributed.
  • variable_name : str, optional. Name of the variable. Default is None.
  • long_name : str, optional. A long name for the variable. Default is None.
  • units : str, optional. Units of the variable. Default is 'value/grid-cell'.
  • source : str, optional. Source information, if available. Default is None.
  • time : str, optional. Time information for the dataset.
  • output_directory : str, optional. Directory where the output NetCDF file will be saved. If None but output_filename is provided, the file will be saved in the current working directory.
  • output_filename : str, optional. Name of the output NetCDF file (without the .nc extension). If not provided:
    • Uses variable_name if it is specified.
    • Falls back to long_name or tabular_column if variable_name is not given.
    • Defaults to gridded_table.nc if none of the above are provided.
  • zero_is_value : bool, optional. If True, zero is treated as valid data, and zero values are included when calculating the mean and standard deviation.
  • normalize_by_area : bool, optional. Whether to normalize grid values by area (e.g., convert to value per square meter). Default is False.
  • eez : bool, optional. If set to True, the function converts the jurisdictional Exclusive Economic Zone (EEZ) values to a spatial grid.
  • verbose : bool, optional. If True, prints the global sum before and after the gridding operation. For any jurisdiction where the surrogate variable is missing, the tabular value is distributed evenly over that jurisdiction and its ISO3 code is printed.

Returns:

  • xarray.Dataset. Resulting gridded dataset after spatial distribution of tabular values.

Example

>>> table_2_grid(surrogate_data=netcdf_file_path, 
...             surrogate_variable="railway_length", 
...             tabular_data=csv_file_path, 
...             tabular_column="steel", 
...             variable_name="railtract_steel", 
...             long_name="Railtrack Steel Mass", 
...             units="g m-2", 
...             source="Matitia (2022)", 
...             normalize_by_area=True, 
...             verbose=True
... )
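Conceptually, table_2_grid spreads each jurisdiction's tabular total over its grid cells in proportion to the surrogate variable. A minimal numpy sketch of that idea (a hypothetical illustration, not SESAME's implementation):

```python
import numpy as np

# Hypothetical sketch of proportional distribution (not SESAME's implementation):
# spread one country's tabular total over its cells, weighted by a surrogate.
surrogate = np.array([0.0, 2.0, 3.0, 5.0])  # e.g. railway length per grid cell
country_total = 100.0                       # tabular value for this country

weights = surrogate / surrogate.sum()       # each cell's share of the total
gridded = country_total * weights

assert np.allclose(gridded, [0.0, 20.0, 30.0, 50.0])
assert np.isclose(gridded.sum(), country_total)  # national total is conserved
```

As the verbose description above notes, when the surrogate variable is entirely missing for a jurisdiction, the value is distributed evenly over that jurisdiction instead.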
def grid_2_table(grid_data, variables=None, time=None, grid_area=None, resolution=1, aggregation=None, agg_function='sum', verbose=False):
def grid_2_table(grid_data, variables=None, time=None, grid_area=None, resolution=1, aggregation=None, agg_function='sum', verbose=False):
    """
    Process gridded data from an xarray Dataset to generate tabular data for different jurisdictions.

    Parameters:
    -----------
    - grid_data : xarray.Dataset or str. xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
    - variables : str or list, optional. Variable name(s) to be processed: either a single variable or a list of variables. If None, all variables in the dataset (excluding predefined ones) will be considered.
    - time : str, optional. Time slice for data processing. If provided, the nearest time slice is selected. If None, a default time slice is used.
    - resolution : float, optional. Resolution of the gridded data in degrees. Default is 1 degree.
    - grid_area : str, optional. Indicator to consider grid area during processing. If 'YES', each variable is multiplied by the grid-cell area.
    - aggregation : str, optional. Aggregation level for tabular data. If 'region_1', 'region_2', or 'region_3', the data will be aggregated at the corresponding regional level.
    - agg_function : str, optional, default 'sum'. Aggregation method. Options: 'sum', 'mean', 'max', 'min', 'std'.  
    - verbose : bool, optional. If True, the function will print the global sum of values before and after aggregation.

    Returns:
    --------
    df : pandas DataFrame. Tabular data for different jurisdictions, including ISO3 codes, variable values, and an optional 'Year' column.
    """

    df = utils.grid_2_table(grid_data=grid_data, variables=variables, time=time, grid_area=grid_area, resolution=resolution, aggregation=aggregation, method=agg_function, verbose=verbose)
    return df

Process gridded data from an xarray Dataset to generate tabular data for different jurisdictions.

Parameters:

  • grid_data : xarray.Dataset or str. xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
  • variables : str or list, optional. Variable name(s) to be processed: either a single variable or a list of variables. If None, all variables in the dataset (excluding predefined ones) will be considered.
  • time : str, optional. Time slice for data processing. If provided, the nearest time slice is selected. If None, a default time slice is used.
  • resolution : float, optional. Resolution of the gridded data in degrees. Default is 1 degree.
  • grid_area : str, optional. Indicator to consider grid area during processing. If 'YES', each variable is multiplied by the grid-cell area.
  • aggregation : str, optional. Aggregation level for tabular data. If 'region_1', 'region_2', or 'region_3', the data will be aggregated at the corresponding regional level.
  • agg_function : str, optional, default 'sum'. Aggregation method. Options: 'sum', 'mean', 'max', 'min', 'std'.
  • verbose : bool, optional. If True, the function will print the global sum of values before and after aggregation.

Returns:

df : pandas DataFrame. Tabular data for different jurisdictions, including ISO3 codes, variable values, and an optional 'Year' column.
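grid_2_table is the inverse mapping: grid-cell values are aggregated up to jurisdictions. The essential step can be sketched with pandas (a hypothetical illustration; the real function also handles country fractions, time slices, and regional aggregation):

```python
import numpy as np
import pandas as pd

# Hypothetical sketch of per-jurisdiction aggregation (not SESAME's implementation):
# each grid cell carries a value and the ISO3 code of the country that owns it.
values = np.array([1.0, 2.0, 3.0, 4.0])
iso3 = np.array(["CAN", "CAN", "USA", "USA"])

df = pd.DataFrame({"ISO3": iso3, "value": values})
# 'sum' stands in for the agg_function options ('sum', 'mean', 'max', 'min', 'std')
table = df.groupby("ISO3", as_index=False)["value"].sum()

assert table["ISO3"].tolist() == ["CAN", "USA"]
assert table["value"].tolist() == [3.0, 7.0]
```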

def add_iso3_column(df, column):
def add_iso3_column(df, column):
    """
    Convert country names in a DataFrame column to their corresponding ISO3 country codes.

    This function reads a JSON file containing country names and their corresponding ISO3 codes, then 
    maps the values from the specified column in the DataFrame to their ISO3 codes based on the JSON data. 
    The resulting ISO3 codes are added as a new column named 'ISO3'.

    Parameters
    ----------
    - df (pandas.DataFrame): The DataFrame containing a column with country names.
    - column (str): The name of the column in the DataFrame that contains country names.

    Returns
    -------
    - pandas.DataFrame: The original DataFrame with an additional 'ISO3' column containing the ISO3 country codes.

    Raises:
    --------
    - FileNotFoundError: If the JSON file containing country mappings cannot be found.
    - KeyError: If the specified column is not present in the DataFrame.

    Example
    -------
    >>> add_iso3_column(df=dataframe, 
    ...                column="Country"
    ... )
    """

    # Load the country-name-to-ISO3 lookup table
    base_directory = os.path.dirname(os.path.abspath(__file__))
    data_dir = os.path.join(base_directory, "data")
    json_path = os.path.join(data_dir, "Names.json")
    with open(json_path, 'r') as file:
        country_iso3_data = json.load(file)
    # Map the country-name column to a new "ISO3" column
    df['ISO3'] = df[column].map(country_iso3_data)
    # Collect country names that could not be mapped to an ISO3 code
    nan_iso3 = df[df["ISO3"].isna()]
    iso3_not_found = nan_iso3[column].unique().tolist()
    # Check if the list is not empty before printing
    if iso3_not_found:
        print(f"Country Not Found: {iso3_not_found}")
    return df

Convert country names in a DataFrame column to their corresponding ISO3 country codes.

This function reads a JSON file containing country names and their corresponding ISO3 codes, then maps the values from the specified column in the DataFrame to their ISO3 codes based on the JSON data. The resulting ISO3 codes are added as a new column named 'ISO3'.

Parameters

  • df (pandas.DataFrame): The DataFrame containing a column with country names.
  • column (str): The name of the column in the DataFrame that contains country names.

Returns

  • pandas.DataFrame: The original DataFrame with an additional 'ISO3' column containing the ISO3 country codes.

Raises:

  • FileNotFoundError: If the JSON file containing country mappings cannot be found.
  • KeyError: If the specified column is not present in the DataFrame.

Example

>>> add_iso3_column(df=dataframe, 
...                column="Country"
... )
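The mapping step behind add_iso3_column can be reproduced with a plain dictionary lookup. The sketch below is a hypothetical stand-in (the real function reads its full lookup table from the bundled Names.json):

```python
import pandas as pd

# Hypothetical stand-in for the Names.json lookup (not the full country list).
name_to_iso3 = {"Canada": "CAN", "France": "FRA"}

df = pd.DataFrame({"Country": ["Canada", "France", "Atlantis"]})
# Map country names to ISO3 codes; unmatched names become NaN.
df["ISO3"] = df["Country"].map(name_to_iso3)

# Report names that could not be matched, mirroring the function's warning.
not_found = df.loc[df["ISO3"].isna(), "Country"].unique().tolist()
print(f"Country Not Found: {not_found}")  # Country Not Found: ['Atlantis']
```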
def plot_histogram(dataset, variable, time=None, bin_size=30, color='blue', plot_title=None, x_label=None, remove_outliers=False, log_transform=None, output_dir=None, filename=None):
def plot_histogram(dataset, variable, time=None, bin_size=30, color='blue', plot_title=None, x_label=None, remove_outliers=False, log_transform=None, output_dir=None, filename=None):

    """
    Create a histogram for an array variable in an xarray dataset.
    Optionally remove outliers and apply log transformations.
    
    Parameters:
    - dataset : xarray.Dataset or str, xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
    - variable: str, the name of the variable to plot.
    - time: str, optional, the time slice to plot.
    - bin_size: int, optional, the number of bins in the histogram.
    - color: str, optional, the color of the histogram bars.
    - plot_title: str, optional, the title for the plot.
    - x_label: str, optional, the label for the x-axis.
    - remove_outliers: bool, optional, whether to remove outliers.
    - log_transform: str, optional, the type of log transformation ('log10', 'log', 'log2').
    - output_dir : str, optional, directory path to save the output figure. If not provided, the figure is saved in the current working directory.
    - filename : str, optional, filename (with extension) for saving the figure. If not provided, the plot is saved as "output_histogram.png".
    
    Returns:
    - None, displays the plot and optionally saves it to a file.

    Example
    -------
    >>> plot_histogram(dataset=dataset, 
    ...                variable="railway_length", 
    ...                bin_size=30, 
    ...                color='blue', 
    ...                plot_title="Histogram of Railway Length"
    ... )
    """
    plot.plot_histogram(dataset, variable, time, bin_size, color, plot_title, x_label, remove_outliers, log_transform, output_dir, filename)

Create a histogram for an array variable in an xarray dataset. Optionally remove outliers and apply log transformations.

Parameters:

  • dataset : xarray.Dataset or str, xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
  • variable: str, the name of the variable to plot.
  • time: str, optional, the time slice to plot.
  • bin_size: int, optional, the number of bins in the histogram.
  • color: str, optional, the color of the histogram bars.
  • plot_title: str, optional, the title for the plot.
  • x_label: str, optional, the label for the x-axis.
  • remove_outliers: bool, optional, whether to remove outliers.
  • log_transform: str, optional, the type of log transformation ('log10', 'log', 'log2').
  • output_dir : str, optional, Directory path to save the output figure. If not provided, the figure is saved in the current working directory.
  • filename : str, optional, Filename (with extension) for saving the figure. If not provided, the plot is saved as "output_histogram.png".

Returns:

  • None, displays the plot and optionally saves it to a file.

Example

>>> plot_histogram(dataset=dataset, 
...                variable="railway_length", 
...                bin_size=30, 
...                color='blue', 
...                plot_title="Histogram of Railway Length"
... )
def plot_scatter(dataset, variable1, variable2, dataset2=None, time=None, color='blue', x_label=None, y_label=None, plot_title=None, remove_outliers=False, log_transform_1=None, log_transform_2=None, equation=False, output_dir=None, filename=None):
def plot_scatter(dataset, variable1, variable2, dataset2=None, time=None, color='blue', x_label=None, y_label=None, plot_title=None, remove_outliers=False, log_transform_1=None, log_transform_2=None, equation=False, output_dir=None, filename=None):
    """
    Create a scatter plot for two variables in an xarray dataset.
    Optionally remove outliers and apply log transformations.
    
    Parameters:
    - variable1 : str, name of the variable to be plotted on the x-axis. Must be present in `dataset`.
    - variable2 : str, name of the variable to be plotted on the y-axis. If `dataset2` is provided, this variable will be extracted from `dataset2`; otherwise, it must exist in `dataset`.
    - dataset : xarray.Dataset or str, the primary dataset or a path to a NetCDF file. This dataset must contain the variable specified by `variable1`, which will be used for the x-axis.
    - dataset2 : xarray.Dataset or str, optional, a second dataset or a path to a NetCDF file containing the variable specified by `variable2` (for the y-axis). If not provided, `dataset` will be used for both variables.
    - time: str, optional, the time slice to plot.
    - color: str, optional, the color map of the scatter plot.
    - x_label: str, optional, the label for the x-axis.
    - y_label: str, optional, the label for the y-axis.
    - plot_title: str, optional, the title for the plot.
    - remove_outliers: bool, optional, whether to remove outliers from the data.
    - log_transform_1: str, optional, the type of log transformation for variable1 ('log10', 'log', 'log2').
    - log_transform_2: str, optional, the type of log transformation for variable2 ('log10', 'log', 'log2').
    - equation : bool, optional, if True, fits and displays a linear regression equation. 
    - output_dir : str, optional, directory path to save the output figure. If not provided, the figure is saved in the current working directory.
    - filename : str, optional, filename (with extension) for saving the figure. If not provided, the plot is saved as "output_scatter.png".
    
    Returns:
    - None, displays the plot and optionally saves it to a file.

    Example
    -------
    >>> plot_scatter(dataset=ds_road, 
    ...             variable1="roads_gross", 
    ...             variable2="buildings_gross", 
    ...             dataset2=ds_build, 
    ...             color='blue',
    ...             plot_title="Building vs Road", 
    ...             remove_outliers=True, 
    ...             log_transform_1="log10", 
    ...             log_transform_2="log10"
    ... )
    """
    plot.plot_scatter(dataset, variable1, variable2, dataset2, time, color, x_label, y_label, plot_title, remove_outliers, log_transform_1, log_transform_2, equation, output_dir, filename)

Create a scatter plot for two variables in an xarray dataset. Optionally remove outliers and apply log transformations.

Parameters:

  • variable1 : str, name of the variable to be plotted on the x-axis. Must be present in dataset.
  • variable2 : str, name of the variable to be plotted on the y-axis. If dataset2 is provided, this variable will be extracted from dataset2; otherwise, it must exist in dataset.
  • dataset : xarray.Dataset or str, the primary dataset or a path to a NetCDF file. This dataset must contain the variable specified by variable1, which will be used for the x-axis.
  • dataset2 : xarray.Dataset or str, optional, a second dataset or a path to a NetCDF file containing the variable specified by variable2 (for the y-axis). If not provided, dataset will be used for both variables.
  • time: str, optional, the time slice to plot.
  • color: str, optional, the color map of the scatter plot.
  • x_label: str, optional, the label for the x-axis.
  • y_label: str, optional, the label for the y-axis.
  • plot_title: str, optional, the title for the plot.
  • remove_outliers: bool, optional, whether to remove outliers from the data.
  • log_transform_1: str, optional, the type of log transformation for variable1 ('log10', 'log', 'log2').
  • log_transform_2: str, optional, the type of log transformation for variable2 ('log10', 'log', 'log2').
  • equation : bool, optional, if True, fits and displays a linear regression equation.
  • output_dir : str, optional, Directory path to save the output figure. If not provided, the figure is saved in the current working directory.
  • filename : str, optional, Filename (with extension) for saving the figure. If not provided, the plot is saved as "output_scatter.png".

Returns:

  • None, displays the plot and optionally saves it to a file.

Example

>>> plot_scatter(dataset=ds_road, 
...             variable1="roads_gross", 
...             variable2="buildings_gross", 
...             dataset2=ds_build, 
...             color='blue',
...             plot_title="Building vs Road", 
...             remove_outliers=True, 
...             log_transform_1="log10", 
...             log_transform_2="log10"
... )
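The regression line shown by equation=True can be reproduced with an ordinary least-squares fit. The sketch below uses numpy.polyfit and is an assumption about the method, not taken from SESAME's source:

```python
import numpy as np

# Hypothetical OLS fit behind the `equation=True` overlay (method assumed,
# not taken from SESAME's source).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

slope, intercept = np.polyfit(x, y, deg=1)  # degree-1 polynomial = straight line
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = 1.94x + 0.15
```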
def plot_time_series(dataset, variable, agg_function='sum', plot_type='both', color='blue', plot_label='Area Plot', x_label='Year', y_label='Value', plot_title='Time Series Plot', smoothing_window=None, output_dir=None, filename=None):
def plot_time_series(dataset, variable, agg_function='sum', plot_type='both', color='blue', plot_label='Area Plot', x_label='Year', y_label='Value', plot_title='Time Series Plot', smoothing_window=None, output_dir=None, filename=None):
    """
    Create a line plot and/or area plot for a time series data variable.
    
    Parameters:
    - dataset : xarray.Dataset or str, xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
    - variable: str, the name of the variable to plot.
    - agg_function: str, the operation to apply ('sum', 'mean', 'max', 'std').
    - plot_type: str, optional, the type of plot ('line', 'area', 'both'). Default is 'both'.
    - color: str, optional, the color of the plot. Default is 'blue'.
    - plot_label: str, optional, the label for the plot. Default is 'Area Plot'.
    - x_label: str, optional, the label for the x-axis. Default is 'Year'.
    - y_label: str, optional, the label for the y-axis. Default is 'Value'.
    - plot_title: str, optional, the title of the plot. Default is 'Time Series Plot'.
    - smoothing_window: int, optional, the window size for rolling mean smoothing.
    - output_dir : str, optional, directory path to save the output figure. If not provided, the figure is saved in the current working directory.
    - filename : str, optional, filename (with extension) for saving the figure. If not provided, the plot is saved as "output_time_series.png".
    
    Returns:
    - None, displays the plot and optionally saves it to a file.

    Example
    -------
    >>> plot_time_series(variable="buildings_gross", 
    ...                dataset=ds_build, 
    ...                agg_function='sum', 
    ...                plot_type='both', 
    ...                color='blue', 
    ...                x_label='Year', 
    ...                y_label='Value', 
    ...                plot_title='Time Series Plot'
    ... )    
    """

    plot.plot_time_series(dataset, variable, agg_function, plot_type, color, plot_label, x_label, y_label, plot_title, smoothing_window, output_dir, filename)

def plot_hexbin(dataset, variable1, variable2, dataset2=None, time=None, color='pink_r', grid_size=30, x_label=None, y_label=None, plot_title=None, remove_outliers=False, log_transform_1=None, log_transform_2=None, output_dir=None, filename=None):

    """
    Create a hexbin plot for two variables in an xarray dataset.

    Parameters:
    - dataset : xarray.Dataset or str, the primary dataset or a path to a NetCDF file. This dataset must contain the variable specified by `variable1`, which will be used for the x-axis.
    - variable1 : str, name of the variable to be plotted on the x-axis. Must be present in `dataset`.
    - variable2 : str, name of the variable to be plotted on the y-axis. If `dataset2` is provided, this variable will be extracted from `dataset2`; otherwise, it must exist in `dataset`.
    - dataset2 : xarray.Dataset or str, optional, a second dataset or a path to a NetCDF file containing the variable specified by `variable2` (for the y-axis). If not provided, `dataset` will be used for both variables.
    - time: str, optional, the time slice to plot.
    - color: str, optional, the colormap of the hexbin plot.
    - grid_size: int, optional, the number of hexagons in the x-direction.
    - x_label: str, optional, the label for the x-axis.
    - y_label: str, optional, the label for the y-axis.
    - plot_title: str, optional, the title for the plot.
    - remove_outliers: bool, optional, whether to remove outliers from the data.
    - log_transform_1: str, optional, the type of log transformation for variable1 ('log10', 'log', 'log2').
    - log_transform_2: str, optional, the type of log transformation for variable2 ('log10', 'log', 'log2').
    - output_dir : str, optional, directory path to save the output figure. If not provided, the figure is saved in the current working directory.
    - filename : str, optional, filename (with extension) for saving the figure. If not provided, the plot is saved as "output_hexbin.png".

    Returns:
    - None, displays the plot and optionally saves it to a file.

    Example
    -------
    >>> plot_hexbin(dataset=ds_road,
    ...             variable1="roads_gross",
    ...             variable2="buildings_gross",
    ...             dataset2=ds_build,
    ...             color='Blues',
    ...             plot_title="Building vs Road"
    ... )
    """

    plot.plot_hexbin(dataset, variable1, variable2, dataset2, time, color, grid_size, x_label, y_label, plot_title, remove_outliers, log_transform_1, log_transform_2, output_dir, filename)
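The `log_transform_*` options accept 'log10', 'log', and 'log2'. Assuming these strings map to the corresponding numpy logarithms (a sketch, not the library's actual code), the transform applied to an axis looks like:

```python
import numpy as np

# Presumed mapping from the accepted strings to numpy's logarithm functions.
LOG_FUNCS = {'log10': np.log10, 'log': np.log, 'log2': np.log2}

x = np.array([1.0, 10.0, 100.0])
x_log = LOG_FUNCS['log10'](x)  # axis values become [0., 1., 2.]
```

Log-transforming one or both axes is useful when the variables span several orders of magnitude, as gridded road and building masses typically do; otherwise a few extreme cells dominate the hexbin density.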

def plot_map(dataset, variable, time=None, depth=None, color='hot_r', title='', label='', vmin=None, vmax=None, extend_min=None, extend_max=None, levels=10, out_bound=True, remove_ata=False, output_dir=None, filename=None, show=True):

    """
    Plots a 2D map of a variable from an xarray Dataset or NetCDF file with customizable colorbar, projection, and map appearance.

    Parameters
    ----------
    - dataset : xarray.Dataset or str. xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
    - variable : str. Name of the variable in the xarray Dataset to plot.
    - color : str, default 'hot_r'. Matplotlib colormap name for the plot (discrete color scale).
    - title : str, default ''. Title of the map.
    - label : str, default ''. Label for the colorbar.
    - time : str, optional. The time slice to plot.
    - depth : str, optional. The depth slice to plot.
    - vmin : float, optional. Minimum data value for the colorbar range. If not provided, the minimum of the variable is used.
    - vmax : float, optional. Maximum data value for the colorbar range. If not provided, the maximum of the variable is used.
    - extend_min : bool or None, default None. If True, includes values below `vmin` in the first color class and shows a left arrow on the colorbar.
    - extend_max : bool or None, default None. If True, includes values above `vmax` in the last color class and shows a right arrow on the colorbar.
    - levels : int or list of float, default 10. Either the number of color intervals or a list of explicit interval boundaries.
    - out_bound : bool, default True. Whether to display the outer boundary (spine) of the map projection.
    - remove_ata : bool, default False. If True, removes Antarctica from the map by excluding data below 60°S latitude.
    - output_dir : str, optional. Directory path to save the output figure. If not provided, the figure is saved in the current working directory.
    - filename : str, optional. Filename (with extension) for saving the figure. If not provided, the plot is saved as "output_plot.png".
    - show : bool, default True. Whether to display the map.

    Notes
    -----
    - If both `extend_min` and `extend_max` are False, the dataset is clipped strictly within [vmin, vmax].
    - The colorbar uses arrows to indicate out-of-bound values only if `extend_min` or `extend_max` is True.
    - Colorbar ticks use two decimal places when (vmax - vmin) <= 10.
    - If `remove_ata` is True, the colorbar is placed slightly higher to avoid overlap with the map.
    - The discrete colorbar labels all bin boundaries, automatically using scientific notation for large or small values while avoiding unnecessary ×10⁰ scaling.

    Returns
    -------
    - matplotlib Axes of the map; optionally displays the map and saves it to a file.

    Example
    -------
    >>> plot_map(
    ...     dataset=ds.isel(time=-1),
    ...     variable='npp',
    ...     vmin=0,
    ...     vmax=1200,
    ...     extend_max=True,
    ...     color='Greens',
    ...     levels=10,
    ...     remove_ata=True,
    ...     title='Net Primary Productivity',
    ...     label='gC/m²/year',
    ...     filename='npp_map.png'
    ... )
    """

    ax = plot.plot_map(dataset=dataset, variable=variable, time=time, depth=depth, color=color, title=title, label=label,
             vmin=vmin, vmax=vmax, extend_min=extend_min, extend_max=extend_max, levels=levels,
             out_bound=out_bound, remove_ata=remove_ata, output_dir=output_dir, filename=filename, show=show)
    return ax
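The `levels` parameter controls the discrete color scale: an integer presumably yields that many equal-width intervals between `vmin` and `vmax`, while a list supplies explicit boundaries. A sketch of the two forms under that assumption:

```python
import numpy as np

vmin, vmax = 0, 1200

# levels as an int: 10 equal-width color intervals, i.e. 11 bin boundaries.
boundaries = np.linspace(vmin, vmax, 10 + 1)   # [0., 120., 240., ..., 1200.]

# levels as a list: explicit, possibly uneven, boundaries for skewed data.
custom_levels = [0, 50, 100, 250, 500, 1200]
```

Explicit boundaries are handy for heavy-tailed variables, where equal-width bins would lump most of the map into the first color class.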

def plot_country(tabular_data, column, title="", label="", color='viridis', levels=10, output_dir=None, filename=None, remove_ata=False, out_bound=True, vmin=None, vmax=None, extend_min=None, extend_max=None, show=True):
    """
    Plots a choropleth map of countries using a specified data column and a world shapefile.

    Parameters:
    -----------
    - tabular_data : pandas.DataFrame or str. Input table containing country-level data. Can be either:
        - A pandas DataFrame with the required `column`
        - A string path to a CSV file, which will be automatically read into a DataFrame
    - column : str. Name of the column in the dataframe to visualize.
    - title : str, optional. Title of the map. Default is an empty string.
    - label : str, optional. Label for the colorbar. Default is an empty string.
    - color : str, optional. Name of the matplotlib colormap to use. Default is 'viridis'.
    - levels : int or list of float, optional. Number of color levels (if int) or list of bin edges (if list). Default is 10.
    - remove_ata : bool, optional. Whether to remove Antarctica ('ATA') from the data. Default is False.
    - out_bound : bool, optional. Whether to display map boundaries (spines). Default is True.
    - vmin : float or None, optional. Minimum value for the colormap. If None, calculated from the data.
    - vmax : float or None, optional. Maximum value for the colormap. If None, calculated from the data.
    - extend_min : bool or None, default None. If True, includes values below `vmin` in the first color class and shows a left arrow on the colorbar.
    - extend_max : bool or None, default None. If True, includes values above `vmax` in the last color class and shows a right arrow on the colorbar.
    - output_dir : str, optional. Directory path to save the output figure. If not provided, the figure is saved in the current working directory.
    - filename : str, optional. Filename (with extension) for saving the figure. If not provided, the plot is saved as "output_country_plot.png".
    - show : bool, default True. Whether to display the map.

    Notes:
    ------
    - The discrete colorbar labels all bin boundaries, automatically using scientific notation for large or small values while avoiding unnecessary ×10⁰ scaling.

    Returns:
    --------
    - matplotlib Axes of the map; optionally displays the map and saves it to a file.

    Example
    -------
    >>> plot_country(tabular_data="country_data.csv",
    ...             column="population",
    ...             title="Population of Countries",
    ...             label="Population",
    ...             color='viridis'
    ... )
    """

    ax = plot.plot_country(tabular_data=tabular_data, column=column, title=title, label=label, color=color, levels=levels, output_dir=output_dir, filename=filename, remove_ata=remove_ata, out_bound=out_bound, vmin=vmin, vmax=vmax, extend_min=extend_min, extend_max=extend_max, show=show)

    return ax

def sum_variables(dataset, variables=None, new_variable_name=None, time=None):

    """
    Sum specified variables in the xarray dataset. If no variables are specified, sum all variables
    except those starting with 'grid_area' or 'land_frac'. Fill NaNs with zero before summing, and convert resulting
    zeros back to NaNs.

    Parameters:
    -----------
    - dataset: xarray.Dataset or str, xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
    - variables: list of str, the names of the variables to sum. If None, sum all variables except those starting with 'grid_area' and 'land_frac'.
    - new_variable_name: str, optional, the name of the new variable to store the sum.
    - time: optional, a specific time slice to select from the dataset.

    Returns:
    --------
    - xarray.Dataset with the summed variable.

    Example
    -------
    >>> sum_variables(dataset=ds,
    ...              variables=["roads_gross", "buildings_gross"],
    ...              new_variable_name="gross_mass"
    ... )
    """

    ds = calculate.sum_variables(dataset, variables, new_variable_name, time)
    return ds
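The NaN handling matters: cells that are NaN in every summed variable stay NaN, but so does any cell whose sum happens to be exactly zero. A numpy sketch of the documented fill-and-restore behaviour, using hypothetical arrays:

```python
import numpy as np

roads = np.array([[1.0, np.nan], [np.nan, np.nan]])
buildings = np.array([[2.0, 3.0], [np.nan, np.nan]])

# Fill NaNs with zero, sum, then map exact zeros back to NaN,
# mirroring the documented behaviour of sum_variables.
gross_mass = np.nan_to_num(roads) + np.nan_to_num(buildings)
gross_mass[gross_mass == 0] = np.nan
# gross_mass is [[3., 3.], [nan, nan]]
```

The bottom row stays NaN because both inputs were missing there; the top-right cell becomes 3.0 even though only one input had data, since the missing value was treated as zero.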

def subtract_variables(dataset, variable1, variable2, new_variable_name=None, time=None):

    """
    Subtract one variable from another in the xarray dataset.
    Fill NaNs with zero before subtracting, and convert resulting zeros back to NaNs.

    Parameters:
    -----------
    - dataset: xarray.Dataset or str, xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
    - variable1: str, the name of the variable to subtract from.
    - variable2: str, the name of the variable to subtract.
    - new_variable_name: str, optional, the name of the new variable to store the result.
    - time: optional, a specific time slice to select from the dataset.

    Returns:
    --------
    - xarray.Dataset with the resulting variable.

    Example
    -------
    >>> subtract_variables(dataset=ds,
    ...                   variable1="precipitation",
    ...                   variable2="evaporation",
    ...                   new_variable_name="net_water_gain"
    ... )
    """
    # Pass arguments by keyword so the call does not depend on the calculate API's positional order.
    ds = calculate.subtract_variables(dataset=dataset, variable1=variable1, variable2=variable2, new_variable_name=new_variable_name, time=time)
    return ds

def divide_variables(dataset, variable1, variable2, new_variable_name=None, time=None):
    """
    Divide one variable by another in the xarray dataset.
    Fill NaNs with zero before dividing, and convert resulting zeros back to NaNs.

    Parameters:
    -----------
    - dataset: xarray.Dataset or str, xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
    - variable1: str, the name of the variable to be divided (numerator).
    - variable2: str, the name of the variable to divide by (denominator).
    - new_variable_name: str, optional, the name of the new variable to store the result.
    - time: optional, a specific time slice to select from the dataset.

    Returns:
    --------
    - xarray.Dataset with the resulting variable.

    Example
    -------
    >>> divide_variables(dataset=ds,
    ...                  variable1="road_length",
    ...                  variable2="grid_area",
    ...                  new_variable_name="road_density"
    ... )
    """
    # Pass arguments by keyword so the call does not depend on the calculate API's positional order.
    ds = calculate.divide_variables(dataset=dataset, variable1=variable1, variable2=variable2, new_variable_name=new_variable_name, time=time)
    return ds
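The same fill-with-zero convention applies here; note that a NaN denominator becomes zero before the division, so such cells produce inf or NaN rather than a finite ratio. A numpy sketch of the documented behaviour, with hypothetical arrays:

```python
import numpy as np

road_length = np.array([2000.0, np.nan])   # road length per cell
grid_area = np.array([1000.0, 1000.0])     # cell area

# Fill NaNs with zero, divide, then map exact zeros back to NaN,
# mirroring the documented behaviour of divide_variables.
with np.errstate(divide='ignore', invalid='ignore'):
    road_density = np.nan_to_num(road_length) / np.nan_to_num(grid_area)
road_density[road_density == 0] = np.nan
# road_density is [2., nan]
```

The second cell ends up NaN because the missing numerator was filled with zero, the ratio came out exactly zero, and zeros are restored to NaN.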

def multiply_variables(dataset, variables=None, new_variable_name=None, time=None):
    """
    Multiply specified variables in the xarray dataset. If no variables are specified, multiply all variables.
    Fill NaNs with one before multiplying, and convert resulting ones back to NaNs.

    Parameters:
    -----------
    - dataset: xarray.Dataset or str, xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
    - variables: list of str, the names of the variables to multiply. If None, multiply all variables, excluding the "grid_area" and "land_frac" variables included in the dataset.
    - new_variable_name: str, optional, the name of the new variable to store the product.
    - time: optional, a specific time slice to select from the dataset.

    Returns:
    --------
    - xarray.Dataset with the resulting variable.

    Example
    -------
    >>> multiply_variables(
    ...     dataset=ds,
    ...     variables=["crop_area", "yield_per_hectare"],
    ...     new_variable_name="total_crop_yield"
    ... )
    """

    ds = calculate.multiply_variables(dataset, variables, new_variable_name, time)

    return ds
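Unlike the sum, the product fills NaNs with one (the multiplicative identity), so a missing value in one variable leaves the other factors untouched; any cell whose product is exactly one is then mapped back to NaN. A numpy sketch with hypothetical arrays:

```python
import numpy as np

crop_area = np.array([2.0, np.nan])          # area per cell
yield_per_hectare = np.array([3.0, np.nan])  # yield per unit area

# Fill NaNs with one, multiply, then map exact ones back to NaN,
# mirroring the documented behaviour of multiply_variables.
a = np.where(np.isnan(crop_area), 1.0, crop_area)
b = np.where(np.isnan(yield_per_hectare), 1.0, yield_per_hectare)
total_crop_yield = a * b
total_crop_yield[total_crop_yield == 1] = np.nan
# total_crop_yield is [6., nan]
```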

def average_variables(dataset, variables=None, new_variable_name=None, time=None):
    """
    Average specified variables in the xarray dataset. If no variables are specified, average all variables
    except those starting with 'grid_area' or 'land_frac'. Fill NaNs with zero before averaging, and convert resulting
    zeros back to NaNs.

    Parameters:
    -----------
    - dataset: xarray.Dataset or str, xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
    - variables: list of str, the names of the variables to average. If None, average all variables except those starting with 'grid_area' and 'land_frac'.
    - new_variable_name: str, optional, the name of the new variable to store the average.
    - time: optional, a specific time slice to select from the dataset.

    Returns:
    --------
    - xarray.Dataset with the averaged variable.

    Example
    -------
    >>> average_variables(dataset=ds,
    ...                  variables=["roads_gross", "buildings_gross"],
    ...                  new_variable_name="average_gross"
    ... )
    """
    ds = calculate.average_variables(dataset, variables, new_variable_name, time)
    return ds

def get_netcdf_info(netcdf_file, variable_name=None):
    """
    Extract information about variables and dimensions from a NetCDF dataset.

    Parameters
    ----------
    - netcdf_file : xarray.Dataset or str. xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
    - variable_name : str, optional. The prefix or complete name of the variable to filter. If not provided, all variables are included.

    Returns
    -------
    - tuple. A tuple containing lists of dimensions, short names, long names, units, and time values (if 'time' exists).

    Example
    -------
    >>> get_netcdf_info(netcdf_file=netcdf_file_path,
    ...                 variable_name="railway_length"
    ... )
    """

    netcdf_info = get.get_netcdf_info(netcdf_file=netcdf_file, variable_name=variable_name)
    return netcdf_info

Extract information about variables and dimensions from a NetCDF dataset.

Parameters

  • netcdf_file : xarray.Dataset or str. xarray dataset or a path to a NetCDF file. If a file path is provided, it will be automatically loaded into an xarray.Dataset.
  • variable_name : str, optional. The prefix or complete name of the variable to filter. If not provided, all variables are included.

Returns

  • tuple, A tuple containing lists of dimensions, short names, long names, units, & time values (if 'time' exists).

Example

>>> get_netcdf_info(netcdf_file=netcdf_file_path, 
...                 variable_name="railway_length"
... )
def atlas(directory):
def atlas(directory):
    """
    List all NetCDF files in a directory and count the number of variables in each.

    Parameters
    ----------
    directory : str. Path to the directory containing NetCDF files.

    Returns
    -------
    pd.DataFrame. A DataFrame with file names and the number of variables in each file.

    Example
    -------
    >>> atlas(directory)
    """
    records = []
    for file in os.listdir(directory):
        if file.endswith(".nc"):
            filepath = os.path.join(directory, file)
            ds = xr.open_dataset(filepath)
            num_vars = len(ds.data_vars)
            ds.close()
            records.append({
                'file_name': file,
                'num_variables': num_vars
            })
    return pd.DataFrame(records)

List all NetCDF files in a directory and count the number of variables in each.

Parameters

directory : str. Path to the directory containing NetCDF files.

Returns

pd.DataFrame. A DataFrame with file names and the number of variables in each file.

Example

>>> atlas(directory)
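The directory scan itself is a plain `os.listdir` filter on the `.nc` extension; only the variable counting needs xarray. A stdlib-only sketch of the listing step (counting stubbed out, throwaway directory for illustration):

```python
import os
import tempfile

def list_netcdf_files(directory):
    """Return the .nc file names in a directory, sorted for stable output."""
    return sorted(f for f in os.listdir(directory) if f.endswith(".nc"))

# Demonstrate on a throwaway directory: two NetCDF names plus a stray file.
with tempfile.TemporaryDirectory() as d:
    for name in ["B.land.cover.2001-2023.a.nc", "T.transportation.roads.nc", "readme.txt"]:
        open(os.path.join(d, name), "w").close()
    nc_files = list_netcdf_files(d)

print(nc_files)  # ['B.land.cover.2001-2023.a.nc', 'T.transportation.roads.nc']
```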
def info(data):
def info(data):
    """
    Extract metadata for each variable in a NetCDF dataset.

    Parameters
    ----------
    - data : str, os.PathLike, or xarray.Dataset. Path to a NetCDF file or an xarray.Dataset object.

    Returns
    -------
    - pd.DataFrame. A DataFrame containing variable names, long names, units, sources, time range (start and end), time resolution (step), and depth values (if present as a variable).

    Example
    -------
    >>> info(netcdf_path)
    """
    # Load data (either a path or an xarray.Dataset)
    if isinstance(data, (str, bytes, os.PathLike)):
        ds = xr.open_dataset(data)
    elif isinstance(data, xr.Dataset):
        ds = data
    else:
        raise TypeError("`data` must be an xarray.Dataset or a path to a NetCDF file.")

    records = []
    for var_name, da in ds.data_vars.items():
        var_attrs = da.attrs
        # Handle time and depth dimensions if they exist
        time_summary = depth_summary = None

        if 'time' in da.dims:
            if np.issubdtype(da['time'].dtype, np.datetime64):
                time_values = pd.to_datetime(da['time'].values.flatten())
                unique_times = np.unique(time_values)
                time_diffs = np.diff(unique_times)
                time_step = utils.detect_time_step(time_diffs) if len(unique_times) > 1 else None
                time_summary = {
                    'min': pd.to_datetime(unique_times.min()).strftime('%Y-%m-%d'),
                    'max': pd.to_datetime(unique_times.max()).strftime('%Y-%m-%d'),
                    'step': time_step
                }
            else:
                unique_times = np.unique(da['time'].values.flatten())
                time_summary = {
                    'min': int(unique_times.min()),
                    'max': int(unique_times.max()),
                    'step': 'Monthly' if set(unique_times).issubset(set(range(1, 13))) else 'Unknown'
                }

        if 'depth' in da.dims and 'depth' in ds.variables:
            depth_values = ds['depth'].values.flatten()
            unique_depths = np.unique(depth_values)
            depth_summary = {
                'values': unique_depths.tolist()
            }

        records.append({
            'variable': var_name,
            'long_name': var_attrs.get('long_name', 'N/A'),
            'units': var_attrs.get('units', 'N/A'),
            'source': var_attrs.get('source', 'N/A'),
            'time_min': time_summary['min'] if time_summary else None,
            'time_max': time_summary['max'] if time_summary else None,
            'time_step': time_summary['step'] if time_summary else None,
            'depth': depth_summary['values'] if depth_summary else None
        })

    ds.close()
    return pd.DataFrame(records)

Extract metadata for each variable in a NetCDF dataset.

Parameters

  • data : str, os.PathLike, or xarray.Dataset. Path to a NetCDF file or an xarray.Dataset object.

Returns

  • pd.DataFrame. A DataFrame containing variable names, long names, units, sources, time range (start and end), time resolution (step), and depth values (if present as a variable).

Example

>>> info(netcdf_path)
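For integer (non-datetime) time axes, `info` labels the step `'Monthly'` only when every time value falls in the 1–12 month range; otherwise it reports `'Unknown'`. The check reduces to a subset test, sketched here with plain Python (hypothetical helper name, no xarray dependency):

```python
def integer_time_summary(time_values):
    """Summarize an integer time axis the way info() does: min, max, and a
    'Monthly' step label when all values fit in the 1-12 month range."""
    unique_times = sorted(set(time_values))
    step = "Monthly" if set(unique_times).issubset(set(range(1, 13))) else "Unknown"
    return {"min": int(unique_times[0]), "max": int(unique_times[-1]), "step": step}

print(integer_time_summary([1, 2, 3, 12]))  # {'min': 1, 'max': 12, 'step': 'Monthly'}
print(integer_time_summary([2001, 2002]))   # {'min': 2001, 'max': 2002, 'step': 'Unknown'}
```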