Working with Unstructured Grid Data#

Authors: Philip Chmielowiec, Orhan Eroglu

UXarray offers support for loading and representing unstructured grids by providing Xarray-like functionality paired with new routines that are specifically written for operating on unstructured grids.

Grid Definition and Data Variables#

When working with Unstructured Grids, the grid definition and data variables are often stored as separate files. This means that there are multiple separate files that need to be read and linked together to represent the entire dataset.

For example, the following sample dataset is taken from the NOAA Geoflow project, which is made up of 4 files: 1 grid definition and 3 data files. (Special thanks to John Clyne, Shilpi Gupta, and the VAPOR team for providing this data!)

geoflow-small
│   grid.nc
│   v1.nc
│   v2.nc
│   v3.nc

Grid Conventions#

Given the complexity of Unstructured Grids, there are many different ways of representing their underlying topology and structure. These representations are referred to as conventions, and they outline the required connectivity variables, naming conventions, data types, and many other specifications. UXarray uses the UGRID conventions as a foundation for internally representing Unstructured Grids, converting any supported input grid format into the UGRID convention at the data loading step. Below is a list of supported formats and conventions that can be read in with UXarray:

  • UGRID

  • Model for Prediction Across Scales (MPAS)

  • Exodus

In addition to loading datasets, we also provide support for constructing a grid from user-defined primitives such as vertices, which is showcased in our other notebooks.

Reading Grid and Data Files#

UXarray provides the UxDataset data structure, which is an unstructure grid-informed implementation of Xarray’s Dataset class. The main addition is the introduction of the uxgrid property, which stores our grid topology dimensions, coordinates, variables and provides grid-specific functions.

Constructing a UxDataset can be done using our custom open_dataset and open_mfdataset methods, depending on whether one or multiple data files or objects are meant to be linked to a single grid.

import uxarray as ux
import numpy as np
# Base data path
base_path = "../../test/meshfiles/ugrid/geoflow-small/"

# Path to Grid file
grid_path = base_path + "grid.nc"

# Paths to Data Variable files
var_names = ['v1.nc', 'v2.nc', 'v3.nc']

data_paths = [base_path + name for name in var_names]

Loading a single data file with a grid is done using the open_dataset method. The resulting UxDataset only contains the data variables stored in v1.nc.

uxds_single = ux.open_dataset(grid_path, data_paths[0])
uxds_single
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/core/utils.py:22: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  if ds.dims[dim] == grid.n_face:
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/grid/grid.py:435: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  return self._ds.dims["n_face"]
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/core/utils.py:24: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  elif ds.dims[dim] == grid.n_node:
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/grid/grid.py:429: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  return self._ds.dims["n_node"]
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/core/utils.py:26: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  elif ds.dims[dim] == grid.n_edge:
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/grid/grid.py:444: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  return self._ds.dims["n_edge"]
<xarray.UxDataset> Size: 960kB
Dimensions:  (time: 1, meshLayers: 20, n_node: 6000)
Coordinates:
  * time     (time) float64 8B 13.0
Dimensions without coordinates: meshLayers, n_node
Data variables:
    v1       (time, meshLayers, n_node) float64 960kB -0.009766 ... 0.03283

Similarly, if you wish to open multiple data files with a grid, you would use the open_mfdataset method. The resulting UxDataset contains all the data variables stored in v1.nc, v2.nc, and v3.nc

uxds_multiple = ux.open_mfdataset(grid_path, data_paths)
uxds_multiple
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/core/utils.py:22: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  if ds.dims[dim] == grid.n_face:
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/grid/grid.py:435: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  return self._ds.dims["n_face"]
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/core/utils.py:24: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  elif ds.dims[dim] == grid.n_node:
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/grid/grid.py:429: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  return self._ds.dims["n_node"]
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/core/utils.py:26: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  elif ds.dims[dim] == grid.n_edge:
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/grid/grid.py:444: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  return self._ds.dims["n_edge"]
<xarray.UxDataset> Size: 3MB
Dimensions:  (time: 1, meshLayers: 20, n_node: 6000)
Coordinates:
  * time     (time) float64 8B 13.0
Dimensions without coordinates: meshLayers, n_node
Data variables:
    v1       (time, meshLayers, n_node) float64 960kB dask.array<chunksize=(1, 20, 6000), meta=np.ndarray>
    v2       (time, meshLayers, n_node) float64 960kB dask.array<chunksize=(1, 20, 6000), meta=np.ndarray>
    v3       (time, meshLayers, n_node) float64 960kB dask.array<chunksize=(1, 20, 6000), meta=np.ndarray>

Grid Topology#

Each dataset contains the aforementioned uxgrid property, which is a Grid object and represents the grid topology that the data variables lie on. The uxgrid property can be used to execute grid specific functions and access grid topology dimensions, coordinates, and variables. A detailed overview of functionalities can be found in subsequent notebooks.

For both instances of UxDataset that contain single and multiple data sets (i.e. uxds_single and uxds_multiple), the uxgrid property contains the same grid information, however they are each instantiated separately.

# check if the grids contain the same variables & information
print(uxds_single.uxgrid == uxds_multiple.uxgrid)

# check if the grids point to the same object in memory
print(uxds_single.uxgrid is uxds_multiple.uxgrid)
True
False

Printing out the uxgrid property provides an overview of the original grid format, dimensions, coordinates, and connectivity variables.

uxds_multiple.uxgrid
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/grid/grid.py:275: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  for key, value in zip(self._ds.dims.keys(), self._ds.dims.values()):
<frozen _collections_abc>:880: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/grid/grid.py:435: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  return self._ds.dims["n_face"]
<uxarray.Grid>
Original Grid Type: UGRID
Grid Dimensions:
  * n_face: 3840
  * n_max_face_nodes: 4
  * n_node: 6000
  * meshLayers: 20
  * n_edge: 9600
  * Two: 2
  * n_nodes_per_face: (3840,)
Grid Coordinates (Spherical):
  * node_lon: (6000,)
  * node_lat: (6000,)
Grid Coordinates (Cartesian):
Grid Connectivity Variables:
  * face_node_connectivity: (3840, 4)
  * edge_node_connectivity: (9600, 2)

These dimensions, coordinates, and connectivity variables can be accessed with attributes using the same names as shown in the print-out. Below are a few examples.

uxds_multiple.uxgrid.n_node
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/grid/grid.py:429: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  return self._ds.dims["n_node"]
6000
uxds_multiple.uxgrid.node_lon
<xarray.DataArray 'node_lon' (n_node: 6000)> Size: 48kB
array([ 0.      ,  5.213776, 16.497974, ..., 62.701014, 68.804082, 72.      ])
Coordinates:
    node_lon  (n_node) float64 48kB 0.0 5.214 16.5 29.14 ... 62.7 68.8 72.0
    node_lat  (n_node) float64 48kB 58.28 59.8 62.06 ... -31.38 -31.68 -31.72
Dimensions without coordinates: n_node
Attributes:
    standard_name:  longitude
    long_name:      Longitude of 2D mesh nodes.
    units:          degrees_east
uxds_multiple.uxgrid.face_node_connectivity
<xarray.DataArray 'face_node_connectivity' (n_face: 3840, n_max_face_nodes: 4)> Size: 123kB
array([[   0,    1,    6,    5],
       [   1,    2,    7,    6],
       [   2,    3,    8,    7],
       ...,
       [5991, 5992, 5997, 5996],
       [5992, 5993, 5998, 5997],
       [5993, 5994, 5999, 5998]])
Dimensions without coordinates: n_face, n_max_face_nodes
Attributes:
    cf_role:      face_node_connectivity
    long_name:    Maps every face to its corner nodes.
    start_index:  0
    _FillValue:   -9223372036854775808

Data Variables#

While grid-specific variables and functions are stored under the uxgrid property, data variables that lie on the grid are stored directly in the UxDataset or UxDataArray. Most Xarray functions and operators can be executed on these data structures.

uxds_single.values
<bound method Mapping.values of <xarray.UxDataset> Size: 960kB
Dimensions:  (time: 1, meshLayers: 20, n_node: 6000)
Coordinates:
  * time     (time) float64 8B 13.0
Dimensions without coordinates: meshLayers, n_node
Data variables:
    v1       (time, meshLayers, n_node) float64 960kB -0.009766 ... 0.03283>
uxds_single.dims
FrozenMappingWarningOnValuesAccess({'time': 1, 'meshLayers': 20, 'n_node': 6000})
uxds_single.coords
Coordinates:
  * time     (time) float64 8B 13.0
uxds_single.attrs
{}
uxds_single.min()
<xarray.UxDataset> Size: 8B
Dimensions:  ()
Data variables:
    v1       float64 8B -18.53
uxds_single > 0
<xarray.UxDataset> Size: 120kB
Dimensions:  (time: 1, meshLayers: 20, n_node: 6000)
Coordinates:
  * time     (time) float64 8B 13.0
Dimensions without coordinates: meshLayers, n_node
Data variables:
    v1       (time, meshLayers, n_node) bool 120kB False False ... True True
grid = uxds_single.uxgrid
foo = ux.UxDataArray(
    data = np.random.random(grid.n_face),
    dims = ["n_face"],
    uxgrid = grid
)
foo
/home/docs/checkouts/readthedocs.org/user_builds/uxarray/conda/latest/lib/python3.11/site-packages/uxarray/grid/grid.py:435: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  return self._ds.dims["n_face"]
<xarray.UxDataArray (n_face: 3840)> Size: 31kB
array([0.25963025, 0.65873738, 0.26562984, ..., 0.04309797, 0.52424195,
       0.30151363])
Dimensions without coordinates: n_face
uxds_new_var = uxds_single.assign({"foo" : foo})
uxds_new_var
<xarray.UxDataset> Size: 991kB
Dimensions:  (time: 1, meshLayers: 20, n_node: 6000, n_face: 3840)
Coordinates:
  * time     (time) float64 8B 13.0
Dimensions without coordinates: meshLayers, n_node, n_face
Data variables:
    v1       (time, meshLayers, n_node) float64 960kB -0.009766 ... 0.03283
    foo      (n_face) float64 31kB 0.2596 0.6587 0.2656 ... 0.0431 0.5242 0.3015