Data Structures#

import uxarray as ux
import xarray as xr

The core functionality of UXarray revolves around three data structures, which are used for interacting with unstructured grids and the data variables that reside on them.

  1. uxarray.Grid: Stores the grid representation (i.e. coordinates, connectivity information, etc.)

  2. uxarray.UxDataset: One or more data variable that resided on a grid.

  3. uxarray.UxDataArray: A single data variable that resides on a grid

Grid and Data Files#

When working with unstructured grid datasets, the grid definition is typically stored separately from any data variables.

For example, the dataset used in this example is made up of two files: a single grid definition and a single data file.

quad-hexagon
│   grid.nc
│   data.nc
grid_path = "../../test/meshfiles/ugrid/quad-hexagon/grid.nc"
data_path = "../../test/meshfiles/ugrid/quad-hexagon/data.nc"

Additionally, there may be multiple data files that are mapped to the same unstructured grid (such as the case with climate model output). Using our sample dataset, this may look something like this:

quad-hexagon
│   grid.nc
│   data1.nc
|   data2.nc
|   data3.nc

We can store these paths as a list (in this case we simply repeat the original data file to imitate having 4 separate data files)

multiple_data_paths = [data_path for i in range(3)]

Grid#

The Grid class is used for storing variables associated with an unstructured grid’s topology. This includes dimensions, coordinates, and connectivity variables.

Creating a Grid#

The recommended way to construct a Grid is by using the ux.open_grid() method, which takes in a grid file path, detects the input grid format, and parses and encodes the provided coordinates and connectivity into the UGRID conventions. Details on supported grid formats and what variables are parsed can be found in other parts of this user guide.

uxgrid = ux.open_grid(grid_path)
uxgrid
<uxarray.Grid>
Original Grid Type: UGRID
Grid Dimensions:
  * n_node: 16
  * n_edge: 19
  * n_face: 4
  * n_max_face_nodes: 6
  * two: 2
  * n_nodes_per_face: (4,)
Grid Coordinates (Spherical):
  * node_lon: (16,)
  * node_lat: (16,)
  * edge_lon: (19,)
  * edge_lat: (19,)
  * face_lon: (4,)
  * face_lat: (4,)
Grid Coordinates (Cartesian):
Grid Connectivity Variables:
  * face_node_connectivity: (4, 6)
  * edge_node_connectivity: (19, 2)
Grid Descriptor Variables:

Accessing Variables#

As we saw above when printing out Grid instance, there are many variables that are associated with a single grid. In addition to the general repr, we can obtain the stored dimensions, coordinates, and connectivity variables through the following attributes.

uxgrid.dims
{'n_edge', 'n_face', 'n_max_face_nodes', 'n_node', 'two'}
uxgrid.sizes
{'n_face': 4, 'two': 2, 'n_edge': 19, 'n_node': 16, 'n_max_face_nodes': 6}
uxgrid.coordinates
{'edge_lat', 'edge_lon', 'face_lat', 'face_lon', 'node_lat', 'node_lon'}
uxgrid.connectivity
{'edge_node_connectivity', 'face_node_connectivity'}

We can access any desired quantity by either calling an attribute by the same name or by indexing a Grid like a dictionary.

uxgrid.node_lon
<xarray.DataArray 'node_lon' (n_node: 16)> Size: 64B
array([-0.038411, -0.186448, -0.181994, -0.047552,  0.100399,  0.096181,
        0.105031,  0.239774, -0.177116, -0.325237,  0.378854,  0.382835,
       -0.042753, -0.320659,  0.235225,  0.243879], dtype=float32)
Dimensions without coordinates: n_node
Attributes:
    standard_name:  longitude
    long_name:      longitude of mesh nodes
    units:          degrees_east
uxgrid["node_lon"]
<xarray.DataArray 'node_lon' (n_node: 16)> Size: 64B
array([-0.038411, -0.186448, -0.181994, -0.047552,  0.100399,  0.096181,
        0.105031,  0.239774, -0.177116, -0.325237,  0.378854,  0.382835,
       -0.042753, -0.320659,  0.235225,  0.243879], dtype=float32)
Dimensions without coordinates: n_node
Attributes:
    standard_name:  longitude
    long_name:      longitude of mesh nodes
    units:          degrees_east

Constructing Additional Variables#

Looking at Grid.connectivity one more time, we can see that there are only two available variables.

uxgrid.connectivity
{'edge_node_connectivity', 'face_node_connectivity'}

These variables are the ones that were able to be parsed and encoded in the UGRID conventions from the inputted grid file.

In addition to parsing variables, we can construct additional variables by calling the attribute or indexing the Grid with the desired name. For example, if we wanted to construct the face_edge_connectivity, we would do the following:

uxgrid.face_edge_connectivity
<xarray.DataArray 'face_edge_connectivity' (n_face: 4, n_max_face_edges: 6)> Size: 192B
array([[ 0,  3,  5,  6,  7,  1],
       [12, 10, 11,  2,  1,  9],
       [ 2, 14, 13, 15,  4,  0],
       [ 8, 18, 16, 17,  9,  7]])
Dimensions without coordinates: n_face, n_max_face_edges
Attributes:
    cf_role:      face_edge_connectivity
    long name:    Maps every face to its edges.
    start_index:  0
    _FillValue:   -9223372036854775808
    dtype:        <class 'numpy.int64'>

Now if we look at our Grid.connectivity, we can see that it now contains our new connectivity variable.

uxgrid.connectivity
{'edge_node_connectivity', 'face_edge_connectivity', 'face_node_connectivity'}

All grid variables can be accessed using an attribute. At the time the user calls the attribute (in the above example uxgrid.face_edge_connectivity), there is code in place to check whether the variable is present within the Grid. If it’s available, it is directly returned to the user, otherwise it is constructed. Below shows off how this works internally.

@property
def face_edge_connectivity(self) -> xr.DataArray:
    """Indices of the edges that surround each face.

    Dimensions: ``(n_face, n_max_face_edges)``
    """
    if "face_edge_connectivity" not in self._ds:
        _populate_face_edge_connectivity(self)

    return self._ds["face_edge_connectivity"]

UxDataset#

Up to this point, we’ve exclusively looked at the unstructured grid without any data variables mapped to it. Working with a standalone Grid has its applications, such as grid debugging and analysis, however more commonly an unstructured grid is paired with data variables that are mapped to it.

The UxDataset class is used for pairing one or more data variables with an unstructured grid. It operates similarly to a xarrary.Dataset, with the addition of unstructured-grid specific functionality and is linked to an instance of a Grid.

Opening a Single Data File#

We can load a pair of grid and data files using the ux.open_dataset() method.

uxds = ux.open_dataset(grid_path, data_path)
uxds
<xarray.UxDataset> Size: 16B
Dimensions:  (n_face: 4)
Dimensions without coordinates: n_face
Data variables:
    t2m      (n_face) float32 16B ...

Opening Multiple Data Files#

When working with multiple data paths, we can open them using the ux.open_mfdataset() method.

uxds_multi = ux.open_mfdataset(
    grid_path, multiple_data_paths, combine="nested", concat_dim="time"
)
uxds_multi
<xarray.UxDataset> Size: 48B
Dimensions:  (time: 3, n_face: 4)
Dimensions without coordinates: time, n_face
Data variables:
    t2m      (time, n_face) float32 48B dask.array<chunksize=(1, 4), meta=np.ndarray>

Grid Accessor#

Each UxDataset (and in the next section UxDataArray) is linked to a Grid instance, which contain the unstructured grid information.

uxds.uxgrid
<uxarray.Grid>
Original Grid Type: UGRID
Grid Dimensions:
  * n_node: 16
  * n_edge: 19
  * n_face: 4
  * n_max_face_nodes: 6
  * two: 2
  * n_nodes_per_face: (4,)
Grid Coordinates (Spherical):
  * node_lon: (16,)
  * node_lat: (16,)
  * edge_lon: (19,)
  * edge_lat: (19,)
  * face_lon: (4,)
  * face_lat: (4,)
Grid Coordinates (Cartesian):
Grid Connectivity Variables:
  * face_node_connectivity: (4, 6)
  * edge_node_connectivity: (19, 2)
Grid Descriptor Variables:

All the same functionality can be performed using the uxgrid attribute as was discussed in the Grid sections above.

uxds.uxgrid.dims
{'n_edge', 'n_face', 'n_max_face_nodes', 'n_node', 'two'}

UxDataArray#

While a UxDataset represents one or more data variables linked to some unstructured grid, a UxDataArray represent a single data variable. Alternatively, one can think of a UxDataset as a collection of one or more UxDataArray instances.

In our sample dataset, we have a variable called t2m, which can be used to index our UxDataset

uxds["t2m"]
<xarray.UxDataArray 't2m' (n_face: 4)> Size: 16B
[4 values with dtype=float32]
Dimensions without coordinates: n_face
Attributes:
    units:      K
    long_name:  2-meter temperature

We can see the relationship between a UxDataset and UxDataArray by checking the type.

type(uxds), type(uxds["t2m"])
(uxarray.core.dataset.UxDataset, uxarray.core.dataarray.UxDataArray)

As mentioned before, each UxDataArray is linked to a Grid instance.

uxds["t2m"].uxgrid
<uxarray.Grid>
Original Grid Type: UGRID
Grid Dimensions:
  * n_node: 16
  * n_edge: 19
  * n_face: 4
  * n_max_face_nodes: 6
  * two: 2
  * n_nodes_per_face: (4,)
Grid Coordinates (Spherical):
  * node_lon: (16,)
  * node_lat: (16,)
  * edge_lon: (19,)
  * edge_lat: (19,)
  * face_lon: (4,)
  * face_lat: (4,)
Grid Coordinates (Cartesian):
Grid Connectivity Variables:
  * face_node_connectivity: (4, 6)
  * edge_node_connectivity: (19, 2)
Grid Descriptor Variables:

This Grid is identical to the one linked to the UxDataset. Regardless of the number of data variables present in the UxDataset, they all share a single Grid instance.

uxds["t2m"].uxgrid == uxds.uxgrid
True

Functionality#

Just like with Xarray, we can perform various operations on our data variable.

uxds["t2m"].min()
<xarray.UxDataArray 't2m' ()> Size: 4B
array(297.25037, dtype=float32)
uxds["t2m"].mean()
<xarray.UxDataArray 't2m' ()> Size: 4B
array(297.54895, dtype=float32)

UXarray also provides custom data analysis operators which are explored in further sections of this user guide.

uxds["t2m"].gradient()
<xarray.UxDataArray 't2m_grad' (n_edge: 19)> Size: 152B
array([28.00277097, 20.66935101, 29.23128307,  0.        ,  0.        ,
        0.        ,  0.        , 60.59822029,  0.        , 86.32623052,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ])
Dimensions without coordinates: n_edge

Inheritance from Xarray#

For those that are familiar with Xarray, the naming of the methods and data structures looks familiar. UXarray aims to provide a familiar experience to Xarray by inheriting the xr.Dataset and xr.DataArray objects and linking them to an instance of a Grid class to provide grid-aware implementations.

We can observe this inheritance by checking for subclassing.

issubclass(ux.UxDataset, xr.Dataset)
True
issubclass(ux.UxDataArray, xr.DataArray)
True

Overloaded Methods#

With subclassing, all methods are directly inherited from the parent class (xr.Dataset). Most Xarray functionality works directly on UXarray’s data structures, however certain methods have been overloaded to make them unstructured-grid aware.

One example of this is the plotting functionality of a ux.UxDataArray, which was re-implemented to support visualuzations of unstructured grids. A detailed overview of plotting functionality can be found in the next sections.

uxds["t2m"].plot(fig_size=150, colorbar=False, cmap="inferno", backend="matplotlib")