Loading Data Using Dask#

This notebook demonstrates how to load unstructured grid datasets into UXarray using Dask. It covers:

  • Loading high-resolution grid files: Includes a discussion on lazily converting these files to UGRID conventions.

  • Parallelizing UGRID conversion using chunking: Explains how to use chunking to distribute the workload across Dask workers.

  • Loading large datasets paired with grid files: Explores strategies for efficiently handling large datasets, including scenarios involving many individual data files.

import warnings

import dask
import xarray as xr
from dask.distributed import Client, LocalCluster

import uxarray as ux

warnings.filterwarnings("ignore")

Dask Setup#

This notebook runs on a single node of NCAR’s Derecho supercomputer. Below, we set up our local Dask cluster and client.

For more information about running Dask on NCAR’s systems, please refer to the NCAR Dask Tutorial.

cluster = LocalCluster()
client = Client(cluster)
client

Client: Client-7e25e4ed-f549-11ef-9f0e-0040a687f9c6
Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/philipc/proxy/42425/status
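By default, LocalCluster sizes itself from the cores available on the node. If you need finer control over resources, LocalCluster also accepts arguments such as n_workers, threads_per_worker, and memory_limit; the values in this sketch are placeholders, not recommendations.

# Optional: a more explicit cluster configuration (placeholder values)
cluster = LocalCluster(
    n_workers=32,  # number of worker processes
    threads_per_worker=8,  # threads within each worker process
    memory_limit="7GiB",  # memory limit per worker
)
client = Client(cluster)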

Data#

This notebook uses two datasets to demonstrate Dask functionality.

The first dataset is a 3.75 km MPAS atmosphere grid paired with a single diagnostic file.

The second dataset comes from the Department of Energy (DOE) Energy Exascale Earth System Model (E3SM). The case is configured as follows:

  • Atmosphere-only (AMIP)

  • Present-day control forcing (F2010)

  • 1-degree horizontal resolution (ne30pg2)

  • Default sea surface temperatures and sea ice

Special thanks to Falko Judt (NSF NCAR MMM) and Rachel Tam (UIUC) for sharing the data with us!

mpas_grid_path = "/glade/campaign/cisl/vast/uxarray/data/dyamond/3.75km/grid.nc"
mpas_data_path = "/glade/campaign/mmm/wmr/fjudt/projects/dyamond_1/3.75km/diag.2016-08-01_00.00.00.nc"
e3sm_grid_path = (
    "/glade/campaign/cisl/vast/uxarray/data/e3sm_keeling/E3SM_grid/ne30pg2_grd.nc"
)
e3sm_data_pattern = "/glade/campaign/cisl/vast/uxarray/data/e3sm_keeling/ENSO_ctl_1std/unstructured/*.nc"

Loading Large Grid Files#

UXarray represents every grid format using the UGRID conventions, which often requires multiple pre-processing steps on the original grid data. These steps typically include:

  • Converting from 1-indexed to 0-indexed connectivity variables.

  • Replacing existing fill values with our standardized INT_FILL_VALUE.

  • Shifting longitude coordinates to the range [-180, 180].

Many of these operations are relatively simple and can be delayed until the variable is needed. By loading the data as Dask arrays rather than directly into memory, we can defer these computations while still creating a Grid instance. An added benefit is that only the required variables are computed when accessed, which is useful since grid files often contain additional variables that may not be immediately needed.
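As a rough sketch of what one of these deferred steps looks like, the snippet below lazily shifts a hypothetical 1-indexed connectivity array to 0-indexing and swaps in the standardized fill value. The raw array and its original fill value of 0 are made up for illustration, and the import location of INT_FILL_VALUE is assumed.

import dask.array as da
import numpy as np

from uxarray.constants import INT_FILL_VALUE  # assumed import path for the standardized fill value

# Hypothetical 1-indexed connectivity with a format-specific fill value of 0,
# wrapped in a Dask array so that the conversion is deferred
raw_connectivity = da.from_array(np.array([[1, 2, 3, 0], [4, 5, 6, 7]]), chunks=(1, 4))

# Replace the original fill value and shift to 0-indexed connectivity;
# both steps only extend the task graph, nothing is computed yet
standardized = da.where(raw_connectivity == 0, INT_FILL_VALUE, raw_connectivity - 1)

# The conversion runs only when the values are actually needed
standardized.compute()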

Most of UXarray’s supported grid formats allow for a lazy conversion to UGRID.

Supported (full support for lazy conversion with Dask):

  • UGRID

  • MPAS

  • ICON

  • ESMF

  • HEALPix

  • EXODUS

  • FESOM (netCDF)

Currently Unsupported:

  • SCRIP

  • GEOS

  • Structured

  • Points

  • FESOM (ascii)

Let’s examine an extreme case. Below, we have a complete 3.75 km MPAS atmosphere grid that contains a full set of grid variables, including multiple coordinates and connectivity variables.

First, let’s try to eagerly load the entire grid into memory without specifying any chunking.

%%time
uxgrid = ux.open_grid(mpas_grid_path)
uxgrid
CPU times: user 20.2 s, sys: 1min 37s, total: 1min 57s
Wall time: 1min 38s
<uxarray.Grid>
Original Grid Type: MPAS
Grid Dimensions:
  * n_node: 83886080
  * n_edge: 125829120
  * n_face: 41943042
  * n_max_face_nodes: 6
  * n_max_face_edges: 6
  * n_max_face_faces: 6
  * n_max_node_faces: 3
  * two: 2
Grid Coordinates (Spherical):
  * node_lon: (83886080,)
  * node_lat: (83886080,)
  * edge_lon: (125829120,)
  * edge_lat: (125829120,)
  * face_lon: (41943042,)
  * face_lat: (41943042,)
Grid Coordinates (Cartesian):
  * node_x: (83886080,)
  * node_y: (83886080,)
  * node_z: (83886080,)
  * edge_x: (125829120,)
  * edge_y: (125829120,)
  * edge_z: (125829120,)
  * face_x: (41943042,)
  * face_y: (41943042,)
  * face_z: (41943042,)
Grid Connectivity Variables:
  * edge_face_connectivity: (125829120, 2)
  * node_face_connectivity: (83886080, 3)
  * face_edge_connectivity: (41943042, 6)
  * edge_node_connectivity: (125829120, 2)
  * face_node_connectivity: (41943042, 6)
  * face_face_connectivity: (41943042, 6)
Grid Descriptor Variables:
  * face_areas: (41943042,)
  * edge_face_distances: (125829120,)
  * edge_node_distances: (125829120,)

This takes over a minute on a node of Derecho. We can observe that our Grid contains a large number of variables, many of which we may never end up using.

Compared to using Xarray directly, this represents a significant performance difference. Note that because UXarray must internally represent every grid using the UGRID conventions, opening a Grid will always be slower than a pure Xarray approach.

%%time
xrds = xr.open_dataset(mpas_grid_path)
CPU times: user 16.3 ms, sys: 105 ms, total: 121 ms
Wall time: 107 ms

One workaround is to specify the chunks parameter, which loads the grid variables as Dask arrays. Because Dask allows computations to be delayed, we can defer these operations until they’re needed, significantly reducing the time required to open a grid and explore its contents.

Below, we set chunks=-1, which loads all of our data as Dask arrays, using a single chunk per variable.

%%time
uxgrid = ux.open_grid(mpas_grid_path, chunks=-1)
uxgrid
CPU times: user 1.67 s, sys: 1.26 s, total: 2.93 s
Wall time: 14.2 s
<uxarray.Grid>
Original Grid Type: MPAS
Grid Dimensions:
  * n_node: 83886080
  * n_edge: 125829120
  * n_face: 41943042
  * n_max_face_nodes: 6
  * n_max_face_edges: 6
  * n_max_face_faces: 6
  * n_max_node_faces: 3
  * two: 2
Grid Coordinates (Spherical):
  * node_lon: (83886080,)
  * node_lat: (83886080,)
  * edge_lon: (125829120,)
  * edge_lat: (125829120,)
  * face_lon: (41943042,)
  * face_lat: (41943042,)
Grid Coordinates (Cartesian):
  * node_x: (83886080,)
  * node_y: (83886080,)
  * node_z: (83886080,)
  * edge_x: (125829120,)
  * edge_y: (125829120,)
  * edge_z: (125829120,)
  * face_x: (41943042,)
  * face_y: (41943042,)
  * face_z: (41943042,)
Grid Connectivity Variables:
  * edge_face_connectivity: (125829120, 2)
  * node_face_connectivity: (83886080, 3)
  * face_edge_connectivity: (41943042, 6)
  * edge_node_connectivity: (125829120, 2)
  * face_node_connectivity: (41943042, 6)
  * face_face_connectivity: (41943042, 6)
Grid Descriptor Variables:
  * face_areas: (41943042,)
  * edge_face_distances: (125829120,)
  * edge_node_distances: (125829120,)

We can see above that all our variables are loaded as dask.array objects. By inspecting the high-level Dask graph for face_node_connectivity, we can observe the complete set of computations and steps taken to parse and encode the data according to the UGRID conventions.

uxgrid.face_node_connectivity.data.dask

HighLevelGraph

HighLevelGraph with 17 layers and 17 keys from all layers.

  Layer 1:  original-open_dataset-verticesOnCell  (MaterializedLayer)
  Layer 2:  open_dataset-verticesOnCell           (Blockwise, shape (41943042, 6), dtype int32)
  Layer 3:  astype                                (Blockwise, shape (41943042, 6), dtype int64)
  Layer 4:  original-open_dataset-nEdgesOnCell    (MaterializedLayer)
  Layer 5:  open_dataset-nEdgesOnCell             (Blockwise, shape (41943042,), dtype int32)
  Layer 6:  astype                                (Blockwise, shape (41943042,), dtype int64)
  Layer 7:  getitem                               (MaterializedLayer, shape (1, 41943042), dtype int64)
  Layer 8:  array                                 (MaterializedLayer, shape (6, 1), dtype int64)
  Layer 9:  greater_equal                         (Blockwise, shape (6, 41943042), dtype bool)
  Layer 10: invert                                (Blockwise, shape (6, 41943042), dtype bool)
  Layer 11: transpose                             (Blockwise, shape (41943042, 6), dtype bool)
  Layer 12: where                                 (Blockwise, shape (41943042, 6), dtype int64)
  Layer 13: ne                                    (Blockwise, shape (41943042, 6), dtype bool)
  Layer 14: where                                 (Blockwise, shape (41943042, 6), dtype int64)
  Layer 15: sub                                   (Blockwise, shape (41943042, 6), dtype int64)
  Layer 16: ne                                    (Blockwise, shape (41943042, 6), dtype bool)
  Layer 17: where                                 (Blockwise, shape (41943042, 6), dtype int64)

If we want to load this variable into memory, we can use either the .load() or .compute() methods:

  • .load() performs an in-place loading.

  • .compute() returns a new variable with the data loaded into memory.

For example, to load the face_node_connectivity variable into memory, you would do the following:

# load the variable in place
uxgrid.face_node_connectivity.load()

# or compute a new, in-memory copy and store it under a new attribute
uxgrid.face_node_connectivity_loaded = uxgrid.face_node_connectivity.compute()
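
To confirm which variables have actually been materialized, one simple check is to look at the type of the underlying .data attribute (a small sketch using the dask.array class):

import dask.array as da

# Loaded variables are backed by NumPy arrays, deferred ones by Dask arrays
print(isinstance(uxgrid.face_node_connectivity.data, da.Array))  # False after .load()
print(isinstance(uxgrid.node_lon.data, da.Array))  # True while still deferred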

Inspecting our Grid once again, we see that after these computations, only the face_node_connectivity variable is loaded into memory, while the remaining variables remain as Dask arrays.

uxgrid
<uxarray.Grid>
Original Grid Type: MPAS
Grid Dimensions:
  * n_node: 83886080
  * n_edge: 125829120
  * n_face: 41943042
  * n_max_face_nodes: 6
  * n_max_face_edges: 6
  * n_max_face_faces: 6
  * n_max_node_faces: 3
  * two: 2
Grid Coordinates (Spherical):
  * node_lon: (83886080,)
  * node_lat: (83886080,)
  * edge_lon: (125829120,)
  * edge_lat: (125829120,)
  * face_lon: (41943042,)
  * face_lat: (41943042,)
Grid Coordinates (Cartesian):
  * node_x: (83886080,)
  * node_y: (83886080,)
  * node_z: (83886080,)
  * edge_x: (125829120,)
  * edge_y: (125829120,)
  * edge_z: (125829120,)
  * face_x: (41943042,)
  * face_y: (41943042,)
  * face_z: (41943042,)
Grid Connectivity Variables:
  * edge_face_connectivity: (125829120, 2)
  * node_face_connectivity: (83886080, 3)
  * face_edge_connectivity: (41943042, 6)
  * edge_node_connectivity: (125829120, 2)
  * face_node_connectivity: (41943042, 6)
  * face_face_connectivity: (41943042, 6)
Grid Descriptor Variables:
  * face_areas: (41943042,)
  * edge_face_distances: (125829120,)
  * edge_node_distances: (125829120,)

Chunking Grid Dimensions#

Our grid consists of 41,943,042 faces, 83,886,080 nodes, and 125,829,120 edges. Instead of having a single chunk for each variable, we can consider chunking each individual variable across the grid dimensions.

By chunking the variables when loading them, we can distribute the work evenly across our Dask workers. The operations applied when encoding the grid format into the UGRID conventions are embarrassingly parallel.

Recall that on a single node of Derecho, we have 256 available threads. Let’s evenly divide our data across all of our threads.

# one chunk per available thread for each grid dimension
face_chunk = 41_943_042 // 256
node_chunk = 83_886_080 // 256
edge_chunk = 125_829_120 // 256

We can now specify our chunk parameter by passing a dictionary where each dimension is mapped to its corresponding chunk size.

%%time
uxgrid = ux.open_grid(
    mpas_grid_path,
    chunks={"n_face": face_chunk, "n_node": node_chunk, "n_edge": edge_chunk},
)
uxgrid
CPU times: user 1.44 s, sys: 608 ms, total: 2.04 s
Wall time: 4.59 s
<uxarray.Grid>
Original Grid Type: MPAS
Grid Dimensions:
  * n_node: 83886080
  * n_edge: 125829120
  * n_face: 41943042
  * n_max_face_nodes: 6
  * n_max_face_edges: 6
  * n_max_face_faces: 6
  * n_max_node_faces: 3
  * two: 2
Grid Coordinates (Spherical):
  * node_lon: (83886080,)
  * node_lat: (83886080,)
  * edge_lon: (125829120,)
  * edge_lat: (125829120,)
  * face_lon: (41943042,)
  * face_lat: (41943042,)
Grid Coordinates (Cartesian):
  * node_x: (83886080,)
  * node_y: (83886080,)
  * node_z: (83886080,)
  * edge_x: (125829120,)
  * edge_y: (125829120,)
  * edge_z: (125829120,)
  * face_x: (41943042,)
  * face_y: (41943042,)
  * face_z: (41943042,)
Grid Connectivity Variables:
  * edge_face_connectivity: (125829120, 2)
  * node_face_connectivity: (83886080, 3)
  * face_edge_connectivity: (41943042, 6)
  * edge_node_connectivity: (125829120, 2)
  * face_node_connectivity: (41943042, 6)
  * face_face_connectivity: (41943042, 6)
Grid Descriptor Variables:
  * face_areas: (41943042,)
  * edge_face_distances: (125829120,)
  * edge_node_distances: (125829120,)

Now let’s load only the minimal set of variables we need. For many applications in UXarray, such as visualization, only the node_lon, node_lat, and face_node_connectivity variables are required.

By calling .load() on each of these variables, we trigger the computation of their conversion to the UGRID conventions and load them into memory.

%%time
uxgrid.face_node_connectivity.load()
uxgrid.node_lon.load()
uxgrid.node_lat.load()
uxgrid
CPU times: user 2.54 s, sys: 2.48 s, total: 5.02 s
Wall time: 8 s
<uxarray.Grid>
Original Grid Type: MPAS
Grid Dimensions:
  * n_node: 83886080
  * n_edge: 125829120
  * n_face: 41943042
  * n_max_face_nodes: 6
  * n_max_face_edges: 6
  * n_max_face_faces: 6
  * n_max_node_faces: 3
  * two: 2
Grid Coordinates (Spherical):
  * node_lon: (83886080,)
  * node_lat: (83886080,)
  * edge_lon: (125829120,)
  * edge_lat: (125829120,)
  * face_lon: (41943042,)
  * face_lat: (41943042,)
Grid Coordinates (Cartesian):
  * node_x: (83886080,)
  * node_y: (83886080,)
  * node_z: (83886080,)
  * edge_x: (125829120,)
  * edge_y: (125829120,)
  * edge_z: (125829120,)
  * face_x: (41943042,)
  * face_y: (41943042,)
  * face_z: (41943042,)
Grid Connectivity Variables:
  * edge_face_connectivity: (125829120, 2)
  * node_face_connectivity: (83886080, 3)
  * face_edge_connectivity: (41943042, 6)
  * edge_node_connectivity: (125829120, 2)
  * face_node_connectivity: (41943042, 6)
  * face_face_connectivity: (41943042, 6)
Grid Descriptor Variables:
  * face_areas: (41943042,)
  * edge_face_distances: (125829120,)
  * edge_node_distances: (125829120,)

Loading Large Datasets#

The previous example focused solely on working with the unstructured grid definition. In most cases, however, you’ll have an unstructured grid paired with data. This may involve loading a large series of data variables from a climate model that include many spatial and temporal dimensions. For these applications, using Dask is highly encouraged, as most machines cannot load all of this data into memory.

Opening a Single Data File#

Using the same grid as above, we can pair it with a data file to create a ux.UxDataset. In this example, we have a high-resolution grid paired with a single diagnostic file from MPAS. Here, we can set chunks=-1 if we simply want our data to be loaded as Dask arrays.

%%time
uxds = ux.open_dataset(mpas_grid_path, mpas_data_path, chunks=-1)
uxds
CPU times: user 1.62 s, sys: 999 ms, total: 2.62 s
Wall time: 12.8 s
<xarray.UxDataset> Size: 17GB
Dimensions:             (time: 1, StrLen: 64, n_face: 41943042, n_node: 83886080)
Coordinates:
  * time                (time) datetime64[ns] 8B 2016-08-01
Dimensions without coordinates: StrLen, n_face, n_node
Data variables: (12/99)
    xtime_old           (time, StrLen) |S1 64B dask.array<chunksize=(1, 64), meta=np.ndarray>
    taux                (time, n_face) float32 168MB dask.array<chunksize=(1, 41943042), meta=np.ndarray>
    tauy                (time, n_face) float32 168MB dask.array<chunksize=(1, 41943042), meta=np.ndarray>
    olrtoa              (time, n_face) float32 168MB dask.array<chunksize=(1, 41943042), meta=np.ndarray>
    cldcvr              (time, n_face) float32 168MB dask.array<chunksize=(1, 41943042), meta=np.ndarray>
    vert_int_qv         (time, n_face) float32 168MB dask.array<chunksize=(1, 41943042), meta=np.ndarray>
    ...                  ...
    umeridional_300hPa  (time, n_face) float32 168MB dask.array<chunksize=(1, 41943042), meta=np.ndarray>
    umeridional_400hPa  (time, n_face) float32 168MB dask.array<chunksize=(1, 41943042), meta=np.ndarray>
    uzonal_300hPa       (time, n_face) float32 168MB dask.array<chunksize=(1, 41943042), meta=np.ndarray>
    uzonal_400hPa       (time, n_face) float32 168MB dask.array<chunksize=(1, 41943042), meta=np.ndarray>
    xtime               (time, StrLen) |S1 64B dask.array<chunksize=(1, 64), meta=np.ndarray>
    zgrid               (n_face) float32 168MB dask.array<chunksize=(41943042,), meta=np.ndarray>

Let’s access our "relhum_200hPa" data variable and compute its (unweighted) mean over all faces. Since our data is loaded using Dask, we need to trigger the computation using .compute() or .load(). For example:

%%time
uxds["relhum_200hPa"].mean().compute()
CPU times: user 94.5 ms, sys: 72.9 ms, total: 167 ms
Wall time: 727 ms
<xarray.UxDataArray 'relhum_200hPa' ()> Size: 4B
array(25.246592, dtype=float32)
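
Because the variable was opened as a single chunk along n_face, the reduction above runs largely as one task. If you want the work spread across the workers, you can rechunk the variable first using xarray’s standard .chunk() method; the chunk size below is an arbitrary example.

# Rechunk across the face dimension so the reduction is split among workers,
# then evaluate the (still lazy) mean on the cluster
uxds["relhum_200hPa"].chunk({"n_face": 4_194_304}).mean().compute()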

Opening Multiple Data Files#

There may be times when the grid you are working with is small enough to load directly into memory, while other temporal or spatial dimensions in the dataset still benefit from chunking. In these cases, you can pass chunk_grid=False to apply chunking only to the data dimensions while loading the grid eagerly.

%%time
uxds = ux.open_mfdataset(
    e3sm_grid_path,
    e3sm_data_pattern,
    # concatenate along this dimension
    concat_dim="time",
    # concatenate files in the order provided
    combine="nested",
    chunks={
        "lev": 4,
    },
    parallel=True,
    # eagerly load grid into memory
    chunk_grid=False,
)
uxds
CPU times: user 19.6 s, sys: 437 ms, total: 20.1 s
Wall time: 17.1 s
<xarray.UxDataset> Size: 37GB
Dimensions:              (time: 72, n_face: 21600, lev: 72, ilev: 73,
                          cosp_prs: 7, nbnd: 2, cosp_tau: 7, cosp_ht: 40,
                          cosp_sr: 15, cosp_htmisr: 16, cosp_tau_modis: 7,
                          cosp_reffice: 6, cosp_reffliq: 6, cosp_sza: 5,
                          cosp_scol: 10)
Coordinates: (12/13)
  * lev                  (lev) float64 576B 0.1238 0.1828 0.2699 ... 993.8 998.5
  * ilev                 (ilev) float64 584B 0.1 0.1477 0.218 ... 997.0 1e+03
  * cosp_prs             (cosp_prs) float64 56B 9e+04 7.4e+04 ... 2.45e+04 9e+03
  * cosp_tau             (cosp_tau) float64 56B 0.15 0.8 2.45 ... 41.5 100.0
  * cosp_scol            (cosp_scol) int32 40B 1 2 3 4 5 6 7 8 9 10
  * cosp_ht              (cosp_ht) float64 320B 1.896e+04 1.848e+04 ... 240.0
    ...                   ...
  * cosp_sza             (cosp_sza) float64 40B 0.0 20.0 40.0 60.0 80.0
  * cosp_htmisr          (cosp_htmisr) float64 128B 0.0 250.0 ... 1.8e+04
  * cosp_tau_modis       (cosp_tau_modis) float64 56B 0.15 0.8 ... 41.5 100.0
  * cosp_reffice         (cosp_reffice) float64 48B 5e-06 1.5e-05 ... 7.5e-05
  * cosp_reffliq         (cosp_reffliq) float64 48B 4e-06 9e-06 ... 2.5e-05
  * time                 (time) object 576B 0001-02-01 00:00:00 ... 0007-01-0...
Dimensions without coordinates: n_face, nbnd
Data variables: (12/471)
    lat                  (time, n_face) float64 12MB dask.array<chunksize=(1, 21600), meta=np.ndarray>
    lon                  (time, n_face) float64 12MB dask.array<chunksize=(1, 21600), meta=np.ndarray>
    area                 (time, n_face) float64 12MB dask.array<chunksize=(1, 21600), meta=np.ndarray>
    hyam                 (time, lev) float64 41kB dask.array<chunksize=(1, 4), meta=np.ndarray>
    hybm                 (time, lev) float64 41kB dask.array<chunksize=(1, 4), meta=np.ndarray>
    P0                   (time) float64 576B 1e+05 1e+05 1e+05 ... 1e+05 1e+05
    ...                   ...
    soa_c1DDF            (time, n_face) float32 6MB dask.array<chunksize=(1, 21600), meta=np.ndarray>
    soa_c1SFWET          (time, n_face) float32 6MB dask.array<chunksize=(1, 21600), meta=np.ndarray>
    soa_c2DDF            (time, n_face) float32 6MB dask.array<chunksize=(1, 21600), meta=np.ndarray>
    soa_c2SFWET          (time, n_face) float32 6MB dask.array<chunksize=(1, 21600), meta=np.ndarray>
    soa_c3DDF            (time, n_face) float32 6MB dask.array<chunksize=(1, 21600), meta=np.ndarray>
    soa_c3SFWET          (time, n_face) float32 6MB dask.array<chunksize=(1, 21600), meta=np.ndarray>
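
With the multi-file dataset assembled lazily, downstream operations remain deferred until explicitly computed. For example, a time mean of one of the variables listed above (soa_c1DDF, chosen purely for illustration) could look like this:

# Build a lazy time mean of a single variable, then evaluate it on the cluster
uxds["soa_c1DDF"].mean(dim="time").compute()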

Cleanup#

Always remember to shut down the Dask cluster when you’re done!

client.shutdown()