EDK v0.1.2 Alpha Release

Release Date: 2025-05-04

Version: Alpha (v0.1.2)

Author: Earth Data Kit Team


🚀 What’s New

We’re excited to announce the first Alpha release of Earth Data Kit (EDK) — a unified toolkit making working with cloud-based GIS workflows easy.

The goal of this alpha release is to get a sense of whether something like EDK will be useful for GIS analysis, while working toward creating a seamless experience for geospatial data processing in the cloud. This initial release delivers foundational features focused on working with 4D raster datasets (time, band, x, y), built for ease of use and compatibility with the Python geospatial ecosystem.

Key Highlights

  • Cloud-Native Mosaic Creation Stitch rasters from cloud storage like AWS S3 or Google Earth Engine into time-based mosaics.

  • Conversion to xr.DataArray Native support for xarray and its ecosystem, including easy manipulation and analysis.

  • Built-In Export and Visualization Export to Cloud-Optimized GeoTIFFs and visualize with Folium maps directly from your DataArray.


🔧 Core Features

1. Cloud-Based Stitching API

Create timestamped mosaics from raster tiles stored in supported cloud storage systems.

  • discover() Scans and filters datasets based on spatial and temporal bounds. Returns a catalog of relevant tiles.

  • mosaic() Generates Virtual Raster Tiles (VRTs) for each timestamp using the discovered tiles.

  • save() Saves discovery metadata locally for reuse without re-running the scanning process.

Note

✅ Uses GDAL internally so most raster formats should be supported. Testing other formats is on the roadmap.
✅ Tested with GeoTIFFs on S3 and Earth Engine datasets

2. Integration with xr.DataArray

EDK transforms stitched mosaics into xarray.DataArray objects, unlocking advanced processing and visualization with Python libraries.

export()

  • Export 2D/3D/4D arrays to Cloud-Optimized GeoTIFFs (COGs)

  • Export destinations

    • Local disk

    • S3 buckets via s5cmd (local copy + upload)

  • Behavior:

    • 4D: one COG per time index (requires output folder)

    • 3D & 2D: requires explicit output file path

plot()

  • Renders a DataArray as an interactive map using Folium’s ImageOverlay

  • Reads all data into memory and displays it over a map

Warning

Limited by available system memory and Folium’s rendering capacity


3. Metadata Preservation (Work in Progress)

Operations like .isel() or .sel() can drop key metadata (e.g., file paths), breaking downstream functions like export() and plot(). This is the reason why libraries like rioxarray adds CRS in xarray coordinates, eg: spatial_ref

Planned Solution

A wrapper class (inspired by uxarray) that:

  • Subclasses or wraps xr.DataArray

  • Retains critical metadata across operations

  • Ensures export/visualization still works after filtering/slicing


🔮 Coming Soon

This alpha lays the foundation. Here’s what’s planned for upcoming releases:

  • 🧩 Additional Data Sources - HTTP(S), STAC APIs, Azure Blob Storage

  • 🗺️ Vector Mosaicking - Stitch vector data (with ability to export them to FlatGeobuf)

  • 🌡️ Climate Data Format Support - NetCDF, HDF5, and GRIB for atmospheric/oceanographic datasets

  • 🧼 Preprocessing Tools - Cloud masking and cloud treatment


❓ Open Design Questions

We welcome community feedback on:

  • What kind of interactive visualizations are important? (e.g., time slicing for an area of interest)

  • How should we retain metadata, usually saved in xarray attributes, across xarray operations?

  • Should metadata be saved alongside exports as sidecar files so that the entire time-series can be read without needing to open every file separately?


🧪 Try It Out

This Alpha release is ideal for researchers and developers working with cloud-native raster pipelines.


Have questions or suggestions? Feel free to raise an issue or join the discussions on GitHub.