EDK v0.1.2 Alpha Release ======================== Release Date: 2025-05-04 Version: Alpha (v0.1.2) Author: Earth Data Kit Team ---- ๐Ÿš€ What's New ------------- We're excited to announce the **first Alpha release of Earth Data Kit (EDK)** โ€” a unified toolkit making working with cloud-based GIS workflows easy. The goal of this alpha release is to get a sense of whether something like EDK will be useful for GIS analysis, while working toward creating a seamless experience for geospatial data processing in the cloud. This initial release delivers foundational features focused on working with **4D raster datasets** (``time``, ``band``, ``x``, ``y``), built for ease of use and compatibility with the Python geospatial ecosystem. Key Highlights ~~~~~~~~~~~~~~ - **Cloud-Native Mosaic Creation** Stitch rasters from cloud storage like AWS S3 or Google Earth Engine into time-based mosaics. - **Conversion to** ``xr.DataArray`` Native support for ``xarray`` and its ecosystem, including easy manipulation and analysis. - **Built-In Export and Visualization** Export to Cloud-Optimized GeoTIFFs and visualize with Folium maps directly from your DataArray. ---- ๐Ÿ”ง Core Features ---------------- 1. Cloud-Based Stitching API ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Create timestamped mosaics from raster tiles stored in supported cloud storage systems. - **discover()** Scans and filters datasets based on spatial and temporal bounds. Returns a catalog of relevant tiles. - **mosaic()** Generates Virtual Raster Tiles (VRTs) for each timestamp using the discovered tiles. - **save()** Saves discovery metadata locally for reuse without re-running the scanning process. .. note:: | โœ… Uses GDAL internally so most raster formats should be supported. Testing other formats is on the roadmap. | โœ… Tested with GeoTIFFs on S3 and Earth Engine datasets ---- 2. Integration with ``xr.DataArray`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EDK transforms stitched mosaics into ``xarray.DataArray`` objects, unlocking advanced processing and visualization with Python libraries. ``export()`` ^^^^^^^^^^^^ - Export 2D/3D/4D arrays to Cloud-Optimized GeoTIFFs (COGs) - Export destinations - Local disk - S3 buckets via ``s5cmd`` (local copy + upload) - Behavior: - 4D: one COG per ``time`` index (requires output folder) - 3D & 2D: requires explicit output file path ``plot()`` ^^^^^^^^^^ - Renders a ``DataArray`` as an interactive map using Folium's ``ImageOverlay`` - Reads all data into memory and displays it over a map .. warning:: Limited by available system memory and Folium's rendering capacity ---- 3. Metadata Preservation (Work in Progress) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Operations like ``.isel()`` or ``.sel()`` can drop key metadata (e.g., file paths), breaking downstream functions like ``export()`` and ``plot()``. This is the reason why libraries like rioxarray adds CRS in xarray coordinates, eg: spatial_ref Planned Solution ^^^^^^^^^^^^^^^^ A wrapper class (inspired by `uxarray `_) that: - Subclasses or wraps ``xr.DataArray`` - Retains critical metadata across operations - Ensures export/visualization still works after filtering/slicing ---- ๐Ÿ”ฎ Coming Soon -------------- This alpha lays the foundation. Here's what's planned for upcoming releases: - **๐Ÿงฉ Additional Data Sources** - HTTP(S), STAC APIs, Azure Blob Storage - **๐Ÿ—บ๏ธ Vector Mosaicking** - Stitch vector data (with ability to export them to FlatGeobuf) - **๐ŸŒก๏ธ Climate Data Format Support** - NetCDF, HDF5, and GRIB for atmospheric/oceanographic datasets - **๐Ÿงผ Preprocessing Tools** - Cloud masking and cloud treatment ---- โ“ Open Design Questions ------------------------ We welcome community feedback on: - What kind of interactive visualizations are important? (e.g., time slicing for an area of interest) - How should we retain metadata, usually saved in xarray attributes, across xarray operations? - Should metadata be **saved alongside exports** as sidecar files so that the entire time-series can be read without needing to open every file separately? ---- ๐Ÿงช Try It Out ------------- This Alpha release is ideal for researchers and developers working with cloud-native raster pipelines. - ๐Ÿ“ฆ **Installation:** *Coming soon via* ``pip``, see :doc:`/getting-started` for current installation instructions - ๐Ÿ› ๏ธ **GitHub:** ---- Have questions or suggestions? Feel free to raise an issue or join the discussions on `GitHub `_.