# API Reference

Cosecha: tools for harvesting earth observation data for use in flood forecasting.
## GriddedReaper

Bases: `ReaperBase`

Abstract base class for harvesting gridded data.

### reap

Fetch data from the source and store it in instance state.

Returns:

- `DataFrame | Dataset` – Harvested data from the source.
### sow_to_zarr

Write the Dataset to a Zarr store.

Parameters:

- `file_path` (`str | Path`) – Local path or remote URI (e.g. `s3://bucket/store.zarr`) where the Zarr store will be written. Parent directories are created automatically for local paths.
- `consolidate` (`bool`, default: `True`) – Whether to consolidate metadata after writing.

Returns:

- `str` – The path or URI of the written Zarr store.
### sow_to_netcdf
### sow_to_icechunk

Write the Dataset to IceChunk format.

Parameters:

- `storage_path` (`str | Path`) – Local path or S3 URI (e.g. `s3://bucket/prefix`) for the IceChunk storage. Local directories are created if needed; S3 credentials are read from the environment.
- `group_path` (`str`) – Path to the IceChunk group within the repository.

Returns:

- `str` – The path or URI of the written IceChunk group.
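The split between the abstract `reap` and the concrete `sow_*` writers follows the standard abstract-base-class pattern: subclasses supply the fetch logic, while serialization is shared and guarded by a did-you-reap check. A minimal sketch of that design — all names here are illustrative, not the actual cosecha implementation:

```python
from abc import ABC, abstractmethod


class Reaper(ABC):
    """Illustrative stand-in for the ReaperBase/GriddedReaper design."""

    def __init__(self):
        self._data = None  # populated by reap(), consumed by sow_* methods

    @abstractmethod
    def reap(self):
        """Fetch data from the source and store it in instance state."""

    def sow(self, file_path):
        """Shared writer: requires reap() to have been called first."""
        if self._data is None:
            raise RuntimeError("call reap() before sowing")
        return str(file_path)  # real writers return the written path/URI


class DemoReaper(Reaper):
    def reap(self):
        self._data = {"precip_mm": [0.0, 1.2, 3.4]}  # pretend fetch
        return self._data


reaper = DemoReaper()
reaper.reap()
print(reaper.sow("out.zarr"))  # → out.zarr
```

Returning the written path from each `sow_*` method, as documented above, lets callers chain the output location into downstream steps.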
## MRMSReaper

```python
MRMSReaper(
    dates,
    variable="MultiSensor_QPE_01H_Pass2_00.00",
    transformations=None,
    cache_data=False,
)
```

Bases: `GriddedReaper`

Reaper for NOAA MRMS gridded precipitation data.

Initialize MRMSReaper.
Parameters:

- `dates` (`Literal["latest"] | tuple[str, str]`) – `"latest"` to fetch the most recent available data, or a `(start_time, end_time)` tuple to fetch a custom range, e.g. `("2026-01-01 00:00Z", "2026-01-01 18:00Z")`. To fetch a single time point, set `start_time` and `end_time` to the same value, e.g. `("2026-01-01 00:00Z", "2026-01-01 00:00Z")`.
- `variable` (`str`, default: `"MultiSensor_QPE_01H_Pass2_00.00"`) – MRMS variable name.
- `transformations` (`dict[str, Any]`, default: `None`) – Optional transformations to apply to the raw data before returning.
- `cache_data` (`bool`, default: `False`) – Whether to cache decompressed MRMS files on disk.
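The `dates` argument accepts two shapes, and the "single time point" case is just a range whose endpoints coincide. The validation such a constructor might perform can be sketched as follows — this is a hypothetical helper illustrating the documented contract, not cosecha code:

```python
from typing import Literal


def normalize_dates(dates: 'Literal["latest"] | tuple[str, str]'):
    """Return a (start, end) pair, or None when the latest data is requested.

    Hypothetical helper mirroring MRMSReaper's documented `dates` contract;
    the real class may validate differently.
    """
    if dates == "latest":
        return None  # caller resolves the most recent available time
    if isinstance(dates, tuple) and len(dates) == 2:
        start, end = dates
        return (start, end)
    raise ValueError('dates must be "latest" or a (start_time, end_time) tuple')


# A single time point is a range whose endpoints are equal:
print(normalize_dates(("2026-01-01 00:00Z", "2026-01-01 00:00Z")))
```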
### reap

Fetch data from the source and store it in instance state.

Returns:

- `DataFrame | Dataset` – Harvested data from the source.
### sow_to_zarr

Write the Dataset to a Zarr store.

Parameters:

- `file_path` (`str | Path`) – Local path or remote URI (e.g. `s3://bucket/store.zarr`) where the Zarr store will be written. Parent directories are created automatically for local paths.
- `consolidate` (`bool`, default: `True`) – Whether to consolidate metadata after writing.

Returns:

- `str` – The path or URI of the written Zarr store.
### sow_to_netcdf
### sow_to_icechunk

Write the Dataset to IceChunk format.

Parameters:

- `storage_path` (`str | Path`) – Local path or S3 URI (e.g. `s3://bucket/prefix`) for the IceChunk storage. Local directories are created if needed; S3 credentials are read from the environment.
- `group_path` (`str`) – Path to the IceChunk group within the repository.

Returns:

- `str` – The path or URI of the written IceChunk group.
## NWPReaper

```python
NWPReaper(
    init_time,
    forecast_hours=None,
    model="hrrr",
    variable="hourly_precip",
    search_str=None,
    product=None,
    transformations=None,
)
```

Bases: `GriddedReaper`

Fetch NOAA Numerical Weather Prediction (NWP) forecast data.

Initialize NWPReaper.
Parameters:

- `init_time` (`str`) – Model initialization time in a format such as `"YYYY-MM-DD HH:MM"`, parsed by `pandas.to_datetime()`. Also accepts `"latest"` to automatically fetch the most recent initialization time for the specified model.
- `forecast_hours` (`list[int] | range | None`, default: `None`) – Forecast hours to request (e.g., `[1, 6, 12]` or `range(1, 19)`). May be `None` when fetching an analysis product.
- `model` (`str`, default: `"hrrr"`) – NWP model name. Other options include `"rrfs"`, `"rtma"`, etc.
- `variable` (`str | list[str] | None`, default: `"hourly_precip"`) – A simplified variable name (or list of names) mapping to predefined GRIB regex search strings. Common examples include `"hourly_precip"`, `"total_precip"`, and `"temp_2m"`.
- `search_str` (`str | list[str] | None`, default: `None`) – Exact GRIB regex search string(s) to use. Can be combined with `variable`.
- `product` (`str | None`, default: `None`) – Specific Herbie model product string.
- `transformations` (`dict[str, Any] | None`, default: `None`) – Optional transformations to apply to the raw data before returning.
Raises:

- `ValueError` – If `init_time` is invalid or `forecast_hours` are malformed.
- `ReaperError` – If `variable` is not recognized for the given `model`, or neither `variable` nor `search_str` is provided.
- `ImportError` – If Herbie is not installed.
Examples:

```python
>>> reaper = NWPReaper(
...     init_time="2026-01-01 00:00",
...     forecast_hours=range(1, 19),
...     model="hrrr",
...     variable="hourly_precip",
...     transformations={
...         "spatial_subset": {"lat_bounds": (40, 50), "lon_bounds": (-90, -80)},
...         "variable_rename": {"tp": "total_precipitation"},
...     },
... )
```
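The relationship between `variable` and `search_str` described above — predefined names resolved per model, explicit regexes mixed in, and a `ReaperError` when neither is usable — can be pictured as a lookup table plus a merge step. A sketch under assumed names (the regex values, the table, and the `ReaperError` stand-in are all illustrative, not the actual cosecha mapping):

```python
class ReaperError(Exception):
    """Stand-in for cosecha's ReaperError."""


# Illustrative only: real GRIB search strings differ per model and product.
VARIABLE_SEARCH = {
    "hrrr": {
        "hourly_precip": r":APCP:surface",
        "temp_2m": r":TMP:2 m above ground",
    },
}


def resolve_search_strs(model, variable=None, search_str=None):
    """Combine predefined variable lookups with explicit GRIB search strings."""
    patterns = []
    if variable is not None:
        names = [variable] if isinstance(variable, str) else list(variable)
        table = VARIABLE_SEARCH.get(model, {})
        for name in names:
            if name not in table:
                raise ReaperError(f"{name!r} is not recognized for model {model!r}")
            patterns.append(table[name])
    if search_str is not None:
        patterns += [search_str] if isinstance(search_str, str) else list(search_str)
    if not patterns:
        raise ReaperError("provide at least one of `variable` or `search_str`")
    return patterns
```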
### reap

Fetch data from the source and store it in instance state.

Returns:

- `DataFrame | Dataset` – Harvested data from the source.
### sow_to_zarr

Write the Dataset to a Zarr store.

Parameters:

- `file_path` (`str | Path`) – Local path or remote URI (e.g. `s3://bucket/store.zarr`) where the Zarr store will be written. Parent directories are created automatically for local paths.
- `consolidate` (`bool`, default: `True`) – Whether to consolidate metadata after writing.

Returns:

- `str` – The path or URI of the written Zarr store.
### sow_to_netcdf
### sow_to_icechunk

Write the Dataset to IceChunk format.

Parameters:

- `storage_path` (`str | Path`) – Local path or S3 URI (e.g. `s3://bucket/prefix`) for the IceChunk storage. Local directories are created if needed; S3 credentials are read from the environment.
- `group_path` (`str`) – Path to the IceChunk group within the repository.

Returns:

- `str` – The path or URI of the written IceChunk group.
## TimeSeriesReaper

Bases: `ReaperBase`

Abstract base class for harvesting time-series data.

### reap

Fetch data from the source and store it in instance state.

Returns:

- `DataFrame | Dataset` – Harvested data from the source.
### sow_to_parquet

Write the harvested data to Parquet format.

Parameters:

- `file_path` (`str | Path`) – Local path or remote URI (e.g. `s3://bucket/key.parquet`) where the Parquet file will be written. Parent directories are created automatically for local paths.

Raises:

- `ReaperError` – If `reap()` has not been called, or if the data is not time-series.

Returns:

- `str` – The path or URI of the written Parquet file.
### sow_to_iceberg

Write tabular data to Apache Iceberg format.

Parameters:

- `warehouse_path` (`str | Path`) – Path to the Iceberg warehouse directory. Created if needed.
- `table_name` (`str`) – Name of the Iceberg table to create or append to.
- `namespace` (`str`, default: `"default"`) – Namespace (database) for Iceberg tables.
- `catalog_name` (`str`, default: `"default"`) – Name of the PyIceberg catalog.

Returns:

- `str` – The fully qualified table name (`namespace.table_name`).
## USGSNWISReaper

Bases: `TimeSeriesReaper`

Reaper for USGS NWIS instantaneous data.

Uses the dataretrieval library to fetch data from the USGS NWIS/Water Data APIs.

Parameters:

- `site_ids` (`list[str]`) – List of USGS site IDs (e.g., `["01018035"]`).
- `start_date` (`str`) – Start date in ISO 8601 format (`YYYY-MM-DD` or `YYYY-MM-DDTHH:MM:SSZ`).
- `end_date` (`str`) – End date in ISO 8601 format (`YYYY-MM-DD` or `YYYY-MM-DDTHH:MM:SSZ`).
- `parameter_code` (`str | list[str] | None`, default: `None`) – USGS parameter code (e.g., `"00060"` for streamflow or `"00045"` for precipitation). If a list is provided, data is fetched for all specified parameters. If `None`, all available parameters are fetched.
- `transformations` (`dict[str, Any]`, default: `None`) – Optional transformations to apply to the data.
Examples:

```python
>>> reaper = USGSNWISReaper(
...     site_ids=["01018035"],
...     start_date="2026-01-01",
...     end_date="2026-01-31",
...     parameter_code=["00060", "00045"],
... )
>>> data = reaper.reap()
```
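The three accepted shapes of `parameter_code` (a single code, a list of codes, or `None` meaning "everything available") suggest a small normalization step before the request is built. A hypothetical sketch of that step — not the actual cosecha code:

```python
def normalize_parameter_codes(parameter_code):
    """Return a list of USGS parameter codes, or None meaning "all available".

    Hypothetical helper mirroring the documented USGSNWISReaper contract.
    """
    if parameter_code is None:
        return None  # downstream request omits the parameterCd filter
    if isinstance(parameter_code, str):
        return [parameter_code]
    return list(parameter_code)


print(normalize_parameter_codes("00060"))             # → ['00060']
print(normalize_parameter_codes(["00060", "00045"]))  # → ['00060', '00045']
print(normalize_parameter_codes(None))                # → None
```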
### reap

Fetch data from the source and store it in instance state.

Returns:

- `DataFrame | Dataset` – Harvested data from the source.
### sow_to_parquet

Write the harvested data to Parquet format.

Parameters:

- `file_path` (`str | Path`) – Local path or remote URI (e.g. `s3://bucket/key.parquet`) where the Parquet file will be written. Parent directories are created automatically for local paths.

Raises:

- `ReaperError` – If `reap()` has not been called, or if the data is not time-series.

Returns:

- `str` – The path or URI of the written Parquet file.
### sow_to_iceberg

Write tabular data to Apache Iceberg format.

Parameters:

- `warehouse_path` (`str | Path`) – Path to the Iceberg warehouse directory. Created if needed.
- `table_name` (`str`) – Name of the Iceberg table to create or append to.
- `namespace` (`str`, default: `"default"`) – Namespace (database) for Iceberg tables.
- `catalog_name` (`str`, default: `"default"`) – Name of the PyIceberg catalog.

Returns:

- `str` – The fully qualified table name (`namespace.table_name`).
## configure_logger

```python
configure_logger(
    *,
    verbose=None,
    level=None,
    file=None,
    file_level=None,
    file_mode="a",
    file_only=False,
)
```

Configure logging settings.

Parameters:

- `verbose` (`bool`, default: `None`) – Shortcut: `True` sets the console level to DEBUG, `False` to WARNING. If both `level` and `verbose` are given, `level` wins.
- `level` (`str | int`, default: `None`) – Console logging level (`"DEBUG"`, `"INFO"`, `"WARNING"`, etc.).
- `file` (`str | Path`, default: `None`) – Enable file logging at this path. Pass `None` to disable file logging.
- `file_level` (`str | int`, default: `None`) – File handler level. Defaults to DEBUG.
- `file_mode` (`'a' | 'w'`, default: `'a'`) – Append to or overwrite the log file.
- `file_only` (`bool`, default: `False`) – If `True`, disable console logging. Requires `file` to be set.
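The semantics above — the `verbose` shortcut with `level` taking precedence, an optional file handler defaulting to DEBUG, and `file_only` suppressing the console — map directly onto the standard `logging` module. A simplified stand-in illustrating those rules, not the actual cosecha implementation:

```python
import logging
from pathlib import Path


def configure_demo_logger(*, verbose=None, level=None, file=None,
                          file_level=None, file_mode="a", file_only=False):
    """Simplified stand-in for configure_logger, using stdlib logging."""
    if file_only and file is None:
        raise ValueError("file_only=True requires `file` to be set")

    logger = logging.getLogger("cosecha-demo")
    logger.setLevel(logging.DEBUG)   # handlers do the filtering
    logger.handlers.clear()          # make repeated calls idempotent

    # `level` wins over the `verbose` shortcut when both are given.
    if level is None:
        level = logging.DEBUG if verbose else logging.WARNING

    if not file_only:
        console = logging.StreamHandler()
        console.setLevel(level)
        logger.addHandler(console)

    if file is not None:
        fh = logging.FileHandler(Path(file), mode=file_mode)
        fh.setLevel(file_level if file_level is not None else logging.DEBUG)
        logger.addHandler(fh)

    return logger
```

Setting the logger itself to DEBUG and filtering at the handlers is what lets a single logger feed a verbose log file and a quiet console at the same time.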