planetsca.train

This module contains functions for training Random Forest models that predict snow-covered area (SCA) in Planet imagery.

train.data_training_new(labeled_polygons_filepath: str, training_image_filepath: str, training_data_filepath: str | None = None, rasterized_mask_output_filepath: str | None = None, ndvi: bool | None = False)

Creates a new training data DataFrame from a file of labeled polygons and a PlanetScope image.

Parameters:
  • labeled_polygons_filepath (str) – File path to a shapefile or GeoJSON file with labeled polygons

  • training_image_filepath (str) – File path to a PlanetScope image

  • training_data_filepath (Optional[str]) – File path for writing the training data DataFrame as a CSV file, defaults to None

  • rasterized_mask_output_filepath (Optional[str]) – File path for writing the rasterized labeled polygons as a GeoTIFF file, defaults to None

  • ndvi (Optional[bool]) – Set to True to compute the Normalized Difference Vegetation Index (NDVI) and add it to the training data DataFrame, defaults to False

Returns:

training_data_df – pandas DataFrame of training data

Return type:

DataFrame
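
The ndvi option above adds the Normalized Difference Vegetation Index to the training data. As a rough illustration of that quantity (not the library's own code), NDVI can be derived from red and near-infrared reflectance as (NIR - Red) / (NIR + Red). The helper name add_ndvi, the output column name 'ndvi', and the sample values are assumptions made for this sketch; the 'red' and 'nir' column names follow the feature columns listed for train_model.

```python
import numpy as np
import pandas as pd


def add_ndvi(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an 'ndvi' column (hypothetical column name)."""
    out = df.copy()
    red = out["red"].astype(float)
    nir = out["nir"].astype(float)
    # NDVI = (NIR - Red) / (NIR + Red); a zero denominator yields NaN
    denom = (nir + red).replace(0, np.nan)
    out["ndvi"] = (nir - red) / denom
    return out


# Made-up reflectance values, for illustration only
training_df = pd.DataFrame(
    {
        "blue": [0.10, 0.40, 0.00],
        "green": [0.15, 0.45, 0.00],
        "red": [0.20, 0.50, 0.00],
        "nir": [0.60, 0.50, 0.00],
        "label": [0, 1, 0],
    }
)
with_ndvi = add_ndvi(training_df)
print(with_ndvi[["red", "nir", "ndvi"]])
```

Returning a copy keeps the input DataFrame unchanged, which makes it easy to compare models trained with and without the extra feature.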

train.train_model(df_train: DataFrame, new_model_filepath: str, new_model_score_filepath: str, n_estimators: int = 10, max_depth: int = 10, max_features: int = 4, random_state: int | None = None, n_splits: int = 2, n_repeats: int = 2) → RandomForestClassifier

Trains a new Random Forest model with custom parameters and saves the model and its score information to disk.

Parameters:
  • df_train (pd.DataFrame) – DataFrame containing the training data; must have the feature columns ‘blue’, ‘green’, ‘red’, ‘nir’ and the target column ‘label’

  • new_model_filepath (str) – File path for saving the model as a joblib file

  • new_model_score_filepath (str) – File path for saving the model score information as a CSV file

  • n_estimators (int) – Number of trees in the forest, defaults to 10

  • max_depth (int) – Maximum depth of each tree, defaults to 10

  • max_features (int) – Number of features to consider when looking for the best split, defaults to 4

  • random_state (Optional[int]) – Seed to ensure reproducibility, defaults to None

  • n_splits (int) – Number of folds in the cross-validation, defaults to 2

  • n_repeats (int) – Number of times the cross-validation is repeated, defaults to 2

Returns:

model – The newly trained model

Return type:

RandomForestClassifier
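
To show how the parameters above relate to scikit-learn, here is a minimal sketch of training a Random Forest with repeated cross-validation on synthetic data. It is not the body of train_model: the choice of RepeatedStratifiedKFold, accuracy as the score, and the synthetic reflectance values are all assumptions, and saving the model and scores to disk is left out.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic training data (assumption): label 1 pixels are brighter in every band
rng = np.random.default_rng(0)
n = 200
label = np.repeat([0, 1], n // 2)
base = np.where(label == 1, 0.8, 0.2)
bands = ["blue", "green", "red", "nir"]
df_train = pd.DataFrame(
    {band: base + rng.normal(0.0, 0.05, size=n) for band in bands}
)
df_train["label"] = label

X = df_train[bands]
y = df_train["label"]

# Same default values as listed in the parameter table above
model = RandomForestClassifier(
    n_estimators=10, max_depth=10, max_features=4, random_state=1
)

# n_splits folds repeated n_repeats times gives n_splits * n_repeats scores
cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=2, random_state=1)
scores = cross_val_score(model, X, y, scoring="accuracy", cv=cv)

# Fit on the full training set after scoring
model.fit(X, y)
print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

With n_splits=2 and n_repeats=2 the model is fitted four times during scoring, so raising either value increases run time proportionally.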

train.vector_rasterize(labeled_polygons_filepath: str, training_image_filepath: str, rasterized_mask_output_filepath: str = None)

Helper function that converts a vector file of labeled polygons to a raster.

Parameters:
  • labeled_polygons_filepath (str) – File path to a shapefile or GeoJSON file with labeled polygons

  • training_image_filepath (str) – File path to a PlanetScope image

  • rasterized_mask_output_filepath (Optional[str]) – File path for writing the rasterized labeled polygons as a GeoTIFF file, defaults to None

Returns:

rasterized – Rasterized version of the vector file

Return type:

np.ndarray