fire2a.knapsack
👋🌎 🌲🔥
MultiObjective Knapsack Rasters
Select the best set of pixels maximizing the sum of several rescaled and weighted rasters, minding capacity constraints.
Usage
Overview
- Choose your raster files (absolute path or relative to the script execution directory)
- Configure, for values: rescaling strategies (minmax, onehot, standard, robust or pass) and absolute weights (any real number)
- Configure, for capacities: capacity sense (lower or upper bound) and ratio (between -1 and 1)
- Set output options (raster, authid, geotransform, plots, ...)
Command line execution
```bash
# get interface help
python -m fire2a.knapsack --help

# run (in a qgis/gdal aware python environment)
python -m fire2a.knapsack [config.toml]
```
Script integration
```python
from fire2a.knapsack import main
solution, model, instance, args = main(["--script", "config.toml"])
```
Preparation
1. Choose your raster files
- Any [GDAL compatible](https://gdal.org/en/latest/drivers/raster/index.html) raster will be read
- Mind that any nodata value will exclude that pixel from the optimization (this can be overridden but is not recommended, see `--exclude_nodata`, especially for weight constraining rasters)
- A good practice is to place them all in the same directory where the script will be executed
- "Quote them" if they have any non-alphanumeric chars (outside [a-zA-Z0-9])
2. Preprocessing configuration
See the `config.toml`
file for an example of the configuration of the preprocessing steps. The file is structured as follows:
["a_filename.tif"]
value_rescaling = "onehot"
value_weight = 0.5
["b_filename.tif"]
capacity_sense = "<="
capacity_ratio = 0.1
This example states that the values of raster `a_filename.tif` will be rescaled using the `OneHot` strategy, then multiplied by 0.5 in the sought objective; also that the selected pixels can weigh at most 10% of `b_filename.tif`'s total weight.
__value_rescaling__
- can be minmax, onehot, standard, robust, or pass for no rescaling
- minmax (default) and onehot scale into [0,1]; standard and robust do not, so mix them by adjusting the value_weight
- [MinMax](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html): (x-min)/(max-min)
- [Standard Scaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html): (x-mean)/stddev
- [Robust Scaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html): same but dropping the tails of the distribution
- [OneHot Encoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html): __for CATEGORICAL DATA__
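A minimal sketch of how two of these strategies differ on the same toy column (values made up; scikit-learn assumed available, since the script already depends on it):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

x = np.array([[0.0], [5.0], [10.0]])  # one raster's values as a column

mm = MinMaxScaler().fit_transform(x)    # (x-min)/(max-min): squashed into [0, 1]
sd = StandardScaler().fit_transform(x)  # (x-mean)/stddev: mean 0, unit variance

print(mm.ravel())  # [0.  0.5 1. ]
print(sd.ravel())  # symmetric around 0, not bounded to [0, 1]
```

Because standard and robust outputs are not bounded to [0,1], weighting them 1:1 against a minmax-scaled raster skews the objective; compensate through value_weight.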
__value_weight__
- can be any real number, although zero does not make sense
- positive maximizes, negative minimizes
__capacity_sense__
- can be `>=`, `≥`, `ge`, `geq`, `lb` for lower bound; `<=`, `≤`, `le`, `leq`, `ub` for upper bound
- default is upper bound
__capacity_ratio__
- can be any real number strictly between -1 and 1
- the capacity is this ratio times the sum of all pixel values of the raster, so if all values are equal it represents the proportion of pixels to be selected
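A small numeric sketch of how the bound is derived from the ratio (toy numbers):

```python
import numpy as np

weights = np.ones(10)  # a capacity raster: ten pixels, equal weight 1.0
capacity_ratio = 0.1

# the knapsack bound is the ratio times the raster's total
cap = weights.sum() * capacity_ratio

# all pixels weigh the same here, so with sense "<=" at most
# 10% of the pixels (one of the ten) can be selected
print(cap)  # 1.0
```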
3. Other options
- __output_raster__ (default "") output raster file path, warning: overwrites!
- __authid__ (default "EPSG:3857") output raster CRS authid
- __geotransform__ (default "(0, 1, 0, 0, 0, -1)") output raster geotransform
- __plots__ (default False) saves 3 .png files to the same directory as the output raster, showing the data, the scaled data and the weighted scaled data. Great for debugging, but can really slow down the process.
- __exclude_nodata__ (default "any") a pixel is excluded if "any" layer is nodata there. It can be relaxed by setting "all" (all layers must be nodata for the pixel to be excluded), which can cause problems with the capacity rasters by selecting pixels that weigh 0.
- __script__ (default False) only for integrating into other scripts, returning the solution and model objects instead of writing outputs
- __no_write__ (default False) do not write the output raster
- __verbose__ (default 0) WARNING:1, INFO:2, DEBUG:3
Module source:

````python
#!/usr/bin/env python3
# fmt: off
"""👋🌎 🌲🔥
# MultiObjective Knapsack Rasters
Select the best set of pixels maximizing the sum of several rescaled and weighted rasters, minding capacity constraints.
## Usage
### Overview
1. Choose your raster files (absolute path or relative to the script execution directory)
2. Configure, for values: rescaling strategies (minmax, onehot, standard, robust or pass) and absolute weights (any real number)
3. Configure, for capacities: capacity sense (lower or upper bound) and ratio (between -1 and 1)
4. Set output options (raster, authid, geotransform, plots, ...)
### Command line execution
    ```bash
    # get interface help
    python -m fire2a.knapsack --help

    # run (in a qgis/gdal aware python environment)
    python -m fire2a.knapsack [config.toml]
    ```
### Script integration
    ```python
    from fire2a.knapsack import main
    solution, model, instance, args = main(["--script", "config.toml"])
    ```
### Preparation
#### 1. Choose your raster files
- Any [GDAL compatible](https://gdal.org/en/latest/drivers/raster/index.html) raster will be read
- Mind that any nodata value will exclude that pixel from the optimization (this can be overridden but is not recommended, see `--exclude_nodata`, especially for weight constraining rasters)
- A good practice is to place them all in the same directory where the script will be executed
- "Quote them" if they have any non-alphanumeric chars (outside [a-zA-Z0-9])

#### 2. Preprocessing configuration
See the `config.toml` file for an example of the configuration of the preprocessing steps.
The file is structured as follows:

```toml
["a_filename.tif"]
value_rescaling = "onehot"
value_weight = 0.5

["b_filename.tif"]
capacity_sense = "<="
capacity_ratio = 0.1
```
This example states that the values of raster `a_filename.tif` will be rescaled using the `OneHot` strategy, then multiplied by 0.5 in the sought objective; also that the selected pixels can weigh at most 10% of `b_filename.tif`'s total weight.

1. __value_rescaling__
    - can be minmax, onehot, standard, robust or pass for no rescaling
    - minmax (default) and onehot scale into [0,1]; standard and robust do not, so mix them by adjusting the value_weight
    - [MinMax](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html): (x-min)/(max-min)
    - [Standard Scaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html): (x-mean)/stddev
    - [Robust Scaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html): same but dropping the tails of the distribution
    - [OneHot Encoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html): __for CATEGORICAL DATA__

2. __value_weight__
    - can be any real number, although zero does not make sense
    - positive maximizes, negative minimizes

3. __capacity_sense__
    - can be >=, ≥, ge, geq, lb for lower bound; <=, ≤, le, leq, ub for upper bound
    - default is upper bound

4. __capacity_ratio__
    - can be any real number strictly between -1 and 1
    - is proportional to the sum of all values of the pixels in the raster, meaning if all values are the same it represents the proportion of pixels to be selected

#### 3. Other options
- __output_raster__ (default "")
- __authid__ (default "EPSG:3857")
- __geotransform__ (default "(0, 1, 0, 0, 0, -1)")
- __plots__ (default False) saves 3 .png files to the same directory as the output raster, showing the data, the scaled data and the weighted scaled data. Great for debugging but can really slow down the process.
- __exclude_nodata__ (default "any") a pixel is excluded if "any" layer is nodata there. It can be relaxed by setting "all" (all layers must be nodata for the pixel to be excluded), which can cause problems with the capacity rasters by selecting pixels that weigh 0.
- __script__ (default False) only for integrating into other scripts
- __no_write__ (default False)
- __verbose__ (default 0)

"""
# fmt: on
import logging
import sys
from pathlib import Path

import numpy as np
from matplotlib import pyplot as plt
from pyomo import environ as pyo

from fire2a.utils import read_toml

logger = logging.getLogger(__name__)

allowed_ub = ["<=", "≤", "le", "leq", "ub"]
allowed_lb = [">=", "≥", "ge", "geq", "lb"]
config_allowed = {
    "value_rescaling": ["minmax", "standard", "robust", "onehot", "pass"],
    "capacity_sense": allowed_ub + allowed_lb,
}


def check_shapes(data_list):
    """Check if all data arrays have the same shape and are 2D.
    Returns the shape of the data arrays if they are all equal.
    """
    from functools import reduce

    def equal_or_error(x, y):
        """Check if x and y are equal, returns x if equal else raises a ValueError."""
        if x == y:
            return x
        else:
            raise ValueError("All data arrays must have the same shape")

    shape = reduce(equal_or_error, (data.shape for data in data_list))
    if len(shape) != 2:
        raise ValueError("All data arrays must be 2D")
    height, width = shape
    return height, width


def pipelie(observations, config):
    """Create a pipeline for the observations and the configuration."""
    from sklearn.compose import ColumnTransformer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, RobustScaler, StandardScaler

    # 1. SCALING
    scalers = {
        "minmax": MinMaxScaler(),
        "standard": StandardScaler(),
        "robust": RobustScaler(),
        "onehot": OneHotEncoder(),
        "passthrough": "passthrough",
    }

    # 2. PIPELINE
    pipe = Pipeline(
        [
            (
                "scaler",
                ColumnTransformer(
                    [
                        (item["name"], scalers.get(item.get("value_rescaling")), [i])
                        for i, item in enumerate(config)
                        if item.get("value_rescaling")
                    ],
                    remainder="drop",
                ),
            )
        ],
        verbose=True,
    )

    # 3. FIT
    scaled = pipe.fit_transform(observations)
    feat_names = pipe.named_steps["scaler"].get_feature_names_out()
    logger.debug("Pipeline input: %s", [{itm.get("name"): itm.get("value_rescaling")} for itm in config])
    logger.debug("Pipeline output: feat_names:%s", feat_names)
    logger.debug("Pipeline output: scaled.shape:%s", scaled.shape)
    return scaled, pipe, feat_names


def arg_parser(argv=None):
    """Parse command line arguments."""
    from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser

    parser = ArgumentParser(
        description="MultiObjective Knapsack Rasters",
        formatter_class=ArgumentDefaultsHelpFormatter,
        epilog="Full documentation at https://fire2a.github.io/fire2a-lib/fire2a/knapsack.html",
    )
    parser.add_argument(
        "config_file",
        nargs="?",
        type=Path,
        help="A toml formatted file, with a section header for each raster [file_name], with items: 'value_rescaling', 'value_weight', 'capacity_sense' and 'capacity_ratio'",
        default="config.toml",
    )
    parser.add_argument("-or", "--output_raster", help="Output raster file, warning overwrites!", default="")
    parser.add_argument(
        "-e",
        "--exclude_nodata",
        type=str,
        help="By default if 'any' layer is nodata, it's excluded. It can be relaxed by setting 'all' (layers must be nodata to be excluded)",
        choices=["any", "all"],
        default="any",
    )
    parser.add_argument("-a", "--authid", type=str, help="Output raster authid", default="EPSG:3857")
    parser.add_argument(
        "-g", "--geotransform", type=str, help="Output raster geotransform", default="(0, 1, 0, 0, 0, -1)"
    )
    parser.add_argument(
        "-nw",
        "--no_write",
        action="store_true",
        help="Do not write output raster nor polygons",
        default=False,
    )
    parser.add_argument(
        "-s",
        "--script",
        action="store_true",
        help="Run in script mode, returning the label_map and the pipeline object",
        default=False,
    )
    parser.add_argument("--verbose", "-v", action="count", default=0, help="WARNING:1, INFO:2, DEBUG:3")
    parser.add_argument(
        "-p",
        "--plots",
        action="store_true",
        help="Activate the plotting routines (saves 3 .png files to the same directory as the output raster)",
        default=False,
    )
    args = parser.parse_args(argv)
    args.geotransform = tuple(map(float, args.geotransform[1:-1].split(",")))
    if Path(args.config_file).is_file() is False:
        parser.error(f"File {args.config_file} not found")
    return args


def aplot(data: np.ndarray, title: str, series_names: list[str], outpath: Path, show=False):  # __name__ == "__main__"
    """
    names = [itm["name"] for i, itm in enumerate(config)]
    outpath = Path(args.output_raster).parent
    fname = outpath / "observations.png"
    """
    if not isinstance(data, np.ndarray):
        data = data.toarray()
    if not isinstance(data[0], np.ndarray):
        for itm in data:
            itm = itm.toarray()
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    fig.suptitle(title)
    ax[0].violinplot(data, showmeans=False, showmedians=True, showextrema=True)
    ax[0].set_title("violinplot")
    ax[0].set_xticks(range(1, len(series_names) + 1), series_names)
    ax[1].boxplot(data)
    ax[1].set_title("boxplot")
    ax[1].set_xticks(range(1, len(series_names) + 1), series_names)
    if show:
        plt.show()
    fname = outpath / (title + ".png")
    plt.savefig(fname)
    plt.close()


def get_model(scaled=None, values_weights=None, cap_cfg=None, cap_data=None, **kwargs):

    # m model
    m = pyo.ConcreteModel("MultiObjectiveKnapsack")
    # sets
    m.V = pyo.RangeSet(0, scaled.shape[1] - 1)
    scaled_n, scaled_v = scaled.nonzero()
    m.NV = pyo.Set(initialize=[(n, v) for n, v in zip(scaled_n, scaled_v)])
    m.W = pyo.RangeSet(0, len(cap_cfg) - 1)
    cap_data_n, cap_data_v = cap_data.nonzero()
    m.NW = pyo.Set(initialize=[(n, w) for n, w in zip(cap_data_n, cap_data_v)])
    both_nv_nw = list(set(scaled_n) | set(cap_data_n))
    both_nv_nw.sort()
    m.N = pyo.Set(initialize=both_nv_nw)
    # parameters
    m.scaled_values = pyo.Param(m.NV, initialize=lambda m, n, v: scaled[n, v])
    m.values_weight = pyo.Param(m.V, initialize=values_weights)
    m.total_capacity = pyo.Param(m.W, initialize=[itm["cap"] for itm in cap_cfg])
    m.capacity_weight = pyo.Param(m.NW, initialize=lambda m, n, w: cap_data[n, w])
    # variables
    m.X = pyo.Var(m.N, within=pyo.Binary)

    # constraints
    def capacity_rule(m, w):
        if cap_cfg[w]["sense"] in allowed_ub:
            return sum(m.X[n] * m.capacity_weight[n, w] for n, ww in m.NW if ww == w) <= m.total_capacity[w]
        elif cap_cfg[w]["sense"] in allowed_lb:
            return sum(m.X[n] * m.capacity_weight[n, w] for n, ww in m.NW if ww == w) >= m.total_capacity[w]
        else:
            logger.critical("Skipping capacity constraint %s, %s", w, cap_cfg[w])
            return pyo.Constraint.Skip

    m.capacity = pyo.Constraint(m.W, rule=capacity_rule)
    # objective
    m.obj = pyo.Objective(
        expr=sum(m.values_weight[v] * sum(m.X[n] * m.scaled_values[n, v] for n, vv in m.NV if vv == v) for v in m.V),
        sense=pyo.maximize,
    )
    return m


def solve_pyomo(m, tee=True, solver="cplex", **kwargs):
    from pyomo.opt import SolverFactory

    opt = SolverFactory(solver)
    results = opt.solve(m, tee=tee, **kwargs)
    return opt, results


def pre_solve(argv):
    if argv is sys.argv:
        argv = sys.argv[1:]
    args = arg_parser(argv)

    if args.verbose != 0:
        global logger
        from fire2a import setup_logger

        logger = setup_logger(verbosity=args.verbose)

    logger.info("args %s", args)

    # 2. READ CONFIG
    config = read_toml(args.config_file)
    # dict -> list[dict]
    a, b = list(config.keys()), list(config.values())
    config = [{"name": Path(a).name, "filename": Path(a), **b} for a, b in zip(a, b)]
    for itm in config:
        if "value_rescaling" in itm:
            itm["value_rescaling"] = itm["value_rescaling"].lower()
            if itm["value_rescaling"] not in config_allowed["value_rescaling"]:
                logger.critical("Wrong value for value_rescaling in %s", itm)
                sys.exit(1)
        if "capacity_sense" in itm:
            itm["capacity_sense"] = itm["capacity_sense"].lower()
            if itm["capacity_sense"] not in config_allowed["capacity_sense"]:
                logger.critical("Wrong value for capacity_sense in %s", itm)
                sys.exit(1)
        logger.debug(itm)

    # 2.1 CHECK PAIRS + defaults
    # vr : value_rescaling
    # vw : value_weight
    # cr : capacity_ratio
    # cs : capacity_sense
    for itm in config:
        if vr := itm.get("value_rescaling"):
            # vr & !vw => vw = 1
            if "value_weight" not in itm:
                logger.warning(
                    "value_rescaling without value_weight for item: %s\n ASSUMING VALUE WEIGHT IS 1", itm["name"]
                )
                itm["value_weight"] = 1
            if vr == "pass":
                # map "pass" to sklearn's "passthrough" keyword so the column is kept unscaled
                itm["value_rescaling"] = "passthrough"
        # !vr & vw => vr = minmax
        elif "value_weight" in itm:
            logger.warning("value_weight without value_rescaling for item: %s\n DEFAULTING TO MINMAX", itm["name"])
            itm["value_rescaling"] = "minmax"
        if cr := itm.get("capacity_ratio"):
            # cr not in (-1,1) =>!<=
            if not (-1 < cr < 1):
                logger.critical("Wrong value for capacity_ratio in %s, should be (-1,1)", itm)
                sys.exit(1)
            # cr & !cs => cs = ub
            if "capacity_sense" not in itm:
                logger.warning(
                    "capacity_ratio without capacity_sense for item: %s\n ASSUMING SENSE IS UPPER BOUND", itm["name"]
                )
                itm["capacity_sense"] = "ub"
        # !cr & cs =>!<=
        elif "capacity_sense" in itm:
            logger.critical("capacity_sense without capacity_ratio for item: %s", itm["name"])
            sys.exit(1)

    # 3. READ DATA
    from fire2a.raster import read_raster

    data_list = []
    for item in config:
        data, info = read_raster(str(item["filename"]))
        item.update(info)
        data_list += [data]

    # 4. VALIDATE all 2D, same shape
    height, width = check_shapes(data_list)

    # 5. list[maps] -> OBSERVATIONS
    all_observations = np.column_stack([data.ravel() for data in data_list])

    # 6. if all|any rasters are nodata then mask out
    nodatas = [item["NoDataValue"] for item in config]
    if args.exclude_nodata == "any":
        nodata_mask = np.any(all_observations == nodatas, axis=1)
    if args.exclude_nodata == "all":
        nodata_mask = np.all(all_observations == nodatas, axis=1)

    logger.info("Sum of all rasters NoData: %s pixels", nodata_mask.sum())
    observations = all_observations[~nodata_mask]

    # 7. nodata -> 0
    if args.exclude_nodata == "all":
        for col, nd in zip(observations.T, nodatas):
            col[col == nd] = 0

    if args.plots:
        aplot(
            observations, "observations", [itm["name"] for itm in config], Path(args.output_raster).parent, show=False
        )

    # scaling
    # 8. PIPELINE
    scaled, pipe, feat_names = pipelie(observations, config)
    # assert observations.shape[0] == scaled.shape[0]
    # assert observations.shape[1] >= scaled.shape[1]
    logger.info(f"{observations.shape=}")
    logger.info(f"{scaled.shape=}")

    if args.plots:
        # exclude multiple occurrences of onehots
        first_occurrences = {}
        for i, fn in enumerate(feat_names):
            for bn in [itm["name"] for itm in config]:
                if fn.startswith(bn) and bn not in first_occurrences:
                    first_occurrences[bn] = i
                    break
        idxs = list(first_occurrences.values())
        aplot(scaled[:, idxs], "scaled", feat_names[idxs], Path(args.output_raster).parent, show=False)

    # weights
    values_weights = []
    for name in feat_names:
        for item in config:
            if name.startswith(item["name"]):
                values_weights += [item["value_weight"]]
    values_weights = np.array(values_weights)
    logger.info(f"{values_weights.shape=}")

    if args.plots:
        from scipy.sparse import csr_matrix

        values_weights_diag = csr_matrix(np.diag(values_weights[idxs]))
        scaled_weighted = scaled[:, idxs].dot(values_weights_diag)

        aplot(
            scaled_weighted,
            "scaled_weighted",
            feat_names[idxs],
            Path(args.output_raster).parent,
            show=False,
        )

    # capacities
    # "name": item["filename"].name.replace('.','_'),
    cap_cfg = [
        {
            "idx": i,
            "name": item["filename"].stem,
            "cap": observations[:, i].sum() * item["capacity_ratio"],
            "sense": item["capacity_sense"],
        }
        for i, item in enumerate(config)
        if "capacity_ratio" in item
    ]
    cap_data = observations[:, [itm["idx"] for itm in cap_cfg]]
    instance = {
        "scaled": scaled,
        "values_weights": values_weights,
        "cap_cfg": cap_cfg,
        "cap_data": cap_data,
        "feat_names": feat_names,
        "height": height,
        "width": width,
        "nodata_mask": nodata_mask,
        "all_observations": all_observations,
        "observations": observations,
        "pipe": pipe,
        "nodatas": nodatas,
    }
    return instance, args


def main(argv=None):
    """This main is split in 3 parts so it can be called from within QGIS fire2a's toolbox-plugin.
    Nevertheless, it can be called from the command line:
    ```bash
    python -m fire2a.knapsack [config.toml]
    ```
    Or integrated into other scripts:
    ```python
    from fire2a.knapsack import main
    soln, m, instance, args = main(["config.toml"])
    ```
    """
    # 0..9 PRE
    instance, args = pre_solve(argv)

    # 9. PYOMO MODEL
    m = get_model(**instance)

    # 10. PYOMO SOLVE
    opt, results = solve_pyomo(m, tee=True, solver="cplex")
    instance["opt"] = opt
    instance["results"] = results

    # 11. POST
    return post_solve(m, args=args, **instance)


def post_solve(
    m,
    args=None,
    scaled=None,
    values_weights=None,
    cap_cfg=None,
    cap_data=None,
    feat_names=None,
    height=None,
    width=None,
    nodata_mask=None,
    all_observations=None,
    observations=None,
    pipe=None,
    nodatas=None,
    **kwargs,
):
    soln = np.array([pyo.value(m.X[i], exception=False) for i in m.X], dtype=np.float32)
    logger.info("solution pseudo-histogram: %s", np.unique(soln, return_counts=True))
    soln[~soln.astype(bool)] = 0

    try:
        slacks = m.capacity[:].slack()
        logger.info("objective: %s", m.obj(exception=False))
    except Exception as e:
        logger.error(e)
        slacks = [0] * len(cap_cfg)

    if not isinstance(scaled, np.ndarray):
        scaled = scaled.toarray()
    vx = np.matmul(scaled.T, soln)

    logger.info("Values per objective:")
    for f, v in zip(feat_names, vx):
        logger.info(f"{f}\t\t{v:.4f}")

    if args.plots:
        fig, ax = plt.subplots(1, 2, figsize=(10, 5))
        fig.suptitle("solution")
        ax[0].set_title("positive objectives, abs(w*v*x)")
        ax[0].pie(np.absolute(vx * values_weights), labels=feat_names, autopct="%1.1f%%")

        cap_ratio = [slacks[i] / itm["cap"] for i, itm in enumerate(cap_cfg)]
        ax[1].set_title("capacity slack ratios")
        ax[1].bar([itm["name"] for itm in cap_cfg], cap_ratio)

        # if __name__ == "__main__":
        #     plt.show()
        plt.savefig(Path(args.output_raster).parent / "solution.png")
        plt.close()

    logger.info("Capacity slack:")
    for i, itm in enumerate(cap_cfg):
        logger.info(f"{i}: name:{itm['name']} cap:{itm['cap']} sense:{itm['sense']} slack:{slacks[i]}")

    if args.script:
        instance = {
            "scaled": scaled,
            "values_weights": values_weights,
            "cap_cfg": cap_cfg,
            "cap_data": cap_data,
            "feat_names": feat_names,
            "height": height,
            "width": width,
            "nodata_mask": nodata_mask,
            "all_observations": all_observations,
            "observations": observations,
            "pipe": pipe,
            "nodatas": nodatas,
        }
        return soln, m, instance, args
    else:
        # 10. WRITE OUTPUTS
        from fire2a.raster import write_raster

        if args.output_raster:
            # put soln into the original shape of all_observations using the reversal of nodata_mask
            data = np.zeros(height * width, dtype=np.float32) - 9999
            data[~nodata_mask] = soln
            data = data.reshape(height, width)
            if not write_raster(
                data,
                outfile=str(args.output_raster),
                nodata=-9999,
                authid=args.authid,
                geotransform=args.geotransform,
                logger=logger,
            ):
                logger.error("Error writing output raster")
                return 1
        return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
````
99def check_shapes(data_list): 100 """Check if all data arrays have the same shape and are 2D. 101 Returns the shape of the data arrays if they are all equal. 102 """ 103 from functools import reduce 104 105 def equal_or_error(x, y): 106 """Check if x and y are equal, returns x if equal else raises a ValueError.""" 107 if x == y: 108 return x 109 else: 110 raise ValueError("All data arrays must have the same shape") 111 112 shape = reduce(equal_or_error, (data.shape for data in data_list)) 113 if len(shape) != 2: 114 raise ValueError("All data arrays must be 2D") 115 height, width = shape 116 return height, width
Check if all data arrays have the same shape and are 2D. Returns the shape of the data arrays if they are all equal.
119def pipelie(observations, config): 120 """Create a pipeline for the observations and the configuration.""" 121 from sklearn.compose import ColumnTransformer 122 from sklearn.pipeline import Pipeline 123 from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, RobustScaler, StandardScaler 124 125 # 1. SCALING 126 scalers = { 127 "minmax": MinMaxScaler(), 128 "standard": StandardScaler(), 129 "robust": RobustScaler(), 130 "onehot": OneHotEncoder(), 131 "passthrough": "passthrough", 132 } 133 134 # 2. PIPELINE 135 pipe = Pipeline( 136 [ 137 ( 138 "scaler", 139 ColumnTransformer( 140 [ 141 (item["name"], scalers.get(item.get("value_rescaling")), [i]) 142 for i, item in enumerate(config) 143 if item.get("value_rescaling") 144 ], 145 remainder="drop", 146 ), 147 ) 148 ], 149 verbose=True, 150 ) 151 152 # 3. FIT 153 scaled = pipe.fit_transform(observations) 154 feat_names = pipe.named_steps["scaler"].get_feature_names_out() 155 logger.debug("Pipeline input: %s", [{itm.get("name"): itm.get("value_rescaling")} for itm in config]) 156 logger.debug("Pipeline output: feat_names:%s", feat_names) 157 logger.debug("Pipeline output: scaled.shape:%s", scaled.shape) 158 return scaled, pipe, feat_names
Create a pipeline for the observations and the configuration.
161def arg_parser(argv=None): 162 """Parse command line arguments.""" 163 from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser 164 165 parser = ArgumentParser( 166 description="MultiObjective Knapsack Rasters", 167 formatter_class=ArgumentDefaultsHelpFormatter, 168 epilog="Full documentation at https://fire2a.github.io/fire2a-lib/fire2a/knapsack.html", 169 ) 170 parser.add_argument( 171 "config_file", 172 nargs="?", 173 type=Path, 174 help="A toml formatted file, with a section header for each raster [file_name], with items: 'value_rescaling', 'value_weight', 'capacity_sense' and 'capacity_ratio'", 175 default="config.toml", 176 ) 177 parser.add_argument("-or", "--output_raster", help="Output raster file, warning overwrites!", default="") 178 parser.add_argument( 179 "-e", 180 "--exclude_nodata", 181 type=str, 182 help="By default if 'any' layer is nodata, it's excluded. It can be relaxed by setting 'all' (layers must be nodata to be excluded)", 183 choices=["any", "all"], 184 default="any", 185 ) 186 parser.add_argument("-a", "--authid", type=str, help="Output raster authid", default="EPSG:3857") 187 parser.add_argument( 188 "-g", "--geotransform", type=str, help="Output raster geotransform", default="(0, 1, 0, 0, 0, -1)" 189 ) 190 parser.add_argument( 191 "-nw", 192 "--no_write", 193 action="store_true", 194 help="Do not write outputs raster nor polygons", 195 default=False, 196 ) 197 parser.add_argument( 198 "-s", 199 "--script", 200 action="store_true", 201 help="Run in script mode, returning the label_map and the pipeline object", 202 default=False, 203 ) 204 parser.add_argument("--verbose", "-v", action="count", default=0, help="WARNING:1, INFO:2, DEBUG:3") 205 parser.add_argument( 206 "-p", 207 "--plots", 208 action="store_true", 209 help="Activate the plotting routines (saves 3 .png files to the same output than the raster)", 210 default=False, 211 ) 212 args = parser.parse_args(argv) 213 args.geotransform = tuple(map(float, 
args.geotransform[1:-1].split(","))) 214 if Path(args.config_file).is_file() is False: 215 parser.error(f"File {args.config_file} not found") 216 return args
Parse command line arguments.
219def aplot(data: np.ndarray, title: str, series_names: list[str], outpath: Path, show=False): # __name__ == "__main__" 220 """ 221 names = [itm["name"] for i, itm in enumerate(config)] 222 outpath = Path(args.output_raster).parent 223 fname = outpath / "observations.png" 224 """ 225 if not isinstance(data, np.ndarray): 226 data = data.toarray() 227 if not isinstance(data[0], np.ndarray): 228 for itm in data: 229 itm = itm.toarray() 230 fig, ax = plt.subplots(1, 2, figsize=(10, 5)) 231 fig.suptitle(title) 232 ax[0].violinplot(data, showmeans=False, showmedians=True, showextrema=True) 233 ax[0].set_title("violinplot") 234 ax[0].set_xticks(range(1, len(series_names) + 1), series_names) 235 ax[1].boxplot(data) 236 ax[1].set_title("boxplot") 237 ax[1].set_xticks(range(1, len(series_names) + 1), series_names) 238 if show: 239 plt.show() 240 fname = outpath / (title + ".png") 241 plt.savefig(fname) 242 plt.close()
names = [itm["name"] for i, itm in enumerate(config)] outpath = Path(args.output_raster).parent fname = outpath / "observations.png"
245def get_model(scaled=None, values_weights=None, cap_cfg=None, cap_data=None, **kwargs): 246 247 # m model 248 m = pyo.ConcreteModel("MultiObjectiveKnapsack") 249 # sets 250 m.V = pyo.RangeSet(0, scaled.shape[1] - 1) 251 scaled_n, scaled_v = scaled.nonzero() 252 m.NV = pyo.Set(initialize=[(n, v) for n, v in zip(scaled_n, scaled_v)]) 253 m.W = pyo.RangeSet(0, len(cap_cfg) - 1) 254 cap_data_n, cap_data_v = cap_data.nonzero() 255 m.NW = pyo.Set(initialize=[(n, w) for n, w in zip(cap_data_n, cap_data_v)]) 256 both_nv_nw = list(set(scaled_n) | set(cap_data_n)) 257 both_nv_nw.sort() 258 m.N = pyo.Set(initialize=both_nv_nw) 259 # parameters 260 m.scaled_values = pyo.Param(m.NV, initialize=lambda m, n, v: scaled[n, v]) 261 m.values_weight = pyo.Param(m.V, initialize=values_weights) 262 m.total_capacity = pyo.Param(m.W, initialize=[itm["cap"] for itm in cap_cfg]) 263 m.capacity_weight = pyo.Param(m.NW, initialize=lambda m, n, w: cap_data[n, w]) 264 # variables 265 m.X = pyo.Var(m.N, within=pyo.Binary) 266 267 # constraints 268 def capacity_rule(m, w): 269 if cap_cfg[w]["sense"] in allowed_ub: 270 return sum(m.X[n] * m.capacity_weight[n, w] for n, ww in m.NW if ww == w) <= m.total_capacity[w] 271 elif cap_cfg[w]["sense"] in allowed_lb: 272 return sum(m.X[n] * m.capacity_weight[n, w] for n, ww in m.NW if ww == w) >= m.total_capacity[w] 273 else: 274 logger.critical("Skipping capacity constraint %s, %s", w, cap_cfg[w]) 275 return pyo.Constraint.Skip 276 277 m.capacity = pyo.Constraint(m.W, rule=capacity_rule) 278 # objective 279 m.obj = pyo.Objective( 280 expr=sum(m.values_weight[v] * sum(m.X[n] * m.scaled_values[n, v] for n, vv in m.NV if vv == v) for v in m.V), 281 sense=pyo.maximize, 282 ) 283 return m
```python
def pre_solve(argv):
    if argv is sys.argv:
        argv = sys.argv[1:]
    args = arg_parser(argv)

    if args.verbose != 0:
        global logger
        from fire2a import setup_logger

        logger = setup_logger(verbosity=args.verbose)

    logger.info("args %s", args)

    # 2. read config
    config = read_toml(args.config_file)
    # dict -> list[dict]
    a, b = list(config.keys()), list(config.values())
    config = [{"name": Path(a).name, "filename": Path(a), **b} for a, b in zip(a, b)]
    for itm in config:
        if "value_rescaling" in itm:
            itm["value_rescaling"] = itm["value_rescaling"].lower()
            if itm["value_rescaling"] not in config_allowed["value_rescaling"]:
                logger.critical("Wrong value for value_rescaling in %s", itm)
                sys.exit(1)
        if "capacity_sense" in itm:
            itm["capacity_sense"] = itm["capacity_sense"].lower()
            if itm["capacity_sense"] not in config_allowed["capacity_sense"]:
                logger.critical("Wrong value for capacity_sense in %s", itm)
                sys.exit(1)
        logger.debug(itm)

    # 2.1 check pairs + defaults
    # vr: value_rescaling
    # vw: value_weight
    # cr: capacity_ratio
    # cs: capacity_sense
    for itm in config:
        if vr := itm.get("value_rescaling"):
            # vr & !vw => vw = 1
            if "value_weight" not in itm:
                logger.warning(
                    "value_rescaling without value_weight for item: %s\n ASSUMING VALUE WEIGHT IS 1", itm["name"]
                )
                itm["value_weight"] = 1
            if vr == "pass":
                itm["value_weight"] = "passthrough"
        # !vr & vw => vr = minmax
        elif "value_weight" in itm:
            logger.warning("value_weight without value_rescaling for item: %s\n DEFAULTING TO MINMAX", itm["name"])
            itm["value_rescaling"] = "minmax"
        if cr := itm.get("capacity_ratio"):
            # cr not in (-1,1) => exit
            if not (-1 < cr < 1):
                logger.critical("Wrong value for capacity_ratio in %s, should be (-1,1)", itm)
                sys.exit(1)
            # cr & !cs => cs = ub
            if "capacity_sense" not in itm:
                logger.warning(
                    "capacity_ratio without capacity_sense for item: %s\n ASSUMING SENSE IS UPPER BOUND", itm["name"]
                )
                itm["capacity_sense"] = "ub"
        # !cr & cs => exit
        elif "capacity_sense" in itm:
            logger.critical("capacity_sense without capacity_ratio for item: %s", itm["name"])
            sys.exit(1)

    # 3. read data
    from fire2a.raster import read_raster

    data_list = []
    for item in config:
        data, info = read_raster(str(item["filename"]))
        item.update(info)
        data_list += [data]

    # 4. validate: all rasters are 2d with the same shape
    height, width = check_shapes(data_list)

    # 5. list[maps] -> observations matrix (pixels x rasters)
    all_observations = np.column_stack([data.ravel() for data in data_list])

    # 6. if all|any rasters are nodata then mask out
    nodatas = [item["NoDataValue"] for item in config]
    if args.exclude_nodata == "any":
        nodata_mask = np.any(all_observations == nodatas, axis=1)
    if args.exclude_nodata == "all":
        nodata_mask = np.all(all_observations == nodatas, axis=1)

    logger.info("Sum of all rasters NoData: %s pixels", nodata_mask.sum())
    observations = all_observations[~nodata_mask]

    # 7. nodata -> 0
    if args.exclude_nodata == "all":
        for col, nd in zip(observations.T, nodatas):
            col[col == nd] = 0

    if args.plots:
        aplot(
            observations, "observations", [itm["name"] for itm in config], Path(args.output_raster).parent, show=False
        )

    # 8. scaling pipeline
    scaled, pipe, feat_names = pipelie(observations, config)
    # assert observations.shape[0] == scaled.shape[0]
    # assert observations.shape[1] >= scaled.shape[1]
    logger.info(f"{observations.shape=}")
    logger.info(f"{scaled.shape=}")

    if args.plots:
        # exclude multiple occurrences of onehots
        first_occurrences = {}
        for i, fn in enumerate(feat_names):
            for bn in [itm["name"] for itm in config]:
                if fn.startswith(bn) and bn not in first_occurrences:
                    first_occurrences[bn] = i
                    break
        idxs = list(first_occurrences.values())
        aplot(scaled[:, idxs], "scaled", feat_names[idxs], Path(args.output_raster).parent, show=False)

    # weights
    values_weights = []
    for name in feat_names:
        for item in config:
            if name.startswith(item["name"]):
                values_weights += [item["value_weight"]]
    values_weights = np.array(values_weights)
    logger.info(f"{values_weights.shape=}")

    if args.plots:
        from scipy.sparse import csr_matrix

        values_weights_diag = csr_matrix(np.diag(values_weights[idxs]))
        scaled_weighted = scaled[:, idxs].dot(values_weights_diag)

        aplot(
            scaled_weighted,
            "scaled_weighted",
            feat_names[idxs],
            Path(args.output_raster).parent,
            show=False,
        )

    # capacities
    # "name": item["filename"].name.replace('.','_'),
    cap_cfg = [
        {
            "idx": i,
            "name": item["filename"].stem,
            "cap": observations[:, i].sum() * item["capacity_ratio"],
            "sense": item["capacity_sense"],
        }
        for i, item in enumerate(config)
        if "capacity_ratio" in item
    ]
    cap_data = observations[:, [itm["idx"] for itm in cap_cfg]]
    instance = {
        "scaled": scaled,
        "values_weights": values_weights,
        "cap_cfg": cap_cfg,
        "cap_data": cap_data,
        "feat_names": feat_names,
        "height": height,
        "width": width,
        "nodata_mask": nodata_mask,
        "all_observations": all_observations,
        "observations": observations,
        "pipe": pipe,
        "nodatas": nodatas,
    }
    return instance, args
```
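The nodata handling in steps 6 and 7 can be sketched in isolation (numpy only; the observation matrix and per-raster nodata values below are made up):

```python
import numpy as np

# 4 pixels x 2 rasters, with -9999 and -1 as the per-raster nodata values
all_observations = np.array([[1.0, 5.0],
                             [-9999.0, 7.0],
                             [-9999.0, -1.0],
                             [3.0, -1.0]])
nodatas = [-9999.0, -1.0]

# "any": drop a pixel if ANY raster is nodata there
mask_any = np.any(all_observations == nodatas, axis=1)
# "all": drop a pixel only if EVERY raster is nodata there
mask_all = np.all(all_observations == nodatas, axis=1)
print(mask_any.tolist())  # [False, True, True, True]
print(mask_all.tolist())  # [False, False, True, False]

# under "all", surviving nodata cells are then zeroed column by column
observations = all_observations[~mask_all]
for col, nd in zip(observations.T, nodatas):
    col[col == nd] = 0
print(observations.tolist())  # [[1.0, 5.0], [0.0, 7.0], [3.0, 0.0]]
```

Comparing `all_observations == nodatas` broadcasts the list of nodata values across the rows, so each column is checked against its own raster's nodata value.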
````python
def main(argv=None):
    """This main is split in 3 parts with the objective of being called from within QGIS fire2a's toolbox-plugin.
    Nevertheless, it can be called from the command line:
    ```bash
    python -m fire2a.knapsack [config.toml]
    ```
    Or integrated into other scripts:
    ```python
    from fire2a.knapsack import main
    soln, m, instance, args = main(["config.toml"])
    ```
    """
    # 0..8 PRE
    instance, args = pre_solve(argv)

    # 9. PYOMO MODEL
    m = get_model(**instance)

    # 10. PYOMO SOLVE
    opt, results = solve_pyomo(m, tee=True, solver="cplex")
    instance["opt"] = opt
    instance["results"] = results

    # 11. POST
    return post_solve(m, args=args, **instance)
````
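For intuition on what the solve step returns, the same maximize-values-subject-to-capacities structure can be brute-forced on a toy instance (pure Python with made-up numbers; the real model is solved by Pyomo with CPLEX, not by enumeration):

```python
from itertools import product

# 4 pixels, one value layer, one upper-bound ("<=") capacity layer
values = [3.0, 1.0, 4.0, 2.0]
weights = [2.0, 1.0, 3.0, 2.0]
capacity = 5.0

# enumerate all binary selections, keep the best feasible one
best_x, best_obj = None, float("-inf")
for x in product([0, 1], repeat=len(values)):
    if sum(xi * w for xi, w in zip(x, weights)) <= capacity:
        obj = sum(xi * v for xi, v in zip(x, values))
        if obj > best_obj:
            best_x, best_obj = x, obj

print(best_x, best_obj)  # (1, 0, 1, 0) 7.0
```

The solution is a 0/1 vector over pixels, which is exactly what `post_solve` extracts from `m.X` and scatters back into the raster grid.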
```python
def post_solve(
    m,
    args=None,
    scaled=None,
    values_weights=None,
    cap_cfg=None,
    cap_data=None,
    feat_names=None,
    height=None,
    width=None,
    nodata_mask=None,
    all_observations=None,
    observations=None,
    pipe=None,
    nodatas=None,
    **kwargs,
):
    soln = np.array([pyo.value(m.X[i], exception=False) for i in m.X], dtype=np.float32)
    logger.info("solution pseudo-histogram: %s", np.unique(soln, return_counts=True))
    soln[~soln.astype(bool)] = 0

    try:
        slacks = m.capacity[:].slack()
        logger.info("objective: %s", m.obj(exception=False))
    except Exception as e:
        logger.error(e)
        slacks = [0] * len(cap_cfg)

    if not isinstance(scaled, np.ndarray):
        scaled = scaled.toarray()
    vx = np.matmul(scaled.T, soln)

    logger.info("Values per objective:")
    for f, v in zip(feat_names, vx):
        logger.info(f"{f}\t\t{v:.4f}")

    if args.plots:
        fig, ax = plt.subplots(1, 2, figsize=(10, 5))
        fig.suptitle("solution")
        ax[0].set_title("positive objectives, abs(w*v*x)")
        ax[0].pie(np.absolute(vx * values_weights), labels=feat_names, autopct="%1.1f%%")

        cap_ratio = [slacks[i] / itm["cap"] for i, itm in enumerate(cap_cfg)]
        ax[1].set_title("capacity slack ratios")
        ax[1].bar([itm["name"] for itm in cap_cfg], cap_ratio)

        # if __name__ == "__main__":
        #     plt.show()
        plt.savefig(Path(args.output_raster).parent / "solution.png")
        plt.close()

    logger.info("Capacity slack:")
    for i, itm in enumerate(cap_cfg):
        logger.info(f"{i}: name:{itm['name']} cap:{itm['cap']} sense:{itm['sense']} slack:{slacks[i]}")

    if args.script:
        instance = {
            "scaled": scaled,
            "values_weights": values_weights,
            "cap_cfg": cap_cfg,
            "cap_data": cap_data,
            "feat_names": feat_names,
            "height": height,
            "width": width,
            "nodata_mask": nodata_mask,
            "all_observations": all_observations,
            "observations": observations,
            "pipe": pipe,
            "nodatas": nodatas,
        }
        return soln, m, instance, args
    else:
        # 10. WRITE OUTPUTS
        from fire2a.raster import write_raster

        if args.output_raster:
            # put soln into the original shape of all_observations using the reversal of nodata_mask
            data = np.zeros(height * width, dtype=np.float32) - 9999
            data[~nodata_mask] = soln
            data = data.reshape(height, width)
            if not write_raster(
                data,
                outfile=str(args.output_raster),
                nodata=-9999,
                authid=args.authid,
                geotransform=args.geotransform,
                logger=logger,
            ):
                logger.error("Error writing output raster")
                return 1
        return 0
```
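The raster write-out at the end reverses the nodata mask to scatter the solution vector back onto the full grid. A minimal sketch (numpy only; the 2×3 grid and mask are made up):

```python
import numpy as np

height, width = 2, 3
# two of the six pixels were nodata and excluded from the optimization
nodata_mask = np.array([False, True, False, False, True, False])
# the solver only returns values for the 4 surviving pixels
soln = np.array([1.0, 0.0, 1.0, 1.0], dtype=np.float32)

# start from a grid full of the output nodata value, then scatter soln back
data = np.zeros(height * width, dtype=np.float32) - 9999
data[~nodata_mask] = soln
data = data.reshape(height, width)
print(data.tolist())  # [[1.0, -9999.0, 0.0], [1.0, -9999.0, 1.0]]
```

The masked positions keep the -9999 fill value, which is also declared as the output raster's nodata value, so GIS tools render only the optimized pixels.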