fire2a.knapsack

👋🌎 🌲🔥

MultiObjective Knapsack Rasters

Select the best set of pixels maximizing the sum of several rescaled and weighted rasters, minding capacity constraints.

Usage

Overview

  1. Choose your raster files (absolute path or relative to the script execution directory)
  2. Configure, for values: rescaling strategies (minmax, onehot, standard, robust or pass) and absolute weights (any real number)
  3. Configure, for capacities: capacity sense (lower or upper bound) and ratio (between -1 and 1)
  4. Set output options (raster, authid, geotransform, plots, ...)

Command line execution

    # get interface help
    python -m fire2a.knapsack --help
    
    # run (in a qgis/gdal aware python environment)
    python -m fire2a.knapsack [config.toml]
    

Script integration

from fire2a.knapsack import main
solution, model, instance, args = main(["--script","config.toml"])

Preparation

1. Choose your raster files

  • Any GDAL compatible raster will be read
  • Mind that any nodata value will exclude that pixel from the optimization (this can be overridden but is not recommended, see --exclude_nodata, especially for weight constraining rasters)
  • A good practice is to place them all in the same directory where the script will be executed
  • "Quote them" if they contain any non-alphanumeric characters (outside [a-zA-Z0-9])
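The nodata rule above can be sketched in a few lines of plain Python (the module itself uses numpy masks; values here are made up for illustration):

```python
# Rows are pixels, columns are rasters; -9999 stands in for each raster's nodata value.
observations = [
    [0.2, 1.0],
    [-9999, 3.0],
    [-9999, -9999],
]
nodatas = [-9999, -9999]

def excluded(pixel, mode="any"):
    """True if the pixel is dropped from the optimization."""
    hits = [value == nd for value, nd in zip(pixel, nodatas)]
    return any(hits) if mode == "any" else all(hits)

# default --exclude_nodata "any": a single nodata layer drops the pixel
kept_any = [p for p in observations if not excluded(p, "any")]
# relaxed "all": only pixels that are nodata in every layer are dropped
kept_all = [p for p in observations if not excluded(p, "all")]
```

With these numbers, "any" keeps only the first pixel, while "all" also keeps the second one — carrying its -9999 into the weights, which is why relaxing is discouraged for capacity rasters.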

2. Preprocessing configuration

See the config.toml file for an example configuration of the preprocessing steps. The file is structured as follows:

["a_filename.tif"]
value_rescaling = "onehot"
value_weight = 0.5

["b_filename.tif"]
capacity_sense = "<="
capacity_ratio = 0.1

This example states that the values of the raster a_filename.tif will be rescaled using the OneHot strategy, then multiplied by 0.5 in the sought objective; also that the selected pixels may weigh at most 10% of the total weight of b_filename.tif.

  1. __value_rescaling__

    • can be minmax, onehot, standard, robust or pass for no rescaling.
    • minmax (default) and onehot scale into [0,1]; standard and robust do not, so mix them by adjusting the value_weight
    • MinMax: (x-min)/(max-min)
    • Standard Scaler: (x-mean)/stddev
    • Robust Scaler: same but dropping the tails of the distribution
    • OneHot Encoder: __for CATEGORICAL DATA__
  2. __value_weight__

    • can be any real number, although zero does not make sense
    • positive maximizes, negative minimizes
  3. __capacity_sense__

    • can be >=, ≥, ge, geq, lb for lower bound, or <=, ≤, le, leq, ub for upper bound
    • default is upper bound
  4. __capacity_ratio__

    • can be any real number between -1 and 1
    • the capacity is this ratio times the sum of all pixel values of the raster; if all values are equal, it is simply the proportion of pixels to be selected
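A minimal pure-python sketch of the rescaling formulas listed above and of how capacity_ratio becomes a knapsack capacity (the module delegates the scaling to scikit-learn; the numbers here are illustrative):

```python
values = [2.0, 4.0, 6.0, 8.0]  # one raster's pixel values

# MinMax: (x - min) / (max - min) -> scales into [0, 1]
lo, hi = min(values), max(values)
minmax = [(x - lo) / (hi - lo) for x in values]

# Standard: (x - mean) / stddev (population stddev here)
mean = sum(values) / len(values)
std = (sum((x - mean) ** 2 for x in values) / len(values)) ** 0.5
standard = [(x - mean) / std for x in values]

# capacity_ratio = 0.1 with sense "<=" means:
# total weight of the selected pixels <= 0.1 * sum(values)
capacity = 0.1 * sum(values)
```

Note that the standard-scaled column is unbounded, which is why mixing it with minmax columns usually calls for tuning value_weight.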

3. Other options

  • __output_raster__ (default "")
  • __authid__ (default "EPSG:3857")
  • __geotransform__ (default "(0, 1, 0, 0, 0, -1)")
  • __plots__ (default False) saves three .png files to the same directory as the output raster, showing the data, the scaled data and the weighted scaled data. Great for debugging but can really slow down the process.
  • __exclude_nodata__ (default "any") if "any" layer is nodata, the pixel is excluded. It can be relaxed by setting "all" (all layers must be nodata for a pixel to be excluded), but that can cause problems with the capacity rasters by selecting pixels that weigh 0.
  • __script__ (default False) only for integrating into other scripts
  • __no_write__ (default False)
  • __verbose__ (default 0)
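To make the optimization concrete, here is a toy brute-force stand-in for the model the module hands to pyomo/cplex: pick binary pixels maximizing the weighted sum of scaled values, subject to an upper-bound capacity. All numbers are made up; the real solver scales to full rasters.

```python
from itertools import product

scaled = [[0.9], [0.6], [0.1]]     # one value raster, 3 pixels, already rescaled
value_weights = [1.0]              # positive weight -> maximize
cap_weights = [5.0, 5.0, 5.0]      # one capacity raster
capacity = 0.4 * sum(cap_weights)  # capacity_ratio = 0.4, sense "<="

best, best_obj = None, float("-inf")
for x in product([0, 1], repeat=len(scaled)):
    if sum(w * xi for w, xi in zip(cap_weights, x)) > capacity:
        continue  # violates the upper-bound capacity constraint
    obj = sum(
        vw * sum(scaled[n][v] * x[n] for n in range(len(scaled)))
        for v, vw in enumerate(value_weights)
    )
    if obj > best_obj:
        best, best_obj = x, obj
```

Here the capacity (40% of the total weight) only fits one pixel, so the best solution selects the single pixel with the highest scaled value.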
#!/usr/bin/env python3
# fmt: off
"""👋🌎 🌲🔥
# MultiObjective Knapsack Rasters
Select the best set of pixels maximizing the sum of several rescaled and weighted rasters, minding capacity constraints.
## Usage
### Overview
1. Choose your raster files (absolute path or relative to the script execution directory)
2. Configure, for values: rescaling strategies (minmax, onehot, standard, robust or pass) and absolute weights (any real number)
3. Configure, for capacities: capacity sense (lower or upper bound) and ratio (between -1 and 1)
4. Set output options (raster, authid, geotransform, plots, ...)
### Command line execution
    ```bash
    # get interface help
    python -m fire2a.knapsack --help

    # run (in a qgis/gdal aware python environment)
    python -m fire2a.knapsack [config.toml]
    ```
### Script integration
    ```python
    from fire2a.knapsack import main
    solution, model, instance, args = main(["--script", "config.toml"])
    ```
### Preparation
#### 1. Choose your raster files
- Any [GDAL compatible](https://gdal.org/en/latest/drivers/raster/index.html) raster will be read
- Mind that any nodata value will exclude that pixel from the optimization (this can be overridden but is not recommended, see `--exclude_nodata`, especially for weight constraining rasters)
- A good practice is to place them all in the same directory where the script will be executed
- "Quote them" if they contain any non-alphanumeric characters (outside [a-zA-Z0-9])

#### 2. Preprocessing configuration
See the `config.toml` file for an example configuration of the preprocessing steps. The file is structured as follows:

```toml
["a_filename.tif"]
value_rescaling = "onehot"
value_weight = 0.5

["b_filename.tif"]
capacity_sense = "<="
capacity_ratio = 0.1
```
This example states that the values of the raster `a_filename.tif` will be rescaled using the `OneHot` strategy, then multiplied by 0.5 in the sought objective; also that the selected pixels may weigh at most 10% of the total weight of `b_filename.tif`.

1. __value_rescaling__
   - can be minmax, onehot, standard, robust or pass for no rescaling.
   - minmax (default) and onehot scale into [0,1]; standard and robust do not, so mix them by adjusting the value_weight
   - [MinMax](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html): (x-min)/(max-min)
   - [Standard Scaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html): (x-mean)/stddev
   - [Robust Scaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html): same but dropping the tails of the distribution
   - [OneHot Encoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html): __for CATEGORICAL DATA__

2. __value_weight__
   - can be any real number, although zero does not make sense
   - positive maximizes, negative minimizes

3. __capacity_sense__
   - can be >=, ≥, ge, geq, lb for lower bound, or <=, ≤, le, leq, ub for upper bound
   - default is upper bound

4. __capacity_ratio__
   - can be any real number between -1 and 1
   - the capacity is this ratio times the sum of all pixel values of the raster; if all values are equal, it is simply the proportion of pixels to be selected

#### 3. Other options
- __output_raster__ (default "")
- __authid__ (default "EPSG:3857")
- __geotransform__ (default "(0, 1, 0, 0, 0, -1)")
- __plots__ (default False) saves three .png files to the same directory as the output raster, showing the data, the scaled data and the weighted scaled data. Great for debugging but can really slow down the process.
- __exclude_nodata__ (default "any") if "any" layer is nodata, the pixel is excluded. It can be relaxed by setting "all" (all layers must be nodata for a pixel to be excluded), but that can cause problems with the capacity rasters by selecting pixels that weigh 0.
- __script__ (default False) only for integrating into other scripts
- __no_write__ (default False)
- __verbose__ (default 0)

"""
# fmt: on
import logging
import sys
from pathlib import Path

import numpy as np
from matplotlib import pyplot as plt
from pyomo import environ as pyo

from fire2a.utils import read_toml

logger = logging.getLogger(__name__)

allowed_ub = ["<=", "≤", "le", "leq", "ub"]
allowed_lb = [">=", "≥", "ge", "geq", "lb"]
config_allowed = {
    "value_rescaling": ["minmax", "standard", "robust", "onehot", "pass"],
    "capacity_sense": allowed_ub + allowed_lb,
}


def check_shapes(data_list):
    """Check if all data arrays have the same shape and are 2D.
    Returns the shape of the data arrays if they are all equal.
    """
    from functools import reduce

    def equal_or_error(x, y):
        """Check if x and y are equal, returns x if equal else raises a ValueError."""
        if x == y:
            return x
        else:
            raise ValueError("All data arrays must have the same shape")

    shape = reduce(equal_or_error, (data.shape for data in data_list))
    if len(shape) != 2:
        raise ValueError("All data arrays must be 2D")
    height, width = shape
    return height, width


def pipelie(observations, config):
    """Create a pipeline for the observations and the configuration."""
    from sklearn.compose import ColumnTransformer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, RobustScaler, StandardScaler

    # 1. SCALING
    scalers = {
        "minmax": MinMaxScaler(),
        "standard": StandardScaler(),
        "robust": RobustScaler(),
        "onehot": OneHotEncoder(),
        "passthrough": "passthrough",
    }

    # 2. PIPELINE
    pipe = Pipeline(
        [
            (
                "scaler",
                ColumnTransformer(
                    [
                        (item["name"], scalers.get(item.get("value_rescaling")), [i])
                        for i, item in enumerate(config)
                        if item.get("value_rescaling")
                    ],
                    remainder="drop",
                ),
            )
        ],
        verbose=True,
    )

    # 3. FIT
    scaled = pipe.fit_transform(observations)
    feat_names = pipe.named_steps["scaler"].get_feature_names_out()
    logger.debug("Pipeline input: %s", [{itm.get("name"): itm.get("value_rescaling")} for itm in config])
    logger.debug("Pipeline output: feat_names:%s", feat_names)
    logger.debug("Pipeline output: scaled.shape:%s", scaled.shape)
    return scaled, pipe, feat_names


def arg_parser(argv=None):
    """Parse command line arguments."""
    from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser

    parser = ArgumentParser(
        description="MultiObjective Knapsack Rasters",
        formatter_class=ArgumentDefaultsHelpFormatter,
        epilog="Full documentation at https://fire2a.github.io/fire2a-lib/fire2a/knapsack.html",
    )
    parser.add_argument(
        "config_file",
        nargs="?",
        type=Path,
        help="A toml formatted file, with a section header for each raster [file_name], with items: 'value_rescaling', 'value_weight', 'capacity_sense' and 'capacity_ratio'",
        default="config.toml",
    )
    parser.add_argument("-or", "--output_raster", help="Output raster file, warning overwrites!", default="")
    parser.add_argument(
        "-e",
        "--exclude_nodata",
        type=str,
        help="By default if 'any' layer is nodata, the pixel is excluded. It can be relaxed by setting 'all' (all layers must be nodata for a pixel to be excluded)",
        choices=["any", "all"],
        default="any",
    )
    parser.add_argument("-a", "--authid", type=str, help="Output raster authid", default="EPSG:3857")
    parser.add_argument(
        "-g", "--geotransform", type=str, help="Output raster geotransform", default="(0, 1, 0, 0, 0, -1)"
    )
    parser.add_argument(
        "-nw",
        "--no_write",
        action="store_true",
        help="Do not write outputs raster nor polygons",
        default=False,
    )
    parser.add_argument(
        "-s",
        "--script",
        action="store_true",
        help="Run in script mode, returning the solution, model, instance and args",
        default=False,
    )
    parser.add_argument("--verbose", "-v", action="count", default=0, help="WARNING:1, INFO:2, DEBUG:3")
    parser.add_argument(
        "-p",
        "--plots",
        action="store_true",
        help="Activate the plotting routines (saves 3 .png files to the same directory as the output raster)",
        default=False,
    )
    args = parser.parse_args(argv)
    args.geotransform = tuple(map(float, args.geotransform[1:-1].split(",")))
    if not Path(args.config_file).is_file():
        parser.error(f"File {args.config_file} not found")
    return args


def aplot(data: np.ndarray, title: str, series_names: list[str], outpath: Path, show=False):
    """Save violin and box plots of the data series to outpath / (title + ".png").

    Example:
        series_names = [itm["name"] for itm in config]
        outpath = Path(args.output_raster).parent
    """
    if not isinstance(data, np.ndarray):
        data = data.toarray()
    if not isinstance(data[0], np.ndarray):
        # densify sparse rows
        data = np.array([itm.toarray() for itm in data])
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    fig.suptitle(title)
    ax[0].violinplot(data, showmeans=False, showmedians=True, showextrema=True)
    ax[0].set_title("violinplot")
    ax[0].set_xticks(range(1, len(series_names) + 1), series_names)
    ax[1].boxplot(data)
    ax[1].set_title("boxplot")
    ax[1].set_xticks(range(1, len(series_names) + 1), series_names)
    if show:
        plt.show()
    fname = outpath / (title + ".png")
    plt.savefig(fname)
    plt.close()


def get_model(scaled=None, values_weights=None, cap_cfg=None, cap_data=None, **kwargs):
    # m model
    m = pyo.ConcreteModel("MultiObjectiveKnapsack")
    # sets
    m.V = pyo.RangeSet(0, scaled.shape[1] - 1)
    scaled_n, scaled_v = scaled.nonzero()
    m.NV = pyo.Set(initialize=[(n, v) for n, v in zip(scaled_n, scaled_v)])
    m.W = pyo.RangeSet(0, len(cap_cfg) - 1)
    cap_data_n, cap_data_v = cap_data.nonzero()
    m.NW = pyo.Set(initialize=[(n, w) for n, w in zip(cap_data_n, cap_data_v)])
    both_nv_nw = list(set(scaled_n) | set(cap_data_n))
    both_nv_nw.sort()
    m.N = pyo.Set(initialize=both_nv_nw)
    # parameters
    m.scaled_values = pyo.Param(m.NV, initialize=lambda m, n, v: scaled[n, v])
    m.values_weight = pyo.Param(m.V, initialize=values_weights)
    m.total_capacity = pyo.Param(m.W, initialize=[itm["cap"] for itm in cap_cfg])
    m.capacity_weight = pyo.Param(m.NW, initialize=lambda m, n, w: cap_data[n, w])
    # variables
    m.X = pyo.Var(m.N, within=pyo.Binary)

    # constraints
    def capacity_rule(m, w):
        if cap_cfg[w]["sense"] in allowed_ub:
            return sum(m.X[n] * m.capacity_weight[n, w] for n, ww in m.NW if ww == w) <= m.total_capacity[w]
        elif cap_cfg[w]["sense"] in allowed_lb:
            return sum(m.X[n] * m.capacity_weight[n, w] for n, ww in m.NW if ww == w) >= m.total_capacity[w]
        else:
            logger.critical("Skipping capacity constraint %s, %s", w, cap_cfg[w])
            return pyo.Constraint.Skip

    m.capacity = pyo.Constraint(m.W, rule=capacity_rule)
    # objective
    m.obj = pyo.Objective(
        expr=sum(m.values_weight[v] * sum(m.X[n] * m.scaled_values[n, v] for n, vv in m.NV if vv == v) for v in m.V),
        sense=pyo.maximize,
    )
    return m


def solve_pyomo(m, tee=True, solver="cplex", **kwargs):
    from pyomo.opt import SolverFactory

    opt = SolverFactory(solver)
    results = opt.solve(m, tee=tee, **kwargs)
    return opt, results


def pre_solve(argv):
    if argv is sys.argv:
        argv = sys.argv[1:]
    args = arg_parser(argv)

    if args.verbose != 0:
        global logger
        from fire2a import setup_logger

        logger = setup_logger(verbosity=args.verbose)

    logger.info("args %s", args)

    # 2. READ CONFIG
    config = read_toml(args.config_file)
    # dict -> list[dict]
    a, b = list(config.keys()), list(config.values())
    config = [{"name": Path(a).name, "filename": Path(a), **b} for a, b in zip(a, b)]
    for itm in config:
        if "value_rescaling" in itm:
            itm["value_rescaling"] = itm["value_rescaling"].lower()
            if itm["value_rescaling"] not in config_allowed["value_rescaling"]:
                logger.critical("Wrong value for value_rescaling in %s", itm)
                sys.exit(1)
        if "capacity_sense" in itm:
            itm["capacity_sense"] = itm["capacity_sense"].lower()
            if itm["capacity_sense"] not in config_allowed["capacity_sense"]:
                logger.critical("Wrong value for capacity_sense in %s", itm)
                sys.exit(1)
        logger.debug(itm)

    # 2.1 CHECK PAIRS + defaults
    # vr : value_rescaling
    # vw : value_weight
    # cr : capacity_ratio
    # cs : capacity_sense
    for itm in config:
        if vr := itm.get("value_rescaling"):
            # vr & !vw => vw = 1
            if "value_weight" not in itm:
                logger.warning(
                    "value_rescaling without value_weight for item: %s\n ASSUMING VALUE WEIGHT IS 1", itm["name"]
                )
                itm["value_weight"] = 1
            if vr == "pass":
                # normalize to the keyword sklearn's ColumnTransformer expects
                itm["value_rescaling"] = "passthrough"
        # !vr & vw => vr = minmax
        elif "value_weight" in itm:
            logger.warning("value_weight without value_rescaling for item: %s\n DEFAULTING TO MINMAX", itm["name"])
            itm["value_rescaling"] = "minmax"
        if cr := itm.get("capacity_ratio"):
            # cr not in (-1,1) =>!<=
            if not (-1 < cr < 1):
                logger.critical("Wrong value for capacity_ratio in %s, should be (-1,1)", itm)
                sys.exit(1)
            # cr & !cs => cs = ub
            if "capacity_sense" not in itm:
                logger.warning(
                    "capacity_ratio without capacity_sense for item: %s\n ASSUMING SENSE IS UPPER BOUND", itm["name"]
                )
                itm["capacity_sense"] = "ub"
        # !cr & cs =>!<=
        elif "capacity_sense" in itm:
            logger.critical("capacity_sense without capacity_ratio for item: %s", itm["name"])
            sys.exit(1)

    # 3. READ DATA
    from fire2a.raster import read_raster

    data_list = []
    for item in config:
        data, info = read_raster(str(item["filename"]))
        item.update(info)
        data_list += [data]

    # 4. VALIDATE: all arrays are 2D and share the same shape
    height, width = check_shapes(data_list)

    # 5. list[2d rasters] -> OBSERVATIONS (pixels as rows, rasters as columns)
    all_observations = np.column_stack([data.ravel() for data in data_list])

    # 6. if all|any rasters are nodata then mask out
    nodatas = [item["NoDataValue"] for item in config]
    if args.exclude_nodata == "any":
        nodata_mask = np.any(all_observations == nodatas, axis=1)
    if args.exclude_nodata == "all":
        nodata_mask = np.all(all_observations == nodatas, axis=1)

    logger.info("Sum of all rasters NoData: %s pixels", nodata_mask.sum())
    observations = all_observations[~nodata_mask]

    # 7. nodata -> 0
    if args.exclude_nodata == "all":
        for col, nd in zip(observations.T, nodatas):
            col[col == nd] = 0

    if args.plots:
        aplot(
            observations, "observations", [itm["name"] for itm in config], Path(args.output_raster).parent, show=False
        )

    # scaling
    # 8. PIPELINE
    scaled, pipe, feat_names = pipelie(observations, config)
    # assert observations.shape[0] == scaled.shape[0]
    # assert observations.shape[1] >= scaled.shape[1]
    logger.info(f"{observations.shape=}")
    logger.info(f"{scaled.shape=}")

    if args.plots:
        # exclude multiple occurrences of onehots
        first_occurrences = {}
        for i, fn in enumerate(feat_names):
            for bn in [itm["name"] for itm in config]:
                if fn.startswith(bn) and bn not in first_occurrences:
                    first_occurrences[bn] = i
                    break
        idxs = list(first_occurrences.values())
        aplot(scaled[:, idxs], "scaled", feat_names[idxs], Path(args.output_raster).parent, show=False)

    # weights
    values_weights = []
    for name in feat_names:
        for item in config:
            if name.startswith(item["name"]):
                values_weights += [item["value_weight"]]
    values_weights = np.array(values_weights)
    logger.info(f"{values_weights.shape=}")

    if args.plots:
        from scipy.sparse import csr_matrix

        values_weights_diag = csr_matrix(np.diag(values_weights[idxs]))
        scaled_weighted = scaled[:, idxs].dot(values_weights_diag)

        aplot(
            scaled_weighted,
            "scaled_weighted",
            feat_names[idxs],
            Path(args.output_raster).parent,
            show=False,
        )

    # capacities
    # "name": item["filename"].name.replace('.','_'),
    cap_cfg = [
        {
            "idx": i,
            "name": item["filename"].stem,
            "cap": observations[:, i].sum() * item["capacity_ratio"],
            "sense": item["capacity_sense"],
        }
        for i, item in enumerate(config)
        if "capacity_ratio" in item
    ]
    cap_data = observations[:, [itm["idx"] for itm in cap_cfg]]
    instance = {
        "scaled": scaled,
        "values_weights": values_weights,
        "cap_cfg": cap_cfg,
        "cap_data": cap_data,
        "feat_names": feat_names,
        "height": height,
        "width": width,
        "nodata_mask": nodata_mask,
        "all_observations": all_observations,
        "observations": observations,
        "pipe": pipe,
        "nodatas": nodatas,
    }
    return instance, args


def main(argv=None):
    """This main is split in 3 parts with the objective of being called from within QGIS fire2a's toolbox-plugin.
    Nevertheless, it can be called from the command line:
    ```bash
    python -m fire2a.knapsack [config.toml]
    ```
    Or integrated into other scripts:
    ```python
    from fire2a.knapsack import main
    soln, m, instance, args = main(["--script", "config.toml"])
    ```
    """
    # 0..8 PRE
    instance, args = pre_solve(argv)

    # 9. PYOMO MODEL
    m = get_model(**instance)

    # 10. PYOMO SOLVE
    opt, results = solve_pyomo(m, tee=True, solver="cplex")
    instance["opt"] = opt
    instance["results"] = results

    # 11. POST
    return post_solve(m, args=args, **instance)


def post_solve(
    m,
    args=None,
    scaled=None,
    values_weights=None,
    cap_cfg=None,
    cap_data=None,
    feat_names=None,
    height=None,
    width=None,
    nodata_mask=None,
    all_observations=None,
    observations=None,
    pipe=None,
    nodatas=None,
    **kwargs,
):
    soln = np.array([pyo.value(m.X[i], exception=False) for i in m.X], dtype=np.float32)
    logger.info("solution pseudo-histogram: %s", np.unique(soln, return_counts=True))
    soln[~soln.astype(bool)] = 0

    try:
        slacks = m.capacity[:].slack()
        logger.info("objective: %s", m.obj(exception=False))
    except Exception as e:
        logger.error(e)
        slacks = [0] * len(cap_cfg)

    if not isinstance(scaled, np.ndarray):
        scaled = scaled.toarray()
    vx = np.matmul(scaled.T, soln)

    logger.info("Values per objective:")
    for f, v in zip(feat_names, vx):
        logger.info(f"{f}\t\t{v:.4f}")

    if args.plots:
        fig, ax = plt.subplots(1, 2, figsize=(10, 5))
        fig.suptitle("solution")
        ax[0].set_title("positive objectives, abs(w*v*x)")
        ax[0].pie(np.absolute(vx * values_weights), labels=feat_names, autopct="%1.1f%%")

        cap_ratio = [slacks[i] / itm["cap"] for i, itm in enumerate(cap_cfg)]
        ax[1].set_title("capacity slack ratios")
        ax[1].bar([itm["name"] for itm in cap_cfg], cap_ratio)

        # if __name__ == "__main__":
        #      plt.show()
        plt.savefig(Path(args.output_raster).parent / "solution.png")
        plt.close()

    logger.info("Capacity slack:")
    for i, itm in enumerate(cap_cfg):
        logger.info(f"{i}: name:{itm['name']} cap:{itm['cap']} sense:{itm['sense']} slack:{slacks[i]}")

    if args.script:
        instance = {
            "scaled": scaled,
            "values_weights": values_weights,
            "cap_cfg": cap_cfg,
            "cap_data": cap_data,
            "feat_names": feat_names,
            "height": height,
            "width": width,
            "nodata_mask": nodata_mask,
            "all_observations": all_observations,
            "observations": observations,
            "pipe": pipe,
            "nodatas": nodatas,
        }
        return soln, m, instance, args
    else:
        # 12. WRITE OUTPUTS
        from fire2a.raster import write_raster

        if args.output_raster:
            # put soln into the original shape of all_observations using the reversal of nodata_mask
            data = np.zeros(height * width, dtype=np.float32) - 9999
            data[~nodata_mask] = soln
            data = data.reshape(height, width)
            if not write_raster(
                data,
                outfile=str(args.output_raster),
                nodata=-9999,
                authid=args.authid,
                geotransform=args.geotransform,
                logger=logger,
            ):
                logger.error("Error writing output raster")
                return 1
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
logger = <Logger fire2a.knapsack (WARNING)>
allowed_ub = ['<=', '≤', 'le', 'leq', 'ub']
allowed_lb = ['>=', '≥', 'ge', 'geq', 'lb']
config_allowed = {'value_rescaling': ['minmax', 'standard', 'robust', 'onehot', 'pass'], 'capacity_sense': ['<=', '≤', 'le', 'leq', 'ub', '>=', '≥', 'ge', 'geq', 'lb']}
def check_shapes(data_list):

Check if all data arrays have the same shape and are 2D. Returns the shape of the data arrays if they are all equal.

def pipelie(observations, config):

Create a pipeline for the observations and the configuration.

def arg_parser(argv=None):

Parse command line arguments.

def aplot(data: numpy.ndarray, title: str, series_names: list[str], outpath: pathlib.Path, show=False):
219def aplot(data: np.ndarray, title: str, series_names: list[str], outpath: Path, show=False):  # __name__ == "__main__"
220    """
221    names = [itm["name"] for i, itm in enumerate(config)]
222    outpath = Path(args.output_raster).parent
223    fname =  outpath / "observations.png"
224    """
225    if not isinstance(data, np.ndarray):
226        data = data.toarray()
227    if not isinstance(data[0], np.ndarray):
228        for itm in data:
229            itm = itm.toarray()
230    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
231    fig.suptitle(title)
232    ax[0].violinplot(data, showmeans=False, showmedians=True, showextrema=True)
233    ax[0].set_title("violinplot")
234    ax[0].set_xticks(range(1, len(series_names) + 1), series_names)
235    ax[1].boxplot(data)
236    ax[1].set_title("boxplot")
237    ax[1].set_xticks(range(1, len(series_names) + 1), series_names)
238    if show:
239        plt.show()
240    fname = outpath / (title + ".png")
241    plt.savefig(fname)
242    plt.close()

names = [itm["name"] for i, itm in enumerate(config)]
outpath = Path(args.output_raster).parent
fname = outpath / "observations.png"

def get_model(scaled=None, values_weights=None, cap_cfg=None, cap_data=None, **kwargs):

def get_model(scaled=None, values_weights=None, cap_cfg=None, cap_data=None, **kwargs):

    # model
    m = pyo.ConcreteModel("MultiObjectiveKnapsack")
    # sets
    m.V = pyo.RangeSet(0, scaled.shape[1] - 1)
    scaled_n, scaled_v = scaled.nonzero()
    m.NV = pyo.Set(initialize=[(n, v) for n, v in zip(scaled_n, scaled_v)])
    m.W = pyo.RangeSet(0, len(cap_cfg) - 1)
    cap_data_n, cap_data_v = cap_data.nonzero()
    m.NW = pyo.Set(initialize=[(n, w) for n, w in zip(cap_data_n, cap_data_v)])
    both_nv_nw = list(set(scaled_n) | set(cap_data_n))
    both_nv_nw.sort()
    m.N = pyo.Set(initialize=both_nv_nw)
    # parameters
    m.scaled_values = pyo.Param(m.NV, initialize=lambda m, n, v: scaled[n, v])
    m.values_weight = pyo.Param(m.V, initialize=values_weights)
    m.total_capacity = pyo.Param(m.W, initialize=[itm["cap"] for itm in cap_cfg])
    m.capacity_weight = pyo.Param(m.NW, initialize=lambda m, n, w: cap_data[n, w])
    # variables
    m.X = pyo.Var(m.N, within=pyo.Binary)

    # constraints
    def capacity_rule(m, w):
        if cap_cfg[w]["sense"] in allowed_ub:
            return sum(m.X[n] * m.capacity_weight[n, w] for n, ww in m.NW if ww == w) <= m.total_capacity[w]
        elif cap_cfg[w]["sense"] in allowed_lb:
            return sum(m.X[n] * m.capacity_weight[n, w] for n, ww in m.NW if ww == w) >= m.total_capacity[w]
        else:
            logger.critical("Skipping capacity constraint %s, %s", w, cap_cfg[w])
            return pyo.Constraint.Skip

    m.capacity = pyo.Constraint(m.W, rule=capacity_rule)
    # objective
    m.obj = pyo.Objective(
        expr=sum(m.values_weight[v] * sum(m.X[n] * m.scaled_values[n, v] for n, vv in m.NV if vv == v) for v in m.V),
        sense=pyo.maximize,
    )
    return m
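`get_model` builds a Pyomo MIP: maximize the weighted sum of rescaled values of the selected pixels, subject to one capacity bound per constraining raster. The same semantics can be checked on a toy instance by brute force, without any solver (a sketch with made-up numbers, not part of the module):

```python
from itertools import product

# Hypothetical toy instance: 4 pixels, 2 value layers, 1 upper-bound capacity.
scaled = [[0.2, 1.0], [0.9, 0.1], [0.5, 0.5], [0.1, 0.8]]  # rescaled values per pixel
values_weight = [1.0, 0.5]                                  # objective weight per layer
cap_weight = [3.0, 2.0, 2.0, 1.0]                           # capacity weight per pixel
total_capacity = 4.0                                        # "ub" sense: sum <= 4

best_obj, best_x = float("-inf"), None
for x in product([0, 1], repeat=4):  # enumerate all 2^4 selections
    if sum(xi * w for xi, w in zip(x, cap_weight)) > total_capacity:
        continue  # violates the capacity bound
    # same objective structure as m.obj: sum_v weight_v * sum_n x_n * scaled[n][v]
    obj = sum(
        values_weight[v] * sum(x[n] * scaled[n][v] for n in range(4))
        for v in range(2)
    )
    if obj > best_obj:
        best_obj, best_x = obj, x
print(best_x, round(best_obj, 2))  # (0, 1, 1, 0) 1.7
```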
def solve_pyomo(m, tee=True, solver="cplex", **kwargs):

def solve_pyomo(m, tee=True, solver="cplex", **kwargs):
    from pyomo.opt import SolverFactory

    opt = SolverFactory(solver)
    results = opt.solve(m, tee=tee, **kwargs)
    return opt, results
def pre_solve(argv):

def pre_solve(argv):
    if argv is sys.argv:
        argv = sys.argv[1:]
    args = arg_parser(argv)

    if args.verbose != 0:
        global logger
        from fire2a import setup_logger

        logger = setup_logger(verbosity=args.verbose)

    logger.info("args %s", args)

    # 2. READ CONFIG
    config = read_toml(args.config_file)
    # dict -> list[dict]
    a, b = list(config.keys()), list(config.values())
    config = [{"name": Path(a).name, "filename": Path(a), **b} for a, b in zip(a, b)]
    for itm in config:
        if "value_rescaling" in itm:
            itm["value_rescaling"] = itm["value_rescaling"].lower()
            if itm["value_rescaling"] not in config_allowed["value_rescaling"]:
                logger.critical("Wrong value for value_rescaling in %s", itm)
                sys.exit(1)
        if "capacity_sense" in itm:
            itm["capacity_sense"] = itm["capacity_sense"].lower()
            if itm["capacity_sense"] not in config_allowed["capacity_sense"]:
                logger.critical("Wrong value for capacity_sense in %s", itm)
                sys.exit(1)
        logger.debug(itm)

    # 2.1 CHECK PAIRS + defaults
    # vr : value_rescaling
    # vw : value_weight
    # cr : capacity_ratio
    # cs : capacity_sense
    for itm in config:
        if vr := itm.get("value_rescaling"):
            # vr without vw => vw = 1
            if "value_weight" not in itm:
                logger.warning(
                    "value_rescaling without value_weight for item: %s\n ASSUMING VALUE WEIGHT IS 1", itm["name"]
                )
                itm["value_weight"] = 1
            if vr == "pass":
                itm["value_weight"] = "passthrough"
        # vw without vr => vr = minmax
        elif "value_weight" in itm:
            logger.warning("value_weight without value_rescaling for item: %s\n DEFAULTING TO MINMAX", itm["name"])
            itm["value_rescaling"] = "minmax"
        if cr := itm.get("capacity_ratio"):
            # cr must be in (-1, 1)
            if not (-1 < cr < 1):
                logger.critical("Wrong value for capacity_ratio in %s, should be (-1,1)", itm)
                sys.exit(1)
            # cr without cs => cs = ub
            if "capacity_sense" not in itm:
                logger.warning(
                    "capacity_ratio without capacity_sense for item: %s\n ASSUMING SENSE IS UPPER BOUND", itm["name"]
                )
                itm["capacity_sense"] = "ub"
        # cs without cr => error
        elif "capacity_sense" in itm:
            logger.critical("capacity_sense without capacity_ratio for item: %s", itm["name"])
            sys.exit(1)

    # 3. READ DATA
    from fire2a.raster import read_raster

    data_list = []
    for item in config:
        data, info = read_raster(str(item["filename"]))
        item.update(info)
        data_list += [data]

    # 4. VALIDATE: all 2d, same shape
    height, width = check_shapes(data_list)

    # 5. list[maps] -> OBSERVATIONS
    all_observations = np.column_stack([data.ravel() for data in data_list])

    # 6. if all|any rasters are nodata then mask out
    nodatas = [item["NoDataValue"] for item in config]
    if args.exclude_nodata == "any":
        nodata_mask = np.any(all_observations == nodatas, axis=1)
    if args.exclude_nodata == "all":
        nodata_mask = np.all(all_observations == nodatas, axis=1)

    logger.info("Sum of all rasters NoData: %s pixels", nodata_mask.sum())
    observations = all_observations[~nodata_mask]

    # 7. nodata -> 0
    if args.exclude_nodata == "all":
        for col, nd in zip(observations.T, nodatas):
            col[col == nd] = 0

    if args.plots:
        aplot(
            observations, "observations", [itm["name"] for itm in config], Path(args.output_raster).parent, show=False
        )

    # 8. SCALING PIPELINE
    scaled, pipe, feat_names = pipelie(observations, config)
    # assert observations.shape[0] == scaled.shape[0]
    # assert observations.shape[1] >= scaled.shape[1]
    logger.info(f"{observations.shape=}")
    logger.info(f"{scaled.shape=}")

    if args.plots:
        # exclude multiple occurrences of onehots
        first_occurrences = {}
        for i, fn in enumerate(feat_names):
            for bn in [itm["name"] for itm in config]:
                if fn.startswith(bn) and bn not in first_occurrences:
                    first_occurrences[bn] = i
                    break
        idxs = list(first_occurrences.values())
        aplot(scaled[:, idxs], "scaled", feat_names[idxs], Path(args.output_raster).parent, show=False)

    # weights
    values_weights = []
    for name in feat_names:
        for item in config:
            if name.startswith(item["name"]):
                values_weights += [item["value_weight"]]
    values_weights = np.array(values_weights)
    logger.info(f"{values_weights.shape=}")

    if args.plots:
        from scipy.sparse import csr_matrix

        values_weights_diag = csr_matrix(np.diag(values_weights[idxs]))
        scaled_weighted = scaled[:, idxs].dot(values_weights_diag)

        aplot(
            scaled_weighted,
            "scaled_weighted",
            feat_names[idxs],
            Path(args.output_raster).parent,
            show=False,
        )

    # capacities
    # "name": item["filename"].name.replace('.', '_'),
    cap_cfg = [
        {
            "idx": i,
            "name": item["filename"].stem,
            "cap": observations[:, i].sum() * item["capacity_ratio"],
            "sense": item["capacity_sense"],
        }
        for i, item in enumerate(config)
        if "capacity_ratio" in item
    ]
    cap_data = observations[:, [itm["idx"] for itm in cap_cfg]]
    instance = {
        "scaled": scaled,
        "values_weights": values_weights,
        "cap_cfg": cap_cfg,
        "cap_data": cap_data,
        "feat_names": feat_names,
        "height": height,
        "width": width,
        "nodata_mask": nodata_mask,
        "all_observations": all_observations,
        "observations": observations,
        "pipe": pipe,
        "nodatas": nodatas,
    }
    return instance, args
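Steps 5 and 6 above stack the flattened rasters as columns and then drop the rows (pixels) where any column equals its raster's nodata value. A minimal sketch of that masking with toy numbers (not part of the module):

```python
import numpy as np

# Two flattened 3-pixel rasters become the columns of the observation matrix.
r1 = np.array([1.0, 2.0, 3.0])
r2 = np.array([5.0, -9999.0, 6.0])
all_observations = np.column_stack([r1, r2])
nodatas = [-9999.0, -9999.0]  # one nodata value per raster

# "any" policy (the default): a row is excluded if ANY column is nodata.
nodata_mask = np.any(all_observations == nodatas, axis=1)
observations = all_observations[~nodata_mask]
print(nodata_mask.tolist())   # [False, True, False]
print(observations.shape)     # (2, 2)
```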
def main(argv=None):

def main(argv=None):
    """This main is split into 3 parts so that it can be called from within QGIS fire2a's toolbox-plugin.
    Nevertheless, it can be called from the command line:
    ```bash
    python -m fire2a.knapsack [config.toml]
    ```
    Or integrated into other scripts:
    ```python
    from fire2a.knapsack import main
    soln, m, instance, args = main(["config.toml"])
    ```
    """
    # 0..9 PRE
    instance, args = pre_solve(argv)

    # 9. PYOMO MODEL
    m = get_model(**instance)

    # 10. PYOMO SOLVE
    opt, results = solve_pyomo(m, tee=True, solver="cplex")
    instance["opt"] = opt
    instance["results"] = results

    # 11. POST
    return post_solve(m, args=args, **instance)

This main is split into 3 parts so that it can be called from within QGIS fire2a's toolbox-plugin. Nevertheless, it can be called from the command line:

    python -m fire2a.knapsack [config.toml]

Or integrated into other scripts:

    from fire2a.knapsack import main
    soln, m, instance, args = main(["config.toml"])
def post_solve(m, args=None, scaled=None, values_weights=None, cap_cfg=None, cap_data=None, feat_names=None, height=None, width=None, nodata_mask=None, all_observations=None, observations=None, pipe=None, nodatas=None, **kwargs):

def post_solve(
    m,
    args=None,
    scaled=None,
    values_weights=None,
    cap_cfg=None,
    cap_data=None,
    feat_names=None,
    height=None,
    width=None,
    nodata_mask=None,
    all_observations=None,
    observations=None,
    pipe=None,
    nodatas=None,
    **kwargs,
):
    soln = np.array([pyo.value(m.X[i], exception=False) for i in m.X], dtype=np.float32)
    logger.info("solution pseudo-histogram: %s", np.unique(soln, return_counts=True))
    # zero out unassigned variables: pyo.value(..., exception=False) returns
    # None, which becomes nan in the float32 array
    soln[np.isnan(soln)] = 0

    try:
        slacks = m.capacity[:].slack()
        logger.info("objective: %s", m.obj(exception=False))
    except Exception as e:
        logger.error(e)
        slacks = [0] * len(cap_cfg)

    if not isinstance(scaled, np.ndarray):
        scaled = scaled.toarray()
    vx = np.matmul(scaled.T, soln)

    logger.info("Values per objective:")
    for f, v in zip(feat_names, vx):
        logger.info(f"{f}\t\t{v:.4f}")

    if args.plots:
        fig, ax = plt.subplots(1, 2, figsize=(10, 5))
        fig.suptitle("solution")
        ax[0].set_title("positive objectives, abs(w*v*x)")
        ax[0].pie(np.absolute(vx * values_weights), labels=feat_names, autopct="%1.1f%%")

        cap_ratio = [slacks[i] / itm["cap"] for i, itm in enumerate(cap_cfg)]
        ax[1].set_title("capacity slack ratios")
        ax[1].bar([itm["name"] for itm in cap_cfg], cap_ratio)

        # if __name__ == "__main__":
        #     plt.show()
        plt.savefig(Path(args.output_raster).parent / "solution.png")
        plt.close()

    logger.info("Capacity slack:")
    for i, itm in enumerate(cap_cfg):
        logger.info(f"{i}: name:{itm['name']} cap:{itm['cap']} sense:{itm['sense']} slack:{slacks[i]}")

    if args.script:
        instance = {
            "scaled": scaled,
            "values_weights": values_weights,
            "cap_cfg": cap_cfg,
            "cap_data": cap_data,
            "feat_names": feat_names,
            "height": height,
            "width": width,
            "nodata_mask": nodata_mask,
            "all_observations": all_observations,
            "observations": observations,
            "pipe": pipe,
            "nodatas": nodatas,
        }
        return soln, m, instance, args
    else:
        # 10. WRITE OUTPUTS
        from fire2a.raster import write_raster

        if args.output_raster:
            # put soln into the original shape of all_observations using the reversal of nodata_mask
            data = np.zeros(height * width, dtype=np.float32) - 9999
            data[~nodata_mask] = soln
            data = data.reshape(height, width)
            if not write_raster(
                data,
                outfile=str(args.output_raster),
                nodata=-9999,
                authid=args.authid,
                geotransform=args.geotransform,
                logger=logger,
            ):
                logger.error("Error writing output raster")
                return 1
    return 0
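The write-back step above is the inverse of the nodata masking in `pre_solve`: the solution vector (one entry per kept pixel) is scattered back into the full `height * width` grid, with -9999 at the pixels that were masked out. A standalone sketch with toy sizes (not part of the module):

```python
import numpy as np

# Toy 2x3 grid where pixels 1 and 4 (flat indices) were nodata-masked.
height, width = 2, 3
nodata_mask = np.array([False, True, False, False, True, False])
soln = np.array([1.0, 0.0, 1.0, 0.0], dtype=np.float32)  # one value per kept pixel

data = np.zeros(height * width, dtype=np.float32) - 9999  # start all-nodata
data[~nodata_mask] = soln                                 # scatter solution back
data = data.reshape(height, width)
print(data.tolist())  # [[1.0, -9999.0, 0.0], [1.0, -9999.0, 0.0]]
```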