This is the repository for APE, the Anti-Poaching Environment: a mixed, zero-sum, multi-agent game between independent poachers and cooperative rangers on a grid. The main implementation is in [anti_poaching.py](anti_poaching/env/anti_poaching.py), where the game is written as a PettingZoo environment. Examples that use this environment are found in the [examples](examples/) directory; notably, this includes the RLlib interface (currently supported at v2.8.0) in the [rllib](examples/rllib/) folder.

To set up a ready-to-go environment, use virtualenv (or a similar tool of your choice) to create a Python virtual environment. We currently test with Python 3.8, but later versions should also work.
```bash
$ virtualenv -p python3.8 ape;
$ source ape/bin/activate;
```
To install the environment with a GPU-enabled version of PyTorch, run the following from the root directory of this project. This installs the environment as an editable package using `pip`.
```bash
$ pip install -e .[code,gpu] # For GPU-enabled torch
```
Alternatively, to install only the CPU version of PyTorch, use
```bash
$ pip install -e .[code,cpu] # For CPU-only torch
```
We also provide a simple script, `init.sh`, that does this automatically for you. Simply source it as follows:
```bash
$ source init.sh # For CPU-only torch
$ source init.sh full # For GPU-enabled torch
```
The main environment is implemented in [anti_poaching.py](./anti_poaching/env/anti_poaching.py), following the PettingZoo API. Once the package is installed (see previous section), the following code should run:
```python
from anti_poaching.anti_poaching_v0 import anti_poaching

cg = anti_poaching.parallel_env(render_mode="rgb")
done, observations, terminations, truncations = False, None, None, None

# Initial action masks come directly from the grid
action_mask = {
    agent: cg.grid.permitted_movements(agent) for agent in cg.agents
}

while not done:
    # sample the actions for each agent randomly, respecting the action mask
    actions = {
        agent: cg.action_space(agent).sample(mask=action_mask[agent])
        for agent in cg.agents
    }
    observations, _, terminations, truncations, _ = cg.step(actions)

    # subsequent action masks are part of each agent's observation
    action_mask = {
        agent: observations[agent]["action_mask"] for agent in cg.agents
    }

    # the episode ends once every agent is terminated or truncated
    done = all(
        x or y for x, y in zip(terminations.values(), truncations.values())
    )
    cg.render()
```
Alternatively, try running the examples from [manual_policies](./examples/manual_policies/), or run the test suite using `pytest` as follows:
```bash
$ pytest [tests/]
```
A few examples are found in the [examples](examples/) folder.
### Manual policies
The [fixed_policy.py](examples/manual_policies/fixed_policy.py) and [random_policy.py](examples/manual_policies/random_policy.py) scripts show how to drive the game with hand-coded policies, and illustrate the basic RL loop.
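For a rough idea of what a hand-coded policy can look like, here is an illustrative sketch (not the contents of those files). It assumes a `Discrete` action space with a binary `action_mask`, and replaces the random sampling from the loop above with a deterministic rule; the `first_legal_action` helper is hypothetical.
```python
from anti_poaching.anti_poaching_v0 import anti_poaching

def first_legal_action(mask):
    """Hypothetical hand-coded policy: always pick the lowest-indexed legal action."""
    return int(next(i for i, legal in enumerate(mask) if legal))

cg = anti_poaching.parallel_env()
# initial masks come from the grid, exactly as in the random-policy loop above
action_mask = {agent: cg.grid.permitted_movements(agent) for agent in cg.agents}

done = False
while not done:
    actions = {agent: first_legal_action(action_mask[agent]) for agent in cg.agents}
    observations, _, terminations, truncations, _ = cg.step(actions)
    action_mask = {agent: observations[agent]["action_mask"] for agent in cg.agents}
    done = all(t or u for t, u in zip(terminations.values(), truncations.values()))
```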
### RLlib examples
These examples run MARL algorithms (Policy Gradients, PPO, QMIX) on the environment using RLlib. All experiments can be launched via the central script [main.py](examples/rllib/main.py), which by default runs an RLlib algorithm (PPO) in Multi-Agent Independent Learning mode on an `AntiPoachingGame` instance. All examples have parameters that can be specified on the command line (use `--help` to see all options); everything is wrapped to provide compatibility with RLlib.
```bash
$ python main.py
```
To see all available configuration options, run
```bash
$ python main.py --help
```
For example, to run a 2 Rangers vs. 4 Poachers scenario where
- the game is played on a 15x15 grid,
- only the Rangers learn, while the Poachers use the `Random` heuristic,
- learning runs for 30k timesteps and is evaluated every 10k steps,
- and 20 CPU cores are available,

we can run the following command:
```bash
$ python main.py --grid 15 --rangers 2 --poachers 4 \
    --policies-train r --ppol random \
    --timesteps 30000 --eval-every 10000 \
    --num-cpus 20
```
For further details, refer to the [README](examples/rllib/README.md) for the RLlib interface.
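If you want to experiment with your own RLlib setup instead of `main.py`, the sketch below shows one possible way to register the game with RLlib. It is a minimal illustration, assuming RLlib's `ParallelPettingZooEnv` wrapper and `register_env` utility; it is not the exact wrapping used by the provided examples, and a custom model would still be needed to make full use of the action masks.
```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from ray.tune.registry import register_env

from anti_poaching.anti_poaching_v0 import anti_poaching

# Register a factory that wraps the parallel env for RLlib
register_env(
    "anti_poaching_v0",
    lambda env_config: ParallelPettingZooEnv(anti_poaching.parallel_env()),
)

# Independent learning: one policy per agent, mapped by agent id
agent_ids = set(anti_poaching.parallel_env().agents)
config = (
    PPOConfig()
    .environment("anti_poaching_v0")
    .multi_agent(
        policies=agent_ids,
        policy_mapping_fn=lambda agent_id, *args, **kwargs: agent_id,
    )
)

algo = config.build()
print(algo.train())  # one training iteration
```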