Reinforcement Learning#

This example demonstrates how to use reinforcement learning (RL) to train a policy to control a simple microgrid. We will train and deploy a simple Deep Q-Network (DQN) policy on one of the pymgrid25 benchmark microgrids.

Reinforcement learning algorithms are not built into pymgrid, nor are they a dependency. We recommend using either RLlib or garage; RLlib is better supported and offers a wider variety of algorithms, but can be less developer-friendly in some scenarios. This example uses garage; the RLlib API is similar.

To install garage, see the garage documentation.

[1]:
import pandas as pd

from pymgrid.envs import DiscreteMicrogridEnv

Defining the Environment#

Defining an RL environment is straightforward. To define an environment on one of the benchmark microgrids, we simply call from_scenario on our choice of DiscreteMicrogridEnv or ContinuousMicrogridEnv.

Here, we will use the discrete environment and train a DQN on it.

[2]:
env = DiscreteMicrogridEnv.from_scenario(microgrid_number=0)
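
For reference, the continuous-action environment is constructed in the same way. The following unexecuted sketch assumes the same from_scenario signature; we stick with the discrete environment for the rest of this example.

[ ]:
from pymgrid.envs import ContinuousMicrogridEnv

# Continuous-action counterpart (shown for reference; not used below)
continuous_env = ContinuousMicrogridEnv.from_scenario(microgrid_number=0)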

Environments subclass pymgrid.Microgrid and thus have the same attributes and logging functionality:

[3]:
for module in env.modules.module_list():
    print(f'{module}\n')
LoadModule(time_series=<class 'numpy.ndarray'>, forecaster=OracleForecaster, forecast_horizon=23, forecaster_increase_uncertainty=False, raise_errors=False)

RenewableModule(time_series=<class 'numpy.ndarray'>, raise_errors=False, forecaster=OracleForecaster, forecast_horizon=23, forecaster_increase_uncertainty=False, provided_energy_name=renewable_used)

UnbalancedEnergyModule(raise_errors=False, loss_load_cost=10, overgeneration_cost=1)

BatteryModule(min_capacity=290.40000000000003, max_capacity=1452, max_charge=363, max_discharge=363, efficiency=0.9, battery_cost_cycle=0.02, battery_transition_model=None, init_charge=None, init_soc=0.2, raise_errors=False)

GridModule(max_import=1920, max_export=1920)
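
To illustrate the logging side, the sketch below steps the environment with a few random actions and then reads the log back as a pandas DataFrame. It assumes the gym-style reset/step API and Microgrid's get_log method, and is not part of the executed notebook.

[ ]:
# Minimal logging sketch (assumes the gym-style reset/step API
# returning (obs, reward, done, info), and Microgrid.get_log).
obs = env.reset()
for _ in range(10):
    action = env.action_space.sample()          # random discrete action
    obs, reward, done, info = env.step(action)

log = env.get_log()                             # per-step module data as a DataFrame
print(log.head())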

Setting Up the RL Algorithm#

As mentioned above, we will train and deploy a simple DQN.

For ease of use, we will employ a simple LocalSampler that does not parallelize sampling. We will also use an EpsilonGreedyPolicy for exploration.

[6]:
from garage.experiment.deterministic import set_seed
from garage.np.exploration_policies import EpsilonGreedyPolicy
from garage.replay_buffer import PathBuffer
from garage.sampler import LocalSampler, RaySampler
from garage.torch.algos.dqn import DQN
from garage.torch.policies import DiscreteQFArgmaxPolicy
from garage.torch.q_functions import DiscreteMLPQFunction
from garage.trainer import Trainer

Remainder Coming Soon.
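
In the meantime, the sketch below shows how the pieces imported above might be wired together, following the structure of garage's standard DQN example. The hyperparameters (hidden sizes, learning rate, epsilon schedule, epoch counts) and the assumed episode length of 8760 steps (one year of hourly data) are illustrative assumptions, not tuned values, and the cell is not executed here.

[ ]:
from garage import wrap_experiment
from garage.envs import GymEnv


@wrap_experiment
def dqn_microgrid(ctxt=None, seed=1):
    """Sketch: train a DQN on the discrete microgrid environment."""
    set_seed(seed)
    trainer = Trainer(ctxt)

    # Wrap the pymgrid env for garage. The episode length of 8760 steps
    # is an assumption -- adjust it to your scenario's time series length.
    env = GymEnv(DiscreteMicrogridEnv.from_scenario(microgrid_number=0),
                 max_episode_length=8760)

    # Q-function, greedy policy, and epsilon-greedy exploration wrapper
    qf = DiscreteMLPQFunction(env_spec=env.spec, hidden_sizes=(64, 64))
    policy = DiscreteQFArgmaxPolicy(env_spec=env.spec, qf=qf)

    n_epochs = 100
    steps_per_epoch = 10
    sampler_batch_size = 512
    num_timesteps = n_epochs * steps_per_epoch * sampler_batch_size

    exploration_policy = EpsilonGreedyPolicy(env_spec=env.spec,
                                             policy=policy,
                                             total_timesteps=num_timesteps,
                                             max_epsilon=1.0,
                                             min_epsilon=0.01,
                                             decay_ratio=0.1)

    replay_buffer = PathBuffer(capacity_in_transitions=int(1e6))

    # LocalSampler collects rollouts in the main process (no parallelism)
    sampler = LocalSampler(agents=exploration_policy,
                           envs=env,
                           max_episode_length=env.spec.max_episode_length)

    algo = DQN(env_spec=env.spec,
               policy=policy,
               qf=qf,
               exploration_policy=exploration_policy,
               replay_buffer=replay_buffer,
               sampler=sampler,
               steps_per_epoch=steps_per_epoch,
               qf_lr=1e-4,
               discount=0.99,
               min_buffer_size=int(1e4),
               n_train_steps=500,
               target_update_freq=30,
               buffer_batch_size=64)

    trainer.setup(algo, env)
    trainer.train(n_epochs=n_epochs, batch_size=sampler_batch_size)
    env.close()


dqn_microgrid(seed=1)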
