upkie 6.1.0
Open-source wheeled biped robots
upkie.envs.wrappers.observation_based_reward.ObservationBasedReward Class Reference
Redefine the reward of an environment as a function of its observation. More...
Public Member Functions

def __init__ (self, gym.Env[ObsType, ActType] env)
    Constructor for the Reward wrapper.

Tuple[ObsType, SupportsFloat, bool, bool, Dict[str, Any]] step (self, ActType action)
    Modifies the env reward using self.reward.

SupportsFloat reward (self, ObsType observation, dict info)
    Returns the new environment reward.
Detailed Description

Redefine the reward of an environment as a function of its observation.
If you would like to redefine the reward returned by the base environment before passing it to learning code, you can inherit from this class and override the reward method, as sketched below. See also the RewardWrapper class defined in Gymnasium.
Rewards in reinforcement learning are often defined as \(r(s_t, a_t, s_{t+1})\). With this wrapper, the reward function is redefined as \(r(o_{t+1})\), with two differences: the reward is computed from the observation \(o_{t+1}\) rather than the full state \(s_{t+1}\), and it no longer depends on the previous state \(s_t\) nor directly on the action \(a_t\). Using observations rather than states does not have a big impact when training in simulation, as simulation sensors can read any part of the full simulation state on demand.
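For example, a subclass only needs to override reward. The following is a minimal sketch, not part of the Upkie API: the negative-norm reward and the assumption that the observation converts to a NumPy array are illustrative choices only.

```python
from typing import Any, Dict, SupportsFloat

import numpy as np

from upkie.envs.wrappers.observation_based_reward import ObservationBasedReward


class NegativeNormReward(ObservationBasedReward):
    """Hypothetical reward: penalize the norm of the observation vector."""

    def reward(self, observation: Any, info: Dict[str, Any]) -> SupportsFloat:
        # Assumes the observation is convertible to a NumPy array; the
        # actual observation layout depends on the wrapped environment.
        return -float(np.linalg.norm(np.asarray(observation, dtype=float)))
```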
def upkie.envs.wrappers.observation_based_reward.ObservationBasedReward.__init__ (self, gym.Env[ObsType, ActType] env)
Constructor for the Reward wrapper.
Parameters
    [in]  env  Environment to be wrapped.
SupportsFloat upkie.envs.wrappers.observation_based_reward.ObservationBasedReward.reward (self, ObsType observation, dict info)
Returns the new environment reward.
Parameters
    [in]  observation  Latest observation from the environment.
    [in]  info         Latest info dictionary from the environment.
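As a usage sketch under the same assumptions as the subclass example above, wrapping an environment makes step return the redefined reward. Here "Pendulum-v1" merely stands in for an actual Upkie environment:

```python
import gymnasium as gym

# NegativeNormReward is the hypothetical subclass sketched earlier.
env = NegativeNormReward(gym.make("Pendulum-v1"))
observation, info = env.reset()
observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(reward)  # value returned by NegativeNormReward.reward(observation, info)
```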