upkie 6.1.0
Open-source wheeled biped robots
upkie.envs.wrappers.observation_based_reward.ObservationBasedReward Class Reference

Redefine the reward of an environment as a function of its observation.

Public Member Functions

def __init__ (self, gym.Env[ObsType, ActType] env)
 Constructor for the Reward wrapper.
 
Tuple[ObsType, SupportsFloat, bool, bool, Dict[str, Any]] step (self, ActType action)
 Steps the wrapped environment env and replaces its reward with the value returned by reward().
 
SupportsFloat reward (self, ObsType observation, dict info)
 Returns the new environment reward.
 

Detailed Description

Redefine the reward of an environment as a function of its observation.

To redefine the reward returned by the base environment before it reaches learning code, inherit from this class and override its reward() method. See also the RewardWrapper class defined in Gymnasium.
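A minimal sketch of this pattern follows. To keep it runnable without upkie or Gymnasium installed, it defines a hypothetical stand-in base class, ObservationBasedRewardStandIn, that only mirrors the step()/reward() contract described on this page; unlike a real Gymnasium wrapper, it does not forward reset() or the observation and action spaces. The toy environment ConstantPitchEnv and the assumption that observation[0] is a pitch angle in radians are also illustrative. In practice you would inherit from upkie.envs.wrappers.observation_based_reward.ObservationBasedReward and wrap a real environment.

```python
import math
from typing import Any, Dict, SupportsFloat, Tuple


class ObservationBasedRewardStandIn:
    """Hypothetical stand-in mirroring the documented step/reward contract
    of ObservationBasedReward, so this sketch runs without upkie."""

    def __init__(self, env):
        self.env = env

    def step(self, action) -> Tuple[Any, SupportsFloat, bool, bool, Dict[str, Any]]:
        # Step the wrapped environment, discard its reward, and recompute
        # the reward from the post-step observation and info.
        observation, _, terminated, truncated, info = self.env.step(action)
        return observation, self.reward(observation, info), terminated, truncated, info

    def reward(self, observation, info: dict) -> SupportsFloat:
        raise NotImplementedError  # subclasses override this method


class UprightReward(ObservationBasedRewardStandIn):
    """Reward staying upright, assuming observation[0] is a pitch angle
    in radians (hypothetical observation layout)."""

    def reward(self, observation, info: dict) -> float:
        pitch = observation[0]
        return math.cos(pitch)  # 1.0 when upright, smaller as the robot tilts


class ConstantPitchEnv:
    """Hypothetical toy environment that always reports the same pitch
    and a base reward of 0.0."""

    def __init__(self, pitch: float):
        self.pitch = pitch

    def step(self, action):
        return [self.pitch], 0.0, False, False, {}


env = UprightReward(ConstantPitchEnv(pitch=0.0))
observation, reward, terminated, truncated, info = env.step(action=None)
print(reward)  # 1.0
```

Because the new reward depends only on what reward() receives, it can be checked in isolation with a toy environment like the one above before plugging it into training code.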

Rewards in reinforcement learning are often defined as \(r(s_t, a_t, s_{t+1})\). With this wrapper, the reward function is redefined as \(r(o_{t+1})\), with two differences:

  • We use the observation rather than the environment state.
  • The new reward is computed based on the post-step observation.

Using the observation rather than the state makes little difference when training in simulation, since simulated sensors can read any part of the full simulation state on demand.
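The sketch below illustrates the second difference: the reward is evaluated on the post-step observation \(o_{t+1}\), not on \(o_t\). Both classes are hypothetical stand-ins written for this example, not upkie's implementation. CounterEnv reports a counter that increments at each step, and the wrapper uses \(r(o_{t+1}) = o_{t+1}^2\).

```python
from typing import Any, Dict, Tuple


class CounterEnv:
    """Hypothetical toy environment whose observation is a step counter."""

    def __init__(self):
        self.count = 0

    def reset(self):
        self.count = 0
        return self.count, {}

    def step(self, action) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.count += 1  # the observation changes during the step
        return self.count, -1.0, False, False, {}


class SquaredObservationReward:
    """Hypothetical wrapper applying r(o_{t+1}) = o_{t+1} ** 2."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # The base reward (-1.0) is discarded; the new one is computed
        # from the observation returned by this step, i.e. o_{t+1}.
        observation, _, terminated, truncated, info = self.env.step(action)
        return observation, self.reward(observation, info), terminated, truncated, info

    def reward(self, observation, info) -> float:
        return float(observation ** 2)


env = SquaredObservationReward(CounterEnv())
o_t, _ = env.reset()                          # o_0 = 0
o_next, r_t, _, _, _ = env.step(action=None)  # o_1 = 1
print(o_t, o_next, r_t)  # 0 1 1.0
```

A reward computed from \(o_t\) would have been 0.0 here; the wrapper yields 1.0 because it only sees the observation produced by the step.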

Constructor & Destructor Documentation

◆ __init__()

def upkie.envs.wrappers.observation_based_reward.ObservationBasedReward.__init__ (self, gym.Env[ObsType, ActType] env)

Constructor for the Reward wrapper.

Parameters
[in] env: Environment to be wrapped.

Member Function Documentation

◆ reward()

SupportsFloat upkie.envs.wrappers.observation_based_reward.ObservationBasedReward.reward (self, ObsType observation, dict info)

Returns the new environment reward.

Parameters
[in] observation: Latest observation from the environment.
[in] info: Latest info dictionary from the environment.
Returns
The modified reward.
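As an illustration of the two parameters, the sketch below combines a term computed from the observation with one read from the info dictionary. The observation layout (observation[1] as a ground velocity in m/s) and the "fell" info key are hypothetical. Only the reward() method is shown, so it can be called directly; in practice it would be defined in a subclass of ObservationBasedReward.

```python
from typing import Any, Dict


class FallPenaltyReward:
    """Hypothetical reward() body; in practice this class would inherit
    from ObservationBasedReward."""

    def reward(self, observation, info: Dict[str, Any]) -> float:
        # Observation term: assumes observation[1] is a ground velocity
        # in m/s (hypothetical layout); penalize moving in either direction.
        speed_term = -abs(observation[1])
        # Info term: "fell" is a hypothetical key, treated as False if absent.
        fall_penalty = -10.0 if info.get("fell", False) else 0.0
        return speed_term + fall_penalty


wrapper = FallPenaltyReward()
print(wrapper.reward([0.0, 0.5], {}))              # -0.5
print(wrapper.reward([0.0, 0.5], {"fell": True}))  # -10.5
```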
