upkie 7.0.0
Open-source wheeled biped robots
upkie.envs.upkie_servos.UpkieServos Class Reference

Base Upkie environment where actions command servomotors directly.

Public Member Functions

None __init__ (self, Optional[float] frequency=200.0, bool frequency_checks=True, Optional[RobotState] init_state=None, bool regulate_frequency=True, str shm_name="/upkie", Optional[dict] spine_config=None)
 Initialize environment.
 
def __del__ (self)
 Stop the spine when deleting the environment instance.
 
None close (self)
 Stop the spine properly.
 
Optional[float] dt (self)
 Regulated period of the control loop in seconds, or None if there is no loop frequency regulation.
 
Optional[float] frequency (self)
 Regulated frequency of the control loop in Hz, or None if there is no loop frequency regulation.
 
dict get_neutral_action (self)
 Get the neutral action where servos don't move.
 
None update_init_rand (self, **kwargs)
 Update initial-state randomization.
 
Tuple[np.ndarray, dict] reset (self, *, Optional[int] seed=None, Optional[dict] options=None)
 Reset the spine and get an initial observation.
 
Tuple[np.ndarray, float, bool, bool, dict] step (self, np.ndarray action)
 Run one timestep of the environment's dynamics.
 
None log (self, str name, Any entry)
 Log a new entry to the "log" key of the action dictionary.
 
dict get_bullet_action (self)
 Get the Bullet action that will be applied at next step.
 
None set_bullet_action (self, dict bullet_action)
 Prepare for the next step an extra action for the Bullet spine.
 

Public Attributes

 action_space
 Action space.
 
 observation_space
 Observation space.
 
 init_state
 Initial state for the floating base of the robot, which may be randomized upon resets.
 
 model
 Robot model read from its URDF description.
 

Static Public Attributes

int version = 5
 Environment version number.
 

Detailed Description

Base Upkie environment where actions command servomotors directly.

Actions and observations correspond to the moteus servo API. Under the hood, the environment provides a number of features:

  • Communication with the spine process.
  • Initial state randomization (e.g. when training a policy).
  • Loop frequency regulation (optional).

Note that Upkie environments are made to run on a single CPU thread. The downside for reinforcement learning is that computations are not massively parallel. The upside is that it simplifies deployment to the real robot, as it relies on the same spine interface that runs on real robots.

Action space

The action space is a dictionary with one key for each servo:

  • left_hip: left hip joint (qdd100)
  • left_knee: left knee joint (qdd100)
  • left_wheel: left wheel joint (mj5208)
  • right_hip: right hip joint (qdd100)
  • right_knee: right knee joint (qdd100)
  • right_wheel: right wheel joint (mj5208)

The value for each servo key is itself a dictionary with the following keys:

  • position: commanded joint angle \(\theta^*\) in [rad] (NaN to disable) (required).
  • velocity: commanded joint velocity \(\dot{\theta}^*\) in [rad] / [s] (required).
  • feedforward_torque: feedforward joint torque \(\tau_{\mathit{ff}}\) in [N m].
  • kp_scale: scaling factor \(k_{p}^{\mathit{scale}}\) applied to the position feedback gain, between zero and one.
  • kd_scale: scaling factor \(k_{d}^{\mathit{scale}}\) applied to the velocity feedback gain, between zero and one.
  • maximum_torque: maximum joint torque \(\tau_{\mathit{max}}\) (feedforward + feedback) enforced during the whole actuation step, in [N m].
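As a minimal sketch of this structure, the plain-Python helpers below build one action dictionary using only the servo and key names listed above. The helper names make_servo_action and make_action, and the default values (in particular the 1.0 N m torque limit), are illustrative assumptions rather than part of the Upkie API.

```python
import math

# Servo names, as listed in the action space above
SERVO_NAMES = (
    "left_hip", "left_knee", "left_wheel",
    "right_hip", "right_knee", "right_wheel",
)


def make_servo_action(position=math.nan, velocity=0.0, maximum_torque=1.0):
    """Build the dictionary for a single servo (hypothetical helper)."""
    return {
        "position": position,  # [rad], NaN to disable
        "velocity": velocity,  # [rad] / [s]
        "feedforward_torque": 0.0,  # [N m]
        "kp_scale": 1.0,  # between zero and one
        "kd_scale": 1.0,  # between zero and one
        "maximum_torque": maximum_torque,  # [N m], illustrative value
    }


def make_action(**servo_overrides):
    """Build a full action; overrides map servo names to servo dictionaries."""
    action = {name: make_servo_action() for name in SERVO_NAMES}
    for name, servo_action in servo_overrides.items():
        if name not in action:
            raise KeyError(f"unknown servo: {name}")
        action[name] = servo_action
    return action


# Example: velocity command on the left wheel, defaults everywhere else
action = make_action(left_wheel=make_servo_action(velocity=1.0))
```

Building every servo entry from the same helper keeps the required position and velocity keys present in all six dictionaries.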

The resulting torque applied by the servo is then:

\[ \begin{align*} \tau & = \underset{ [-\tau_{\mathit{max}}, +\tau_{\mathit{max}}]}{ \mathrm{clamp} } \left( \tau_{\mathit{ff}} + k_{p} k_{p}^{\mathit{scale}} (\theta^* - \theta) + k_{d} k_{d}^{\mathit{scale}} (\dot{\theta}^* - \dot{\theta}) \right) \end{align*} \]

Position and velocity gains \(k_{p}\) and \(k_{d}\) are configured in each moteus controller directly and don't change during execution. Instead, we can modulate the overall feedback gains via the normalized parameters \(k_{p}^{\mathit{scale}} \in [0, 1]\) and \(k_{d}^{\mathit{scale}} \in [0, 1]\). Note that the servo regulates the torque above at its own frequency, which is higher (typically 40 kHz) than the agent and the spine frequencies. See the moteus reference for more details.
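The torque law can be checked numerically with a short function. This is only a sketch of the formula as written in this page, not the moteus firmware: the function name servo_torque and its default gains are hypothetical, and dropping the position term when the target is NaN is an assumption about what "NaN to disable" means.

```python
import math


def servo_torque(theta, theta_dot, theta_des, theta_dot_des,
                 tau_ff=0.0, kp=1.0, kd=1.0,
                 kp_scale=1.0, kd_scale=1.0, tau_max=math.inf):
    """Evaluate the clamped feedforward + feedback torque, in [N m]."""
    # Assumption: a NaN position target removes the position feedback term
    if math.isnan(theta_des):
        position_term = 0.0
    else:
        position_term = kp * kp_scale * (theta_des - theta)
    velocity_term = kd * kd_scale * (theta_dot_des - theta_dot)
    raw_torque = tau_ff + position_term + velocity_term
    # Clamp the sum to [-tau_max, +tau_max]
    return max(-tau_max, min(tau_max, raw_torque))
```

For instance, with kp = 10 and a position error of 1 rad, the unclamped torque is 10 N m, so a maximum_torque of 5 N m saturates the command, whereas kp_scale = 0.2 brings it down to 2 N m, within the limit.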

Observation space

The observation space is a dictionary with one key for each servo. The value for each key is a dictionary with keys:

  • position: Joint angle in [rad].
  • velocity: Joint velocity in [rad] / [s].
  • torque: Joint torque in [N m].
  • temperature: Servo temperature in [°C].
  • voltage: Power bus voltage of the servo, in [V].

As with all Upkie environments, full observations from the spine (detailed in Observations) are also available in the info dictionary returned by the reset and step functions.
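As an illustration of this nested layout, the sketch below reads values out of an observation dictionary. The helper names and the sample numbers are made up for the example; only the servo names and the per-servo keys come from the lists above.

```python
def joint_positions(observation):
    """Map each servo name to its joint angle in [rad]."""
    return {servo: values["position"] for servo, values in observation.items()}


def hottest_servo(observation):
    """Return (servo name, temperature) for the warmest servo."""
    name = max(observation, key=lambda servo: observation[servo]["temperature"])
    return name, observation[name]["temperature"]


# Made-up observation with two servos, for illustration only
observation = {
    "left_hip": {
        "position": 0.1, "velocity": 0.0, "torque": 0.0,
        "temperature": 30.0, "voltage": 18.0,
    },
    "left_wheel": {
        "position": 1.5, "velocity": 2.0, "torque": 0.3,
        "temperature": 42.5, "voltage": 18.0,
    },
}
```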

Constructor & Destructor Documentation

◆ __init__()

None upkie.envs.upkie_servos.UpkieServos.__init__ (   self,
Optional[float]   frequency = 200.0,
bool   frequency_checks = True,
Optional[RobotState]   init_state = None,
bool   regulate_frequency = True,
str   shm_name = "/upkie",
Optional[dict]   spine_config = None 
)

Initialize environment.

Parameters
  • frequency: Regulated frequency of the control loop, in Hz. Can be prescribed even when regulate_frequency is unset, in which case self.dt will be defined but the loop frequency will not be regulated.
  • frequency_checks: If regulate_frequency is set and this parameter is true (default), a warning is issued every time the control loop runs slower than the desired frequency. Set this parameter to false to disable these warnings.
  • init_state: Initial state of the robot, only used in simulation.
  • regulate_frequency: If set (default), the environment will regulate the control loop frequency to the value prescribed in frequency.
  • shm_name: Name of the shared-memory file used to exchange data with the spine.
  • spine_config: Additional spine configuration overriding the default upkie.config.SPINE_CONFIG. The combined configuration dictionary is sent to the spine at every reset.
Exceptions
  • SpineError: If the spine did not respond after the prescribed number of trials.
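Loop frequency regulation, and the slow-loop warning controlled by frequency_checks, can be pictured with the generic sketch below. SimpleRateLimiter is a hypothetical class written for this illustration; it is not the regulator used inside Upkie, and its injectable clock and sleep arguments are there only to make the behavior easy to test.

```python
import time
import warnings


class SimpleRateLimiter:
    """Generic fixed-period loop regulator (illustration only)."""

    def __init__(self, frequency, warn=True,
                 clock=time.perf_counter, sleep=time.sleep):
        self.dt = 1.0 / frequency  # loop period in seconds
        self.warn = warn
        self._clock = clock
        self._sleep = sleep
        self._next_tick = clock() + self.dt

    def wait(self):
        """Sleep until the next tick and return the slack in seconds.

        The slack is negative when the loop is running late.
        """
        slack = self._next_tick - self._clock()
        if slack > 0.0:
            self._sleep(slack)
            self._next_tick += self.dt
        else:
            if self.warn:
                warnings.warn(f"control loop is {-slack:.4f} s late")
            # Restart the schedule from the current time
            self._next_tick = self._clock() + self.dt
        return slack
```

At the default frequency of 200 Hz the period is 5 ms, so an iteration that takes 2 ms leaves 3 ms of slack to sleep through, while an iteration that overruns the period would trigger the warning.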

Reimplemented in upkie.envs.upkie_servo_positions.UpkieServoPositions, and upkie.envs.upkie_servo_torques.UpkieServoTorques.

Member Function Documentation

◆ get_bullet_action()

dict upkie.envs.upkie_servos.UpkieServos.get_bullet_action (   self)

Get the Bullet action that will be applied at next step.

Returns
Upcoming simulator action.

◆ get_neutral_action()

dict upkie.envs.upkie_servos.UpkieServos.get_neutral_action (   self)

Get the neutral action where servos don't move.

Returns
Neutral action where servos don't move.

◆ log()

None upkie.envs.upkie_servos.UpkieServos.log (   self,
str  name,
Any  entry 
)

Log a new entry to the "log" key of the action dictionary.

Parameters
  • name: Name of the entry.
  • entry: Dictionary to log along with the actual action.

◆ reset()

Tuple[np.ndarray, dict] upkie.envs.upkie_servos.UpkieServos.reset (   self,
*,
Optional[int]   seed = None,
Optional[dict]   options = None 
)

Reset the spine and get an initial observation.

Parameters
  • seed: Number used to initialize the environment’s internal random number generator.
  • options: Currently unused.
Returns
  • observation: Initial vectorized observation, i.e. an element of the environment's observation_space.
  • info: Dictionary with auxiliary diagnostic information. For Upkie this is the full observation dictionary sent by the spine.

◆ set_bullet_action()

None upkie.envs.upkie_servos.UpkieServos.set_bullet_action (   self,
dict  bullet_action 
)

Prepare for the next step an extra action for the Bullet spine.

This extra action can be, for instance, a set of external forces applied to some robot bodies.

Parameters
  • bullet_action: Action dictionary processed by the Bullet spine.
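As a hedged sketch of the external-force use case, the helper below assembles a dictionary that could be passed to set_bullet_action. The key names "external_forces", "force" and "local", the body name "torso", and the helper name itself are all assumptions made for this illustration; this page does not specify the dictionary layout, so check the Bullet spine documentation before relying on them.

```python
def make_external_force_action(body, force, local=False):
    """Build a Bullet action applying a force to one body (hypothetical)."""
    force = [float(component) for component in force]
    if len(force) != 3:
        raise ValueError("force should have three components, in [N]")
    return {
        "external_forces": {  # assumed key name
            body: {
                "force": force,  # assumed key name, force vector in [N]
                "local": local,  # assumed key name, frame selector
            }
        }
    }


# Example: a 5 N push along the second axis on a body assumed to be "torso"
bullet_action = make_external_force_action("torso", [0.0, 5.0, 0.0])
```

With an environment instance env, the result would then be handed over with env.set_bullet_action(bullet_action) before the next call to step.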

◆ step()

Tuple[np.ndarray, float, bool, bool, dict] upkie.envs.upkie_servos.UpkieServos.step (   self,
np.ndarray  action 
)

Run one timestep of the environment's dynamics.

When the end of the episode is reached, you are responsible for calling reset() to reset the environment's state.

Parameters
  • action: Action from the agent.
Returns
  • observation: Observation of the environment, i.e. an element of its observation_space.
  • reward: Reward returned after taking the action.
  • terminated: Whether the agent reached a terminal state, which may be a good or a bad thing. When true, the user needs to call reset().
  • truncated: Whether the episode has reached the maximum number of steps. This boolean can signal a premature end of the episode, i.e. before a terminal state is reached. When true, the user needs to call reset().
  • info: Dictionary with additional information, reporting in particular the full observation dictionary coming from the spine.
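The reset and step contract described above leads to the usual control loop, sketched here against any object exposing those two methods. The function run_steps and the _StubEnv stand-in are hypothetical: the stub only mimics the return values so that the loop can run without a spine, and it is not an Upkie class.

```python
def run_steps(env, policy, nb_steps):
    """Step env for nb_steps, calling reset() whenever an episode ends."""
    observation, info = env.reset()
    total_reward = 0.0
    nb_resets = 1
    for _ in range(nb_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            # The caller is responsible for resetting the environment
            observation, info = env.reset()
            nb_resets += 1
    return total_reward, nb_resets


class _StubEnv:
    """Stand-in environment that terminates every third step."""

    def __init__(self):
        self.nb_steps = 0

    def reset(self, *, seed=None, options=None):
        return {}, {}

    def step(self, action):
        self.nb_steps += 1
        terminated = self.nb_steps % 3 == 0
        return {}, 1.0, terminated, False, {}


total_reward, nb_resets = run_steps(_StubEnv(), lambda observation: {}, 7)
```

With a real environment, the policy would return an action dictionary in the format of the action space instead of an empty one.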

◆ update_init_rand()

None upkie.envs.upkie_servos.UpkieServos.update_init_rand (   self,
**  kwargs 
)

Update initial-state randomization.

Keyword arguments are forwarded as is to upkie.utils.robot_state_randomization.RobotStateRandomization.update.


The documentation for this class was generated from the following file: