Base Upkie environment where actions command servomotors directly.

Public Member Functions
None	__init__ (self, Optional[float] frequency=200.0, bool frequency_checks=True, Optional[RobotState] init_state=None, bool regulate_frequency=True, str shm_name="/upkie", Optional[dict] spine_config=None)
	Initialize environment.

def	__del__ (self)
	Stop the spine when deleting the environment instance.

None	close (self)
	Stop the spine properly.

Optional[float]	dt (self)
	Regulated period of the control loop in seconds, or `None` if there is no loop frequency regulation.

Optional[float]	frequency (self)
	Regulated frequency of the control loop in Hz, or `None` if there is no loop frequency regulation.

dict	get_neutral_action (self)
	Get the neutral action where servos don't move.

None	update_init_rand (self, **kwargs)
	Update initial-state randomization.

Tuple[np.ndarray, dict]	reset (self, *, Optional[int] seed=None, Optional[dict] options=None)
	Reset the spine and get an initial observation.

Tuple[np.ndarray, float, bool, bool, dict]	step (self, dict action)
	Run one timestep of the environment's dynamics.

None	log (self, str name, Any entry)
	Log a new entry to the "log" key of the action dictionary.

dict	get_bullet_action (self)
	Get the Bullet action that will be applied at next step.

None	set_bullet_action (self, dict bullet_action)
	Prepare for the next step an extra action for the Bullet spine.

Public Attributes
	action_space
	Action space.

	observation_space
	Observation space.

	init_state
	Initial state for the floating base of the robot, which may be randomized upon resets.

	model
	Robot model read from its URDF description.

Static Public Attributes
int	version = 5
	Environment version number.

Detailed Description

Base Upkie environment where actions command servomotors directly.

Actions and observations correspond to the moteus servo API. Under the hood, the environment provides a number of features:

Communication with the spine process.
Initial state randomization (e.g. when training a policy).
Loop frequency regulation (optional).

Note that Upkie environments are made to run on a single CPU thread. The downside for reinforcement learning is that computations are not massively parallel. The upside is that it simplifies deployment to the real robot, as it relies on the same spine interface that runs on real robots.

Action space

The action space is a dictionary with one key for each servo:

left_hip: left hip joint (qdd100)
left_knee: left knee joint (qdd100)
left_wheel: left wheel joint (mj5208)
right_hip: right hip joint (qdd100)
right_knee: right knee joint (qdd100)
right_wheel: right wheel joint (mj5208)

The value for each servo dictionary is itself a dictionary with the following keys:

position: commanded joint angle \(\theta^*\) in [rad] (NaN to disable) (required).
velocity: commanded joint velocity \(\dot{\theta}^*\) in [rad] / [s] (required).
feedforward_torque: feedforward joint torque \(\tau_{\mathit{ff}}\) in [N m].
kp_scale: scaling factor \(k_{p}^{\mathit{scale}}\) applied to the position feedback gain, between zero and one.
kd_scale: scaling factor \(k_{d}^{\mathit{scale}}\) applied to the velocity feedback gain, between zero and one.
maximum_torque: maximum joint torque \(\tau_{\mathit{max}}\) (feedforward + feedback) enforced during the whole actuation step, in [N m].
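As an illustration, an action with this structure could be assembled as in the sketch below. The helper `make_servo_action`, the `JOINT_NAMES` tuple, and all numerical values are hypothetical and chosen for illustration only; they are not part of the environment.

```python
import math

# Servo names listed in the action space description
JOINT_NAMES = (
    "left_hip", "left_knee", "left_wheel",
    "right_hip", "right_knee", "right_wheel",
)

# Optional keys of a servo dictionary, as documented above
OPTIONAL_KEYS = {"feedforward_torque", "kp_scale", "kd_scale", "maximum_torque"}


def make_servo_action(position, velocity, **optional):
    """Build one servo dictionary (hypothetical helper, not part of upkie)."""
    unknown = set(optional) - OPTIONAL_KEYS
    if unknown:
        raise KeyError(f"unknown servo action keys: {sorted(unknown)}")
    servo_action = {
        "position": position,  # [rad], NaN disables the position command
        "velocity": velocity,  # [rad] / [s]
    }
    servo_action.update(optional)
    return servo_action


action = {}
for name in JOINT_NAMES:
    if name.endswith("wheel"):
        # Wheels: velocity command only, position disabled with NaN
        action[name] = make_servo_action(math.nan, 1.0, maximum_torque=1.0)
    else:
        # Hips and knees: hold the zero joint angle with full feedback gains
        action[name] = make_servo_action(0.0, 0.0, kp_scale=1.0, kd_scale=1.0)
```

The torque limit and velocity used here are illustrative values, not recommended settings for a real robot.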

The resulting torque applied by the servo is then:

\[ \begin{align*} \tau & = \underset{ [-\tau_{\mathit{max}}, +\tau_{\mathit{max}}]}{ \mathrm{clamp} } \left( \tau_{\mathit{ff}} + k_{p} k_{p}^{\mathit{scale}} (\theta^* - \theta) + k_{d} k_{d}^{\mathit{scale}} (\dot{\theta}^* - \dot{\theta}) \right) \end{align*} \]

Position and velocity gains \(k_{p}\) and \(k_{d}\) are configured in each moteus controller directly and don't change during execution. Instead, we can modulate the overall feedback gains via the normalized parameters \(k_{p}^{\mathit{scale}} \in [0, 1]\) and \(k_{d}^{\mathit{scale}} \in [0, 1]\). Note that the servo regulates the torque above at its own frequency, which is higher (typically 40 kHz) than the agent and the spine frequencies. See the moteus reference for more details.
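The torque law can be evaluated offline with a few lines of Python, for instance to check what a command would request before sending it. This is a minimal sketch and not the servo's implementation: the function name `servo_torque`, the gain values, and the defaults used for missing optional keys (unit scales, zero feedforward torque, no torque limit) are assumptions, as is the choice to drop the position term when the commanded angle is NaN.

```python
import math


def servo_torque(command, theta, theta_dot, kp, kd):
    """Evaluate the clamped torque law for one servo (illustrative only)."""
    tau = command.get("feedforward_torque", 0.0)
    if not math.isnan(command["position"]):
        # Position feedback, skipped here when the commanded angle is NaN
        kp_scale = command.get("kp_scale", 1.0)
        tau += kp * kp_scale * (command["position"] - theta)
    # Velocity feedback
    kd_scale = command.get("kd_scale", 1.0)
    tau += kd * kd_scale * (command["velocity"] - theta_dot)
    # Clamp the sum of feedforward and feedback torques
    tau_max = command.get("maximum_torque", math.inf)
    return max(-tau_max, min(tau_max, tau))


# Hypothetical gains and command: the feedback torque exceeds the limit
command = {"position": 1.0, "velocity": 0.0, "kp_scale": 0.5, "maximum_torque": 1.5}
tau = servo_torque(command, theta=0.0, theta_dot=0.0, kp=4.0, kd=1.0)
```

In this example the unclamped torque is 4.0 × 0.5 × 1.0 = 2.0 N m, so the result is clamped to the 1.5 N m limit.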

Observation space

The observation space is a dictionary with one key for each servo. The value for each key is a dictionary with keys:

position: Joint angle in [rad].
velocity: Joint velocity in [rad] / [s].
torque: Joint torque in [N m].
temperature: Servo temperature in degrees Celsius.
voltage: Power bus voltage of the servo, in [V].

As with all Upkie environments, full observations from the spine (detailed in Observations) are also available in the info dictionary returned by the reset and step functions.
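Since observations are nested dictionaries, per-servo quantities can be read with plain dictionary access. The sketch below finds the hottest servo in an observation; the helper name `hottest_servo` and the sample values are made up for illustration, and only three servos are shown for brevity.

```python
def hottest_servo(observation):
    """Return the name and temperature of the hottest servo."""
    name = max(observation, key=lambda joint: observation[joint]["temperature"])
    return name, observation[name]["temperature"]


# Made-up observation with the layout described above
example_observation = {
    "left_hip": {
        "position": 0.0, "velocity": 0.0, "torque": 0.1,
        "temperature": 31.0, "voltage": 18.2,
    },
    "left_knee": {
        "position": 0.1, "velocity": 0.0, "torque": 0.4,
        "temperature": 33.0, "voltage": 18.2,
    },
    "left_wheel": {
        "position": 2.5, "velocity": 1.2, "torque": 0.3,
        "temperature": 35.5, "voltage": 18.1,
    },
}

name, temperature = hottest_servo(example_observation)
```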

Constructor & Destructor Documentation

◆ __init__()

None upkie.envs.upkie_servos.UpkieServos.__init__	(		self,
		Optional[float]	frequency = `200.0`,
		bool	frequency_checks = `True`,
		Optional[RobotState]	init_state = `None`,
		bool	regulate_frequency = `True`,
		str	shm_name = `"/upkie"`,
		Optional[dict]	spine_config = `None`
	)

Initialize environment.

Parameters

frequency	Regulated frequency of the control loop, in Hz. Can be prescribed even when `regulate_frequency` is unset, in which case `self.dt` will be defined but the loop frequency will not be regulated.
frequency_checks	If `regulate_frequency` is set and this parameter is true (default), a warning is issued every time the control loop runs slower than the desired `frequency`. Set this parameter to false to disable these warnings.
init_state	Initial state of the robot, only used in simulation.
regulate_frequency	If set (default), the environment will regulate the control loop frequency to the value prescribed in `frequency`.
shm_name	Name of shared-memory file to exchange with the spine.
spine_config	Additional spine configuration overriding the default `upkie.config.SPINE_CONFIG`. The combined configuration dictionary is sent to the spine at every reset.
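The relation between the `frequency` parameter and the `dt` property can be sketched as follows. This assumes the period is simply the inverse of the prescribed frequency; the function name `control_period` is hypothetical and not part of the environment.

```python
def control_period(frequency):
    """Return the loop period in seconds, or None if no frequency is given."""
    if frequency is None:
        return None
    if frequency <= 0.0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency


# Period for the constructor's default frequency of 200 Hz
dt = control_period(200.0)
```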

Exceptions

SpineError	If the spine did not respond after the prescribed number of trials.

Member Function Documentation

◆ get_bullet_action()

dict upkie.envs.upkie_servos.UpkieServos.get_bullet_action ( self )

Get the Bullet action that will be applied at next step.

Returns: Upcoming simulator action.

◆ get_neutral_action()

dict upkie.envs.upkie_servos.UpkieServos.get_neutral_action ( self )

Get the neutral action where servos don't move.

Returns: Neutral action where servos don't move.

◆ log()

None upkie.envs.upkie_servos.UpkieServos.log	(		self,
		str	name,
		Any	entry
	)

Log a new entry to the "log" key of the action dictionary.

Parameters

name	Name of the entry.
entry	Dictionary to log along with the actual action.

◆ reset()

Tuple[np.ndarray, dict] upkie.envs.upkie_servos.UpkieServos.reset	(		self,
		*,
		Optional[int]	seed = `None`,
		Optional[dict]	options = `None`
	)

Reset the spine and get an initial observation.

Parameters

seed	Number used to initialize the environment’s internal random number generator.
options	Currently unused.

Returns

observation: Initial vectorized observation, i.e. an element of the environment's observation_space.
info: Dictionary with auxiliary diagnostic information. For Upkie this is the full observation dictionary sent by the spine.

◆ set_bullet_action()

None upkie.envs.upkie_servos.UpkieServos.set_bullet_action	(		self,
		dict	bullet_action
	)

Prepare for the next step an extra action for the Bullet spine.

This extra action can be for instance a set of external forces applied to some robot bodies.

Parameters

bullet_action	Action dictionary processed by the Bullet spine.

◆ step()

Tuple[np.ndarray, float, bool, bool, dict] upkie.envs.upkie_servos.UpkieServos.step	(		self,
		dict	action
	)

Run one timestep of the environment's dynamics.

When the end of the episode is reached, you are responsible for calling reset() to reset the environment's state.

Parameters

action	Action from the agent.

Returns

observation: Observation of the environment, i.e. an element of its observation_space.
reward: Reward returned after taking the action.
terminated: Whether the agent reached a terminal state, which may be a good or a bad thing. When true, the user needs to call reset().
truncated: Whether the episode has reached its maximum number of steps. This boolean can signal a premature end of the episode, i.e. before a terminal state is reached. When true, the user needs to call reset().
info: Dictionary with additional information, reporting in particular the full observation dictionary coming from the spine.
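The five return values are typically consumed in a loop like the one sketched below. It relies only on the reset() and step() signatures documented here; the function `run_episode`, its `policy` callable, and the `max_steps` cap are hypothetical names introduced for illustration.

```python
def run_episode(env, policy, max_steps=1000):
    """Run a single episode and return its total reward and step count."""
    observation, info = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        if terminated or truncated:
            # The episode is over: reset() must be called before stepping again
            break
    return total_reward, steps
```

Each call starts with reset(), so calling `run_episode` again after a terminated or truncated episode satisfies the reset requirement above. With this environment, `policy` could be as simple as a function that ignores the observation and returns `env.get_neutral_action()`.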

◆ update_init_rand()

None upkie.envs.upkie_servos.UpkieServos.update_init_rand	(		self,
		**	kwargs
	)

Update initial-state randomization.

Keyword arguments are forwarded as is to upkie.utils.robot_state_randomization.RobotStateRandomization.update.

The documentation for this class was generated from the following file:

upkie/envs/upkie_servos.py
