◆ step()

Tuple[np.ndarray, float, bool, bool, dict] step	(		self,
		np.ndarray	action
	)

Run one timestep of the environment's dynamics.

When the end of the episode is reached, you are responsible for calling reset() to reset the environment's state.

Parameters

action Action from the agent.

Returns

observation: Observation of the environment, i.e. an element of its observation_space.
reward: Reward returned after taking the action.
terminated: Whether the agent reached a terminal state, which may be a good or a bad thing. When true, the user needs to call reset().
truncated: Whether the episode is reaching max number of steps. This boolean can signal a premature end of the episode, i.e. before a terminal state is reached. When true, the user needs to call reset().
info: Dictionary with additional information, reporting in particular the full observation dictionary coming from the spine.