minerl.data

The minerl.data package provides a unified interface for sampling data from the MineRL-v0 dataset. Data is accessed by creating a dataset for one of the MineRL environments and iterating over it using one of the iterators provided by minerl.data.DataPipeline.

The following is a description of the various methods included within the package as well as some basic usage examples. To see more detailed descriptions and tutorials on how to use the data API, please take a look at our numerous getting started manuals.
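As a minimal sketch of that flow (the environment name below is illustrative; any MineRL-v0 environment works, and the dataset must already be downloaded under MINERL_DATA_ROOT):

```python
def make_batch_iterator(env_name="MineRLObtainDiamond-v0",
                        batch_size=4, seq_len=32):
    """Build a (state, action, reward, next_state, done) batch iterator
    over the dataset for env_name.

    Calling this requires the minerl package and a dataset downloaded
    under the MINERL_DATA_ROOT directory.
    """
    # Imported inside the function because the dataset is an external
    # dependency; defining the helper needs neither.
    import minerl

    data = minerl.data.make(env_name)
    return data.batch_iter(batch_size=batch_size,
                           seq_len=seq_len,
                           num_epochs=1)
```

Each item yielded by the returned iterator is a batch of sequences, with observations and actions given as dicts matching the environment's spaces.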

MineRLv0

class minerl.data.DataPipeline(data_directory: os.path, environment: str, num_workers: int, worker_batch_size: int, min_size_to_dequeue: int, random_seed=42)

Bases: object

Creates a data pipeline object used to iterate through the MineRL-v0 dataset.

property action_space

The action space of the current MineRL environment.

batch_iter(batch_size: int, seq_len: int, num_epochs: int = -1, preload_buffer_size: int = 2, seed: Optional[int] = None)

Returns batches of BATCH_SIZE sequences, each of length SEQ_LEN. The iterator produces batches sequentially; if an element of a batch reaches the end of its episode, a new episode is appended to it.

If you wish to obtain metadata of the episodes, consider using load_data instead.

Parameters
  • batch_size (int) – The batch size.

  • seq_len (int) – The size of sequences to produce.

  • num_epochs (int, optional) – The number of epochs to iterate over the data. Defaults to -1.

  • preload_buffer_size (int, optional) – Increase to improve performance. The data iterator uses a queue to prevent blocking; the queue size is the number of trajectories to load into the buffer. Adjust based on memory constraints. Defaults to 2.

  • seed (int, optional) – NOT IMPLEMENTED. Defaults to None.

Returns

A generator that yields (state, action, reward, next_state, done) batches

Return type

Generator
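Because an exhausted episode is extended with a fresh one inside the same sequence, the done flags are the only episode boundaries in a sampled batch. A self-contained helper (plain Python, no minerl required) that splits one sampled reward sequence back into per-episode lists:

```python
def split_rewards_by_episode(rewards, dones):
    """Split a flat per-step reward sequence into per-episode lists,
    cutting after every step whose done flag is True."""
    episodes, current = [], []
    for reward, done in zip(rewards, dones):
        current.append(reward)
        if done:
            episodes.append(current)
            current = []
    if current:  # trailing, unfinished episode
        episodes.append(current)
    return episodes

# A sequence containing one full episode and the start of another:
print(split_rewards_by_episode([1, 0, 5, 2], [False, False, True, False]))
# [[1, 0, 5], [2]]
```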

get_trajectory_names()

Gets all the trajectory names in the dataset.

Returns

A list of trajectory (experiment) names.

Return type

list of str
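For example, trajectory names can be listed from an initialized pipeline (data here is assumed to come from minerl.data.make):

```python
def list_trajectories(data):
    """Return the sorted trajectory names of a DataPipeline-like object."""
    return sorted(data.get_trajectory_names())
```

The returned names are the stream names accepted by load_data.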

load_data(stream_name: str, skip_interval=0, include_metadata=False, include_monitor_data=False)

Iterates over an individual trajectory named stream_name.

Parameters
  • stream_name (str) – The stream name desired to be iterated through.

  • skip_interval (int, optional) – How many slices should be skipped. Defaults to 0.

  • include_metadata (bool, optional) – Whether or not metadata about the loaded trajectory should be included. Defaults to False.

  • include_monitor_data (bool, optional) – Whether to include all of the monitor data from the environment. Defaults to False.

Yields

A tuple of (state, player_action, reward_from_action, next_state, is_next_state_terminal). These tuples are yielded in the order they occur in the episode.
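As an illustration, the total return of a single trajectory can be accumulated from these tuples (data is an initialized DataPipeline; the stream name would come from get_trajectory_names):

```python
def episode_return(data, stream_name):
    """Sum the rewards yielded by load_data for one trajectory."""
    total = 0.0
    for state, action, reward, next_state, done in data.load_data(stream_name):
        total += reward
    return total
```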

property observation_space

The observation space of the current MineRL environment.

static read_frame(cap)

sarsd_iter(num_epochs=-1, max_sequence_len=32, queue_size=None, seed=None, include_metadata=False)

Returns a generator for iterating through (state, action, reward, next_state, is_terminal) tuples in the dataset. Loads num_workers files at once, as defined in minerl.data.make(), and returns up to max_sequence_len consecutive samples wrapped in a dict observation space.

Parameters
  • num_epochs (int, optional) – number of epochs to iterate over or -1 to loop forever. Defaults to -1

  • max_sequence_len (int, optional) – maximum number of consecutive samples - may be less. Defaults to 32

  • seed (int, optional) – seed for the random directory walk. Note that specifying a seed together with a finite num_epochs will cause the ordering of examples to be the same after every call to sarsd_iter.

  • queue_size (int, optional) – maximum number of elements to buffer at a time; each worker may hold an additional item while waiting to enqueue. Defaults to 16*self.number_of_workers, or 2*self.number_of_workers if max_sequence_len == -1.

  • include_metadata (bool, optional) – adds an additional member to the tuple containing metadata about the stream the data was loaded from. Defaults to False

Yields

A tuple of (state, player_action, reward_from_action, next_state, is_next_state_terminal, (metadata)). Each element is in the format of the environment's action/state/reward space and contains as many samples as requested.

seq_iter(num_epochs=-1, max_sequence_len=32, queue_size=None, seed=None, include_metadata=False)

DEPRECATED METHOD FOR SAMPLING DATA FROM THE MINERL DATASET.

This method has been replaced by DataPipeline.batch_iter().

property spec: minerl.herobraine.env_spec.EnvSpec

minerl.data.download(directory: Optional[str] = None, environment: Optional[str] = None, competition: Optional[str] = None, resolution: str = 'low', texture_pack: int = 0, update_environment_variables: bool = True, disable_cache: bool = False) → None

Low-level interface for downloading MineRL dataset.

Using the python -m minerl.data.download CLI script is preferred because it performs more input validation and hides internal-use arguments.

Run this command with environment=None and competition=None to download a minimal dataset with 2 demonstrations from each environment. Provide the environment or competition arguments to download a full dataset for a particular environment or competition.

Parameters
  • directory – Destination folder for downloading MineRL datasets. If None, then use the MINERL_DATA_ROOT environment variable, or error if this environment variable is not set.

  • environment

    The name of a MineRL environment or None. If this argument is the name of a MineRL environment and competition is None, then this function downloads the full dataset for the specified MineRL environment.

    If both environment=None and competition=None, then this function downloads a minimal dataset.

  • competition

    The name of a MineRL competition (“diamond” or “basalt”) or None. If this argument is the name of a MineRL competition and environment is None, then this function downloads the full dataset for the specified MineRL competition.

    If both environment=None and competition=None, then this function downloads a minimal dataset.

  • resolution – For internal use only. One of [‘low’, ‘high’] corresponding to video resolutions of [64x64,1024x1024] respectively (note: high resolution is not currently supported).

  • texture_pack – For internal use only. 0: default Minecraft texture pack, 1: flat semi-realistic texture pack.

  • update_environment_variables – For internal use only. If True, then the MINERL_DATA_ROOT environment variable is exported (note: on some operating systems this applies only to the current shell).

  • disable_cache

    If False (default), then the tar download and other temporary download files are saved inside directory.

    If disable_cache is False on a future call to this function and temporary download files are detected, then the download is resumed from previous download progress. If disable_cache is False on a future call to this function and the completed tar file is detected, then the download is skipped entirely and we immediately extract the tar to directory.
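A sketch of the two download modes described above (the environment name is illustrative; with directory=None, MINERL_DATA_ROOT must be set):

```python
def fetch_datasets(directory=None):
    """Download MineRL demonstration data.

    Requires the minerl package; with directory=None the
    MINERL_DATA_ROOT environment variable must be set.
    """
    # Imported inside the function: downloading is an external,
    # network-dependent operation.
    import minerl

    # Minimal dataset: 2 demonstrations from every environment.
    minerl.data.download(directory=directory)

    # Full dataset for one environment.
    minerl.data.download(directory=directory,
                         environment="MineRLObtainDiamond-v0")
```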

minerl.data.make(environment=None, data_dir=None, num_workers=4, worker_batch_size=32, minimum_size_to_dequeue=32, force_download=False)

Initializes the data loader for the chosen environment.

Parameters
  • environment (string) – desired MineRL environment

  • data_dir (string, optional) – specify alternative dataset location. Defaults to None.

  • num_workers (int, optional) – number of files to load at once. Defaults to 4.

  • force_download (bool, optional) – specifies whether or not the data should be downloaded if missing. Defaults to False.

Returns

An initialized data pipeline.

Return type

DataPipeline
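Putting it together, a pipeline can be initialized with an explicit data directory, downloading the dataset on first use (the environment name is illustrative):

```python
def build_pipeline(env_name="MineRLTreechop-v0", data_dir=None):
    """Initialize a DataPipeline, downloading the dataset if missing.

    Requires the minerl package; with data_dir=None the
    MINERL_DATA_ROOT environment variable must be set.
    """
    import minerl

    return minerl.data.make(env_name,
                            data_dir=data_dir,
                            num_workers=4,
                            force_download=True)
```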