minerl.data
The minerl.data package provides a unified interface for sampling data from the MineRL-v0 Dataset. Data is accessed by making a dataset from one of the minerl environments and iterating over it using one of the iterators provided by minerl.data.DataPipeline.
The following is a description of the various methods included within the package as well as some basic usage examples. To see more detailed descriptions and tutorials on how to use the data API, please take a look at our numerous getting started manuals.
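As a rough sketch of the workflow described above (make a dataset, then iterate over it), the helper below builds a pipeline and pulls a few batches from batch_iter. The name preview_batches and its make_pipeline argument are hypothetical and not part of minerl; the sketch assumes make_pipeline behaves like minerl.data.make, and the environment name in the trailing comment is only an example.

```python
from itertools import islice


def preview_batches(make_pipeline, environment, n_batches=2, batch_size=4, seq_len=8):
    """Build a pipeline and return its first `n_batches` batches.

    Hypothetical helper, not part of minerl. `make_pipeline` is assumed to
    behave like minerl.data.make and return a DataPipeline-like object.
    """
    pipeline = make_pipeline(environment)
    # num_epochs=1 keeps the iterator finite; islice stops it early.
    batches = pipeline.batch_iter(batch_size=batch_size, seq_len=seq_len, num_epochs=1)
    return list(islice(batches, n_batches))


# With the real package installed and data downloaded, a call might look like:
#   import minerl
#   preview_batches(minerl.data.make, "MineRLObtainDiamond-v0")
```

Passing the factory in as an argument keeps the sketch runnable without the dataset on disk; in real use you would normally call minerl.data.make directly.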
MineRLv0
- class minerl.data.DataPipeline(data_directory: os.path, environment: str, num_workers: int, worker_batch_size: int, min_size_to_dequeue: int, random_seed=42)
Bases: object
Creates a data pipeline object used to iterate through the MineRL-v0 dataset.
- property action_space
Returns the action space of the current MineRL environment.
- batch_iter(batch_size: int, seq_len: int, num_epochs: int = -1, preload_buffer_size: int = 2, seed: Optional[int] = None)
Returns batches of size batch_size, each containing sequences of length seq_len. The iterator produces batches sequentially. If an element of a batch reaches the end of its episode, a new episode is appended to it.
If you wish to obtain metadata of the episodes, consider using load_data instead.
- Parameters
batch_size (int) – The batch size.
seq_len (int) – The size of sequences to produce.
num_epochs (int, optional) – The number of epochs to iterate over the data. Defaults to -1.
preload_buffer_size (int, optional) – Increase to improve performance. The data iterator uses a queue to prevent blocking; the queue size is the number of trajectories to load into the buffer. Adjust based on memory constraints. Defaults to 2.
seed (int, optional) – Not implemented. Defaults to None.
- Returns
A generator that yields (state, action, reward, next_state, is_terminal) batches.
- Return type
Generator
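To illustrate consuming the generator, the sketch below sums the rewards in each batch. It assumes each batch unpacks into five elements (state, action, reward, next_state, done) and that the reward element is a two-dimensional array-like of shape batch_size by seq_len; both are assumptions about the yielded structure rather than guarantees from this page. The function name batch_reward_totals is hypothetical.

```python
def batch_reward_totals(batches):
    """Yield the total reward contained in each batch.

    Hypothetical helper. Assumes every batch is a 5-tuple whose third
    element is a (batch_size x seq_len) array-like of rewards.
    """
    for _state, _action, reward, _next_state, _done in batches:
        # Sum each sequence, then sum across the batch dimension.
        yield float(sum(sum(row) for row in reward))
```

Because it only relies on iteration, the same function accepts nested lists as well as arrays, which makes it easy to check on small hand-written inputs.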
- get_trajectory_names()
Gets all the trajectory names.
- Returns
A list of experiment names.
- load_data(stream_name: str, skip_interval=0, include_metadata=False, include_monitor_data=False)
Iterates over an individual trajectory named stream_name.
- Parameters
stream_name (str) – The stream name desired to be iterated through.
skip_interval (int, optional) – How many slices should be skipped. Defaults to 0.
include_metadata (bool, optional) – Whether or not metadata about the loaded trajectory should be included. Defaults to False.
include_monitor_data (bool, optional) – Whether to include all of the monitor data from the environment. Defaults to False.
- Yields
A tuple of (state, player_action, reward_from_action, next_state, is_next_state_terminal). These tuples are yielded in the order of the episode.
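A minimal sketch of combining get_trajectory_names() with load_data(): it walks each trajectory in order and accumulates its reward. The helper name episode_returns and its limit argument are hypothetical, and the sketch assumes that each yielded reward converts cleanly to a float and that include_metadata is left at its default, so every item is a 5-tuple.

```python
def episode_returns(pipeline, limit=None):
    """Map each trajectory name to the sum of its rewards.

    Hypothetical helper, not part of minerl. `pipeline` is expected to
    expose get_trajectory_names() and load_data(stream_name).
    """
    names = list(pipeline.get_trajectory_names())
    if limit is not None:
        names = names[:limit]  # only inspect the first `limit` trajectories

    returns = {}
    for name in names:
        total = 0.0
        # Tuples arrive in episode order, one per step.
        for _state, _action, reward, _next_state, _done in pipeline.load_data(name):
            total += float(reward)
        returns[name] = total
    return returns
```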
- property observation_space
Returns the observation space of the current MineRL environment.
- static read_frame(cap)
- sarsd_iter(num_epochs=-1, max_sequence_len=32, queue_size=None, seed=None, include_metadata=False)
Returns a generator for iterating through (state, action, reward, next_state, is_terminal) tuples in the dataset. Loads num_workers files at once, as defined in minerl.data.make(), and returns up to max_sequence_len consecutive samples wrapped in a dict observation space.
- Parameters
num_epochs (int, optional) – number of epochs to iterate over or -1 to loop forever. Defaults to -1
max_sequence_len (int, optional) – maximum number of consecutive samples - may be less. Defaults to 32
seed (int, optional) – seed for the random directory walk. Note that specifying a seed together with a finite num_epochs causes the ordering of examples to be the same after every call to seq_iter.
queue_size (int, optional) – maximum number of elements to buffer at a time; each worker may hold an additional item while waiting to enqueue. Defaults to 16 * self.number_of_workers, or 2 * self.number_of_workers if max_sequence_len == -1.
include_metadata (bool, optional) – adds an additional member to the tuple containing metadata about the stream the data was loaded from. Defaults to False
- Yields
A tuple of (state, player_action, reward_from_action, next_state, is_next_state_terminal, (metadata)). Each element is in the format of the environment action/state/reward space and contains as many samples as are requested.
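Since the yielded tuple has five members by default and six when include_metadata=True, downstream code may want to handle both shapes uniformly. The sketch below is one way to do that; split_metadata is a hypothetical helper, and the assumption that metadata is always the last member follows the tuple layout listed above.

```python
def split_metadata(sample):
    """Split a yielded tuple into (sarsd, metadata).

    Hypothetical helper. Returns metadata=None for plain 5-element
    tuples and the sixth member when metadata was requested.
    """
    if len(sample) == 6:
        return tuple(sample[:5]), sample[5]
    if len(sample) == 5:
        return tuple(sample), None
    raise ValueError(f"expected 5 or 6 elements, got {len(sample)}")
```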
- seq_iter(num_epochs=-1, max_sequence_len=32, queue_size=None, seed=None, include_metadata=False)
DEPRECATED METHOD FOR SAMPLING DATA FROM THE MINERL DATASET.
This function is now DataPipeline.batch_iter().
- property spec: EnvSpec
- minerl.data.download(directory: Optional[str] = None, environment: Optional[str] = None, competition: Optional[str] = None, resolution: str = 'low', texture_pack: int = 0, update_environment_variables: bool = True, disable_cache: bool = False) → None
Low-level interface for downloading MineRL dataset.
Using the python -m minerl.data.download CLI script is preferred because it performs more input validation and hides internal-use arguments.
Run this command with environment=None and competition=None to download a minimal dataset with 2 demonstrations from each environment. Provide the environment or competition arguments to download a full dataset for a particular environment or competition.
- Parameters
directory – Destination folder for downloading MineRL datasets. If None, then use the MINERL_DATA_ROOT environment variable, or error if this environment variable is not set.
environment –
The name of a MineRL environment or None. If this argument is the name of a MineRL environment and competition is None, then this function downloads the full dataset for the specified MineRL environment.
If both environment=None and competition=None, then this function downloads a minimal dataset.
competition –
The name of a MineRL competition (“diamond” or “basalt”) or None. If this argument is the name of a MineRL competition and environment is None, then this function downloads the full dataset for the specified MineRL competition.
If both environment=None and competition=None, then this function downloads a minimal dataset.
resolution – For internal use only. One of [‘low’, ‘high’] corresponding to video resolutions of [64x64,1024x1024] respectively (note: high resolution is not currently supported).
texture_pack – For internal use only. 0: default Minecraft texture pack, 1: flat semi-realistic texture pack.
update_environment_variables – For internal use only. If True, then export the MINERL_DATA_ROOT environment variable (note: on some operating systems this only applies to the current shell).
disable_cache –
If False (default), then the tar download and other temporary download files are saved inside directory.
If disable_cache is False on a future call to this function and temporary download files are detected, then the download is resumed from previous download progress. If disable_cache is False on a future call to this function and the completed tar file is detected, then the download is skipped entirely and we immediately extract the tar to directory.
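The rule for the directory argument (use the explicit value, else fall back to MINERL_DATA_ROOT, else raise an error) can be sketched as follows. This is an illustration of the documented behaviour, not the library's own code; resolve_data_root and its environ parameter are hypothetical, and the choice of ValueError is an assumption since the exception type is not stated here.

```python
import os


def resolve_data_root(directory=None, environ=None):
    """Pick the dataset directory following the documented fallback rule.

    Hypothetical helper, not part of minerl. `environ` defaults to
    os.environ and exists only so the lookup can be exercised in isolation.
    """
    environ = os.environ if environ is None else environ
    if directory is not None:
        return directory  # an explicit argument always wins
    try:
        return environ["MINERL_DATA_ROOT"]
    except KeyError:
        # Exception type is an assumption; the docs only say "error".
        raise ValueError(
            "No directory given and MINERL_DATA_ROOT is not set"
        ) from None
```

A caller could run such a check before invoking the python -m minerl.data.download CLI to fail early with a clear message.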
- minerl.data.make(environment=None, data_dir=None, num_workers=4, worker_batch_size=32, minimum_size_to_dequeue=32, force_download=False)
Initializes the data loader with the chosen environment.
- Parameters
environment (string) – desired MineRL environment
data_dir (string, optional) – specify alternative dataset location. Defaults to None.
num_workers (int, optional) – number of files to load at once. Defaults to 4.
force_download (bool, optional) – specifies whether or not the data should be downloaded if missing. Defaults to False.
- Returns
initialized data pipeline
- Return type