StreamingDataLoader

class streaming.StreamingDataLoader(*args, **kwargs)

A streaming data loader.

Provides an additional checkpoint/resumption interface. To support it, the loader tracks the number of samples the model has seen on this rank.

Parameters
  • *args – Positional arguments.

  • **kwargs – Keyword arguments.
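The sample tracking described above can be illustrated with a minimal, library-free sketch. `CountingLoader` is a hypothetical stand-in, not part of `streaming`; it assumes each batch is a list whose length equals its batch size, and it resets its count at the start of every pass over the data.

```python
from typing import Any, Iterable, Iterator, List


class CountingLoader:
    """Hypothetical loader that counts the samples it has yielded on this rank."""

    def __init__(self, batches: Iterable[List[Any]]) -> None:
        self.batches = batches
        self.num_samples_yielded = 0

    def __iter__(self) -> Iterator[List[Any]]:
        # Start a fresh count at the beginning of each epoch.
        self.num_samples_yielded = 0
        for batch in self.batches:
            # Assumption: a batch is a list, so its length is its batch size.
            self.num_samples_yielded += len(batch)
            yield batch


loader = CountingLoader([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]])
it = iter(loader)
next(it)
next(it)
print(loader.num_samples_yielded)  # 8
```

Counting at the loader level, rather than inside the dataset, records what was actually handed to the model, which is the quantity a resumed run needs to skip past.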

load_state_dict(obj)

Load a dict containing training state (called from a non-worker process).

This is called on each copy of the dataset when resuming.

Parameters

obj (Dict[str, Any]) – The state.
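As a rough sketch of what resuming from such a state can mean, the hypothetical `ToyStreamingDataset` below (not the library's class) stores a resume offset in `load_state_dict` and skips that many samples on its next pass. The key name `sample_in_epoch` is an assumption chosen for illustration.

```python
from typing import Any, Dict, Iterator, List


class ToyStreamingDataset:
    """Hypothetical dataset that can resume mid-epoch from a saved state."""

    def __init__(self, samples: List[int]) -> None:
        self.samples = samples
        self.resume_from = 0  # samples to skip on the next pass only

    def load_state_dict(self, obj: Dict[str, Any]) -> None:
        # 'sample_in_epoch' is an assumed key name, used for illustration.
        self.resume_from = obj["sample_in_epoch"]

    def __iter__(self) -> Iterator[int]:
        # Consume the resume offset once, then fall back to full epochs.
        start, self.resume_from = self.resume_from, 0
        yield from self.samples[start:]


ds = ToyStreamingDataset(list(range(10)))
ds.load_state_dict({"sample_in_epoch": 6})
resumed = list(ds)     # [6, 7, 8, 9]
next_epoch = list(ds)  # [0, 1, ..., 9]
print(resumed, next_epoch)
```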

state_dict()

Get a dict containing training state (called from a non-worker process).

This is called on rank zero.

This method takes no parameters: the number of samples processed so far in the current epoch comes from the count the loader tracks itself.

Returns

Optional[Dict[str, Any]] – The state if the dataset is a streaming dataset, otherwise None.
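The optional return value can be sketched without the library. In this hypothetical example, `ToyLoader` batches any iterable, counts the samples it yields, and returns a state dict only when its dataset is the (equally hypothetical) `ToyStreamingDataset`; for any other dataset it returns None. The dict layout and the `sample_in_epoch` key are assumptions, not the library's actual state format.

```python
from typing import Any, Dict, Iterator, List, Optional


class ToyStreamingDataset:
    """Hypothetical stand-in for a streaming dataset."""

    def __init__(self, samples: List[int]) -> None:
        self.samples = samples

    def __iter__(self) -> Iterator[int]:
        return iter(self.samples)


class ToyLoader:
    """Hypothetical loader that reports its progress as an optional state dict."""

    def __init__(self, dataset: Any, batch_size: int) -> None:
        self.dataset = dataset
        self.batch_size = batch_size
        self.num_samples_yielded = 0

    def __iter__(self) -> Iterator[List[Any]]:
        self.num_samples_yielded = 0
        batch: List[Any] = []
        for sample in self.dataset:
            batch.append(sample)
            if len(batch) == self.batch_size:
                self.num_samples_yielded += len(batch)
                yield batch
                batch = []
        if batch:  # final partial batch
            self.num_samples_yielded += len(batch)
            yield batch

    def state_dict(self) -> Optional[Dict[str, Any]]:
        # Only the streaming dataset knows how to resume, so others get None.
        if isinstance(self.dataset, ToyStreamingDataset):
            return {"sample_in_epoch": self.num_samples_yielded}
        return None


loader = ToyLoader(ToyStreamingDataset(list(range(10))), batch_size=4)
it = iter(loader)
next(it)
print(loader.state_dict())  # {'sample_in_epoch': 4}
print(ToyLoader(list(range(10)), batch_size=4).state_dict())  # None
```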