MDSWriter
- class streaming.MDSWriter(dirname, columns, compression=None, hashes=None, size_limit=67108864)
Writes a streaming MDS dataset.
- Parameters
dirname (str) – Local dataset directory.
compression (str, optional) – Optional compression or compression:level. Defaults to None.
hashes (List[str], optional) – Optional list of hash algorithms to apply to shard files. Defaults to None.
size_limit (int, optional) – Optional shard size limit, after which point to start a new shard. If None, puts everything in one shard. Defaults to 1 << 26.
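The size_limit behaviour described above can be illustrated with a small standalone sketch. It does not use the streaming library: split_into_shards is a hypothetical helper, and it assumes the limit is counted in bytes and that a new shard starts whenever adding the next sample would exceed the limit. The actual writer may also count header or index overhead and may apply a slightly different rollover rule.

```python
from typing import List, Optional

# The documented default: 1 << 26 equals 67108864, the value in the signature.
DEFAULT_SIZE_LIMIT = 1 << 26


def split_into_shards(sample_sizes: List[int],
                      size_limit: Optional[int] = DEFAULT_SIZE_LIMIT) -> List[List[int]]:
    """Group sample sizes into shards (hypothetical helper, not part of streaming).

    Assumption: a shard is closed as soon as the next sample would push it
    past size_limit. With size_limit=None everything goes into one shard.
    """
    shards: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in sample_sizes:
        # Roll over only if the current shard already holds at least one sample,
        # so an oversized sample still gets a shard of its own.
        if size_limit is not None and current and current_size + size > size_limit:
            shards.append(current)
            current = []
            current_size = 0
        current.append(size)
        current_size += size
    if current:
        shards.append(current)
    return shards
```

Under these assumptions, three 40-byte samples with size_limit=100 are split into two shards (two samples, then one), while size_limit=None keeps all three in a single shard.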
- encode_joint_shard()
Encode a joint shard out of the cached samples (single file).
- Returns
bytes – File data.