XSVWriter

class streaming.XSVWriter(dirname, columns, separator, compression=None, hashes=None, size_limit=67108864, newline='\n')

Writes a streaming XSV dataset.

Parameters
  • dirname (str) – Local dataset directory.

  • columns (Dict[str, str]) – Sample columns.

  • separator (str) – String used to separate columns.

  • compression (str, optional) – Compression algorithm, optionally with a level, given as compression or compression:level. Defaults to None.

  • hashes (List[str], optional) – Optional list of hash algorithms to apply to shard files. Defaults to None.

  • size_limit (int, optional) – Shard size limit in bytes, after which a new shard is started. If None, all samples go into a single shard. Defaults to 67108864 (64 MiB).

  • newline (str) – Newline character inserted between samples. Defaults to '\n'.
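
The effect of size_limit on shard boundaries can be sketched as below. This is a minimal illustration under stated assumptions, not the library's implementation: plan_shards is a hypothetical helper, and the exact rollover rule (closing a shard before the sample that would cross the limit) is an assumption.

```python
from typing import List, Optional

DEFAULT_SIZE_LIMIT = 1 << 26  # 67108864 bytes (64 MiB), the default in the signature


def plan_shards(sample_sizes: List[int],
                size_limit: Optional[int] = DEFAULT_SIZE_LIMIT) -> List[List[int]]:
    """Group encoded sample sizes into shards (hypothetical helper).

    Assumption: a new shard starts when adding the next sample would push the
    current shard past size_limit. With size_limit=None, everything goes into
    a single shard.
    """
    shards: List[List[int]] = [[]]
    current = 0  # Bytes accumulated in the shard being filled
    for size in sample_sizes:
        # Never roll over on an empty shard, so an oversized sample still lands somewhere.
        if size_limit is not None and shards[-1] and current + size > size_limit:
            shards.append([])
            current = 0
        shards[-1].append(size)
        current += size
    return shards
```

For example, three 40-byte samples with a 100-byte limit would be split into two shards under this rule, while size_limit=None keeps them together.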

encode_sample(sample)

Encode a sample dict to bytes.

Parameters

sample (Dict[str, Any]) – Sample dict.

Returns

bytes – Sample encoded as bytes.
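
As a rough illustration of what encoding a sample to delimited bytes involves, the sketch below joins column values with the separator and appends the newline. The function name encode_xsv_sample, the use of str() in place of per-column encoders, and the separator check are all assumptions made for this example, not the library's confirmed behavior.

```python
from typing import Any, Dict, List


def encode_xsv_sample(sample: Dict[str, Any], column_names: List[str],
                      separator: str = ',', newline: str = '\n') -> bytes:
    """Encode one sample dict as a single delimited line (hypothetical helper)."""
    fields = []
    for name in column_names:
        text = str(sample[name])  # Assumption: str() stands in for per-column encoders
        # A value containing the separator or newline would corrupt the row,
        # so this sketch rejects it rather than quoting or escaping.
        if separator in text or newline in text:
            raise ValueError(f'Column {name!r} contains the separator or newline: {text!r}')
        fields.append(text)
    return (separator.join(fields) + newline).encode('utf-8')
```

Columns are written in the order given by column_names, so every row lines up with the same header regardless of dict ordering in the sample.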

encode_split_shard()

Encode the cached samples as a split shard, consisting of a data file and a meta file.

Returns

Tuple[bytes, bytes] – Data file, meta file.
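
One plausible way to split a shard into a data file and a meta file is sketched below. The layout shown (a header row followed by sample lines in the data file, and a sample count plus byte offsets in the meta file) is an assumption for illustration; this page does not specify the actual binary format, and encode_split_shard_sketch is a hypothetical helper.

```python
import struct
from typing import List, Tuple


def encode_split_shard_sketch(column_names: List[str], samples: List[bytes],
                              separator: str = ',',
                              newline: str = '\n') -> Tuple[bytes, bytes]:
    """Build (data, meta) from already-encoded sample lines (hypothetical helper)."""
    header = (separator.join(column_names) + newline).encode('utf-8')
    data = header + b''.join(samples)

    # Assumed meta layout: the sample count, then len(samples) + 1 byte offsets
    # into the data file, all stored as little-endian uint32.
    offsets = [len(header)]
    for line in samples:
        offsets.append(offsets[-1] + len(line))
    meta = struct.pack('<I', len(samples)) + struct.pack(f'<{len(offsets)}I', *offsets)
    return data, meta
```

With offsets stored separately, a reader could slice sample i as data[offsets[i]:offsets[i + 1]] without scanning the data file for newlines.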

get_config()

Get an object describing the shard-writing configuration.

Returns

Dict[str, Any] – JSON object.
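
To show what a JSON-serializable configuration object of this kind might look like, here is a small sketch that mirrors the constructor parameters. The helper build_config_sketch, the key names, and the column type strings are illustrative assumptions, not the library's actual schema.

```python
import json
from typing import Any, Dict, List, Optional


def build_config_sketch(columns: Dict[str, str], separator: str, newline: str = '\n',
                        compression: Optional[str] = None,
                        hashes: Optional[List[str]] = None,
                        size_limit: Optional[int] = 1 << 26) -> Dict[str, Any]:
    """Collect writer settings into a JSON-serializable dict (hypothetical helper).

    The key names below are illustrative assumptions, not the library's schema.
    """
    return {
        'column_names': list(columns.keys()),
        'column_encodings': list(columns.values()),
        'separator': separator,
        'newline': newline,
        'compression': compression,
        'hashes': hashes or [],
        'size_limit': size_limit,
    }


# Placeholder column types; the values are only examples.
config = build_config_sketch({'id': 'int', 'name': 'str'}, separator='\t')
text = json.dumps(config, sort_keys=True)  # Serializes because every value is a JSON type
```

Restricting the dict to strings, numbers, lists, and None is what lets it round-trip through json.dumps and json.loads unchanged.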