streaming.text

Natively supported NLP datasets.

Classes

StreamingC4

Implementation of the C4 (Colossal Clean Crawled Corpus) dataset using StreamingDataset.

StreamingEnWiki

Implementation of the English Wikipedia 2020-01-01 streaming dataset.

StreamingPile

Implementation of the Pile using StreamingDataset.