structa.source
- class structa.source.Source(source, *, encoding='auto', encoding_strict=True, format='auto', csv_delimiter='auto', csv_quotechar='auto', yaml_safe=True, json_strict=True, sample_limit=1048576)
A generalized data source capable of automatically recognizing certain popular data formats and guessing character encodings. It is constructed with a mandatory file-like object as the source and a number of keyword-only options; the decoded content can be accessed from data.
The source must have a read() method which, given a number of bytes to return, returns a bytes string of up to that length; there are no requirements beyond this. This means that files over sockets or pipes are acceptable inputs.
- Parameters
source (file) – The file-like object to decode (must have a read method).
encoding (str) – The character encoding used in the source, or “auto” (the default) if it should be guessed from a sample of the data.
encoding_strict (bool) – If True (the default), raise an exception if character decoding errors occur. Otherwise, replace invalid characters silently.
format (str) – If “auto” (the default), guess the format of the data source. Otherwise, it can be explicitly set to “csv”, “yaml”, or “json” to force parsing of that format.
csv_delimiter (str) – If “auto” (the default), use the csv.Sniffer class to guess the field delimiter when the “csv” format is being decoded. Comma, semicolon, space, and tab characters will be attempted. Otherwise, it must be set to the single-character str used as the field delimiter (e.g. “,”).
csv_quotechar (str) – If “auto” (the default), use the csv.Sniffer class to guess the string delimiter when the “csv” format is being decoded. Otherwise, it must be set to the single-character str used as the string delimiter (e.g. ‘”’).
yaml_safe (bool) – If True (the default), the “safe” YAML parser from ruamel.yaml will be used.
json_strict (bool) – If True (the default), control characters will not be permitted inside decoded strings.
sample_limit (int) – The number of bytes to sample from the beginning of the stream when attempting to determine character encoding. Defaults to 1048576 bytes (1 MiB).
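The only requirement placed on the source is a read() method that returns a bytes string of at most the requested length. The following is a minimal sketch of that contract using only the standard library; ChunkedReader and take_sample are hypothetical names invented for illustration and are not part of structa, and the sampling loop is an assumption about how a consumer might read up to sample_limit bytes, not structa's actual implementation.

```python
import io


class ChunkedReader:
    """Hypothetical minimal file-like object: it provides read(n) and nothing else."""

    def __init__(self, payload: bytes):
        self._payload = payload
        self._pos = 0

    def read(self, n: int) -> bytes:
        # Return at most n bytes; an empty bytes string signals end of stream.
        chunk = self._payload[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk


def take_sample(stream, limit: int) -> bytes:
    """Collect up to `limit` bytes using nothing but stream.read(n)."""
    parts = []
    remaining = limit
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break  # the stream ended before the limit was reached
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


payload = b'{"name": "structa", "formats": ["csv", "yaml", "json"]}'

# Both the hand-written reader and io.BytesIO satisfy the same contract.
sample_a = take_sample(ChunkedReader(payload), 16)
sample_b = take_sample(io.BytesIO(payload), 16)
```

Because nothing beyond read() is used (no seek(), tell(), or name), an object of this shape should be acceptable wherever a pipe or socket file would be.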
- property csv_dialect
The csv.Dialect used when format is “csv”, or None otherwise.
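As an illustration of the kind of dialect that delimiter guessing can produce, the sketch below calls the standard library's csv.Sniffer directly, restricted to the four candidate delimiters named for csv_delimiter. It shows the stdlib mechanism only; how structa invokes the sniffer internally is not shown here, and the sample data is made up.

```python
import csv

# Made-up sample using a semicolon as the field delimiter.
sample = "name;age;city\nAda;36;London\nLinus;28;Helsinki\n"

# Restrict guessing to comma, semicolon, space, and tab.
dialect = csv.Sniffer().sniff(sample, delimiters=",; \t")

# The sniffed dialect can be handed straight to csv.reader.
rows = list(csv.reader(sample.splitlines(), dialect))

print(repr(dialect.delimiter))
print(rows)
```

Note that csv.Sniffer.sniff() returns a csv.Dialect subclass rather than an instance; csv.reader accepts either.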
- property data
The decoded data. Typically a list or dict of values, but can be any value representable in the source format.
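To make "any value representable in the source format" concrete, the sketch below decodes a few JSON documents with the standard library's json module, which can yield a list, a dict, or a bare scalar at the top level. The second half shows the stdlib strict flag, whose behaviour matches the description of json_strict; that structa passes json_strict through to this flag is an assumption.

```python
import json

# Top-level JSON values are not limited to containers.
documents = {
    "list": "[1, 2, 3]",
    "dict": '{"a": 1}',
    "scalar": "42",
    "string": '"hello"',
}
decoded = {name: json.loads(text) for name, text in documents.items()}
types = {name: type(value).__name__ for name, value in decoded.items()}

# A JSON string containing a raw (unescaped) newline, i.e. a control character.
raw = '"line one\nline two"'

try:
    json.loads(raw)  # strict=True is the stdlib default
    strict_rejected = False
except json.JSONDecodeError:
    strict_rejected = True

# With strict=False the control character is accepted.
lenient = json.loads(raw, strict=False)
```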
- property encoding
The character encoding detected or specified for the source, e.g. “utf-8”.
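The difference between strict and non-strict decoding (see encoding_strict) can be sketched with plain bytes.decode(). This assumes the non-strict mode behaves like Python's "replace" error handler, which matches the wording "replace invalid characters silently" but is not confirmed as structa's exact implementation.

```python
# "café" encoded as Latin-1 ends in the byte 0xE9, which is not valid UTF-8 here.
raw = "café".encode("latin-1")

try:
    raw.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    # Strict decoding raises instead of guessing.
    strict_failed = True

# Non-strict decoding substitutes U+FFFD for the undecodable byte.
replaced = raw.decode("utf-8", errors="replace")
```

Replacement keeps the decode from failing but loses information, so the strict default is the safer choice whenever the reported encoding is in doubt.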
- property format
The data format detected or specified for the source, e.g. “csv”, “yaml”, or “json”.