Tab-separated values

From Justapedia, unleashing the power of collective wisdom
Tab-separated values
Filename extension: .tsv, .tab
Internet media type: text/tab-separated-values
Type of format: multiplatform, serial data streams
Container for: database information organized as field-separated lists
Standard: IANA MIME type

A tab-separated values (TSV) file is a simple text format for storing data in a tabular structure, e.g., a database table or spreadsheet data,[1] and a way of exchanging information between databases.[2] Each record in the table is one line of the text file. Each field value of a record is separated from the next by a tab character. The TSV format is thus a variation of the comma-separated values format.

Because TSV is simple and widely supported, it is often used to move tabular data between different computer programs. For example, a TSV file might be used to transfer information from a database program to a spreadsheet.

The IANA standard for TSV[2] keeps the format simple by disallowing tabs within fields.
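Because fields cannot contain tabs, a reader and writer need no quoting or escaping logic at all. The following Python sketch is illustrative only: the function names `write_tsv` and `read_tsv` are hypothetical, and refusing fields that contain line breaks is an added assumption, made because each record must fit on one line.

```python
# Hypothetical helpers illustrating the IANA rule that fields may not contain tabs.

def write_tsv(rows):
    """Join rows (lists of strings) into TSV text, one record per line."""
    lines = []
    for row in rows:
        for field in row:
            # Tabs are forbidden by the IANA definition; line breaks are also
            # refused here (an assumption) because each record is one line.
            if "\t" in field or "\n" in field or "\r" in field:
                raise ValueError(f"field cannot be stored in plain TSV: {field!r}")
        lines.append("\t".join(row))
    return "".join(line + "\n" for line in lines)


def read_tsv(text):
    """Split TSV text back into a list of records (lists of strings)."""
    if text.endswith("\n"):
        text = text[:-1]  # the last line's terminator does not start a new record
    if text == "":
        return []
    return [line.split("\t") for line in text.split("\n")]


rows = [["name", "age"], ["Ada", "36"]]
text = write_tsv(rows)
print(repr(text))              # 'name\tage\nAda\t36\n'
print(read_tsv(text) == rows)  # True
```

Rejecting bad input at write time, rather than silently mangling it, is a design choice of this sketch; the escaping conventions described below are one way to store such values instead.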

Example

The first five rows of the Iris flower data set can be stored as TSV using the following plain text (note that some renderings may display the tabs as spaces):

Sepal length	Sepal width	Petal length	Petal width	Species
5.1	3.5	1.4	0.2	I. setosa
4.9	3.0	1.4	0.2	I. setosa
4.7	3.2	1.3	0.2	I. setosa
4.6	3.1	1.5	0.2	I. setosa
5.0	3.6	1.4	0.2	I. setosa

The TSV plain text above corresponds to the following tabular data:

Sepal length  Sepal width  Petal length  Petal width  Species
5.1           3.5          1.4           0.2          I. setosa
4.9           3.0          1.4           0.2          I. setosa
4.7           3.2          1.3           0.2          I. setosa
4.6           3.1          1.5           0.2          I. setosa
5.0           3.6          1.4           0.2          I. setosa
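As a sketch of reading this example programmatically, Python's standard `csv` module can be pointed at TSV by setting `delimiter="\t"`. Passing `quoting=csv.QUOTE_NONE` is an assumption chosen here so that double quotes are treated as ordinary characters, as in plain TSV; the helper name `load_rows` is hypothetical, and the sketch assumes the first line holds the field names.

```python
import csv
import io

# The header and first two data rows of the Iris example above, with explicit tabs.
IRIS_TSV = (
    "Sepal length\tSepal width\tPetal length\tPetal width\tSpecies\n"
    "5.1\t3.5\t1.4\t0.2\tI. setosa\n"
    "4.9\t3.0\t1.4\t0.2\tI. setosa\n"
)


def load_rows(text):
    """Parse TSV text with a header line into a list of dicts (hypothetical helper)."""
    # QUOTE_NONE makes the csv module treat '"' as an ordinary character,
    # matching plain TSV, which has no quoting mechanism.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


rows = load_rows(IRIS_TSV)
print(rows[0]["Species"])             # I. setosa
print(float(rows[1]["Sepal width"]))  # 3.0
```

Every value comes back as a string; TSV itself carries no type information, so converting the measurements to numbers is left to the caller.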

Conventions for lossless conversion to TSV

Since the values in the TSV format cannot contain literal tabs or newline characters, a convention is necessary for lossless conversion of text values with these characters. A common convention is to perform the following escapes:[3][4]

   \n for newline,
   \t for tab,
   \r for carriage return,
   \\ for backslash.
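A minimal Python sketch of this escaping convention follows. The names `escape_field` and `unescape_field` are hypothetical, and keeping an unrecognized sequence such as `\x` unchanged is an assumption of the sketch rather than part of the convention as listed. Backslash has to be escaped first when encoding, and decoding has to scan left to right instead of chaining `str.replace` calls; otherwise an escaped backslash followed by the letter `n` would be misread as a newline.

```python
# Backslash comes first so the backslashes introduced for the other
# characters are not themselves escaped a second time.
_ESCAPES = [("\\", "\\\\"), ("\n", "\\n"), ("\t", "\\t"), ("\r", "\\r")]
_UNESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}


def escape_field(value):
    """Replace backslash, newline, tab and CR with two-character escapes."""
    for raw, escaped in _ESCAPES:
        value = value.replace(raw, escaped)
    return value


def unescape_field(value):
    """Reverse escape_field by scanning the string once, left to right."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and value[i + 1] in _UNESCAPES:
            out.append(_UNESCAPES[value[i + 1]])
            i += 2
        else:
            # Assumption: an unrecognized escape (or a trailing backslash) is kept as-is.
            out.append(ch)
            i += 1
    return "".join(out)


value = "line one\nline\ttwo \\ end"
escaped = escape_field(value)
print(escaped)                           # line one\nline\ttwo \\ end
print(unescape_field(escaped) == value)  # True
```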

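Another common convention is to follow the CSV rules of RFC 4180 and enclose any field that contains these special characters in double quotes. This can lead to ambiguity, because a program that reads plain TSV treats the quotation marks as ordinary characters and still breaks the field at any tab or newline inside them.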
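The ambiguity can be reproduced with Python's `csv` module, whose default quoting follows the RFC 4180 style. In this illustrative sketch, writing a field that contains a tab with `delimiter="\t"` wraps it in double quotes; a quote-aware reader recovers the original value, while a plain tab-splitting reader does not. It is not a description of how any particular spreadsheet or database behaves.

```python
import csv
import io

row = ["id-1", "has\ta tab"]

buf = io.StringIO()
# Default quoting (QUOTE_MINIMAL) quotes only fields containing the delimiter,
# the quote character, or line-terminator characters.
csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(row)
quoted_line = buf.getvalue()
print(repr(quoted_line))  # 'id-1\t"has\ta tab"\n'

# A reader that understands RFC 4180 quoting gets the two original fields back.
print(next(csv.reader(io.StringIO(quoted_line), delimiter="\t")))  # ['id-1', 'has\ta tab']

# A plain TSV reader that splits on tabs sees three fields with stray quotes.
print(quoted_line.rstrip("\n").split("\t"))  # ['id-1', '"has', 'a tab"']
```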

Another ambiguity is whether records are separated by a line feed (LF) alone, as is typical on Unix and Unix-like systems, or by a carriage return followed by a line feed (CRLF), as is typical on Microsoft platforms. Many programs, such as LibreOffice, expect CRLF.
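A tolerant reader can accept both conventions, and a writer can choose its terminator explicitly. In the Python sketch below, the helpers `split_records` and `join_records` are hypothetical; splitting on LF and then stripping one trailing CR per line assumes that a carriage return is meaningful only as the first half of a CRLF pair.

```python
def split_records(text):
    """Split TSV text into records, accepting LF or CRLF terminators (hypothetical helper)."""
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # nothing follows the final terminator
    records = []
    for line in lines:
        # Assumption: a CR matters only as the first half of a CRLF pair.
        if line.endswith("\r"):
            line = line[:-1]
        records.append(line.split("\t"))
    return records


def join_records(records, line_ending="\n"):
    """Write records with an explicit terminator, e.g. CRLF for programs that expect it."""
    return "".join("\t".join(record) + line_ending for record in records)


unix = "a\tb\n1\t2\n"
windows = "a\tb\r\n1\t2\r\n"
print(split_records(unix) == split_records(windows))         # True
print(repr(join_records([["a", "b"]], line_ending="\r\n")))  # 'a\tb\r\n'
```

When reading from a file, Python's `open()` in text mode with the default `newline=None` already translates CRLF to LF, so the explicit stripping mainly matters for data that arrives as raw strings.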

References

  1. ^ "How To Use Tab Separated Value (TSV) Files". International Monetary Fund.
  2. ^ a b "Definition of tab-separated-values (tsv)". Internet Assigned Numbers Authority (IANA).
  3. ^ "Linear TSV". Data Protocols - Open Knowledge Foundation.
  4. ^ "jq Manual". stedolan.github.io.
