Open Data Glossary

Part of our introduction to open data.

Bulk (Data)

Data is available in bulk if it can be obtained with a single (or very few) machine-automatable requests. For example, data is in bulk if it is provided as a single file (or very few files) that can be easily downloaded.

Conversely, data is not in bulk if there is no quick, simple (and automatable) way to get it. For example, imagine that one can only obtain items from the dataset via a form on a website (this is frequently the case with, for example, company registers). In this case it would take many requests (perhaps millions if the dataset is large) to get the whole dataset. This would then be “non-bulk” provision of data.
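To make the difference in effort concrete, here is a minimal sketch that counts requests against a simulated provider. The in-memory `REGISTER`, the `Provider` class and its `download_all` and `lookup` methods are all hypothetical stand-ins for a real website; no network access is involved.

```python
from dataclasses import dataclass

# Hypothetical dataset standing in for a remote company register (1,000 records).
REGISTER = [{"id": f"C{i:05d}", "name": f"Company {i}"} for i in range(1, 1001)]


@dataclass
class Provider:
    """Simulated data provider that counts how many requests it receives."""
    records: list
    requests: int = 0

    def download_all(self):
        # Bulk provision: a single request returns the whole dataset (one file).
        self.requests += 1
        return list(self.records)

    def lookup(self, record_id):
        # Non-bulk provision: a web form returns at most one record per request.
        self.requests += 1
        return next((r for r in self.records if r["id"] == record_id), None)


def fetch_bulk(provider):
    return provider.download_all()


def fetch_one_by_one(provider, ids):
    # Assumes the caller already knows every record ID; in practice even
    # discovering the IDs may take extra requests.
    return [provider.lookup(i) for i in ids]


bulk = Provider(REGISTER)
print(len(fetch_bulk(bulk)), bulk.requests)  # 1000 1

form = Provider(REGISTER)
ids = [r["id"] for r in REGISTER]
print(len(fetch_one_by_one(form, ids)), form.requests)  # 1000 1000
```

Both routes end with the same 1,000 records, but the form-based route needs one request per record, so the cost grows with the size of the dataset.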

Licensing

When data or content is made available, it is, by default, restricted in its use by intellectual property rights (or at least there is a strong possibility that it is restricted in this way).

Thus, if you want to use that material, and especially if you want to reuse or redistribute it legally, you need to check the license for that material and see what it allows and does not allow. Be aware that in many cases the site may not have a “license” page or section but may instead include the conditions in its “terms of use” (or a similarly named section).

See also: Open Data License

Machine readable

Material (data or content) is machine readable if it is in a format that can be easily processed by a computer.

Non-digital material (for example, printed or handwritten documents) is by its nature not machine-readable. But even digital material need not be machine-readable. For example, consider a PDF document containing tables of data. The tables are digital, but they are not machine-readable because a computer would struggle to extract the tabular information (even though the tables are very human-readable!). The equivalent tables in a format such as a spreadsheet would be machine-readable.
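As an illustration of what "easily processed by a computer" means for tabular data, the sketch below reads a small table stored as CSV (a simple, spreadsheet-compatible text format) using only Python's standard `csv` module. The table and its values are invented for the example. A program can address each column by name, whereas the same table inside a PDF would first have to be recovered from the page layout.

```python
import csv
import io

# Hypothetical table in CSV form (values invented for illustration).
CSV_TEXT = """region,year,population
North,2020,120000
South,2020,95000
"""

# DictReader maps each row to a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
total = sum(int(row["population"]) for row in rows)
print(rows[0]["region"], total)  # North 215000
```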

As another example, scans (photographs) of text are not machine-readable (though they are human-readable!), but the equivalent text in a format such as a simple ASCII text file or a word-processing format such as a Microsoft Word file is machine-readable.

Note: The appropriate machine-readable format may vary by type of data; for example, the machine-readable format for geographic data may differ from that for tabular data.
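For geographic data, one widely used machine-readable format is GeoJSON (a JSON-based format specified in RFC 7946). The sketch below parses a single invented point feature with Python's standard `json` module; the feature name and coordinates are hypothetical. Note that GeoJSON orders positions as longitude, then latitude.

```python
import json

# Hypothetical GeoJSON document with one point feature.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Station A"},
     "geometry": {"type": "Point", "coordinates": [-0.1276, 51.5072]}}
  ]
}
"""

collection = json.loads(GEOJSON_TEXT)
for feature in collection["features"]:
    # GeoJSON positions are [longitude, latitude].
    lon, lat = feature["geometry"]["coordinates"]
    print(feature["properties"]["name"], lat, lon)
```

A flat CSV with latitude and longitude columns can work for simple points, but lines and polygons need nested lists of coordinates, which is one reason geographic data often uses a format of its own.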

Open Data

Open data is data made available in accordance with the Open Definition, specifically:

  • Available under an open (data) license (permitting anyone freely to access, reuse and redistribute)
  • Available in machine-readable form and “as a whole” (bulk). This is to avoid access and use being limited (intentionally or unintentionally) by indirect means, e.g. by providing the data on paper or by only allowing access to a few items of a dataset at a time.
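The conditions in the list above can be read as a simple checklist. The sketch below encodes them as three yes/no fields; the `DatasetRelease` class and `meets_open_data_criteria` function are hypothetical names, and the check is a simplification of the full Open Definition rather than a substitute for it.

```python
from dataclasses import dataclass


@dataclass
class DatasetRelease:
    """Hypothetical description of how a dataset is published."""
    open_license: bool       # permits anyone to access, reuse and redistribute
    machine_readable: bool   # e.g. CSV or JSON rather than scans or PDF tables
    available_in_bulk: bool  # whole dataset obtainable in one (or very few) requests


def meets_open_data_criteria(release: DatasetRelease) -> bool:
    # Every condition must hold; failing any single one is enough to fall short.
    return release.open_license and release.machine_readable and release.available_in_bulk


# A register that is openly licensed and machine-readable, but only
# searchable one record at a time through a web form.
register = DatasetRelease(open_license=True, machine_readable=True, available_in_bulk=False)
print(meets_open_data_criteria(register))  # False
```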

More detail is available in the full Open Definition of open data.

Open Data License

A license which is conformant with the Open Definition, and therefore one which permits anyone to use, reuse or redistribute the licensed data (subject at most to a requirement to attribute or share-alike).
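In practice, a quick first check is to compare a dataset's license identifier against a list of conformant licenses. The sketch below uses a small, non-exhaustive sample of licenses that the Open Definition lists as conformant, written as SPDX identifiers, together with the condition each one imposes. The `CONFORMANT` table and `is_open_data_license` function are hypothetical; the official list of conformant licenses is the authority, and this lookup is not a legal determination.

```python
# Non-exhaustive sample of Open Definition conformant licenses (SPDX IDs),
# mapped to the conditions they impose. Consult the official list for the full set.
CONFORMANT = {
    "CC0-1.0": "no conditions",
    "PDDL-1.0": "no conditions",
    "CC-BY-4.0": "attribution",
    "ODC-By-1.0": "attribution",
    "CC-BY-SA-4.0": "attribution + share-alike",
    "ODbL-1.0": "attribution + share-alike",
}


def is_open_data_license(spdx_id: str) -> bool:
    # True only if the identifier appears in the sample list above.
    return spdx_id in CONFORMANT


for lic in ["ODbL-1.0", "CC-BY-4.0", "CC-BY-NC-4.0"]:
    print(lic, is_open_data_license(lic), CONFORMANT.get(lic, "not on sample list"))
```

A non-commercial license such as CC-BY-NC-4.0 fails the check because it restricts who may reuse the data, which goes beyond the attribution and share-alike conditions that an open data license may impose.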