Knowledge Engineering¶

Wikipedia: https://en.wikipedia.org/wiki/Knowledge_engineering
Wikipedia: https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning
WikipediaCategory: https://en.wikipedia.org/wiki/Category:Knowledge
WikipediaCategory: https://en.wikipedia.org/wiki/Category:Graph_theory
WikipediaCategory: https://en.wikipedia.org/wiki/Category:Ontology
WikipediaCategory: https://en.wikipedia.org/wiki/Category:Ontology_(information_science)

Symbols¶

Wikipedia: https://en.wikipedia.org/wiki/Symbol

WikipediaCategory: https://en.wikipedia.org/wiki/Category:Symbols

Character encoding¶

Wikipedia: https://en.wikipedia.org/wiki/Character_encoding

WikipediaCategory: https://en.wikipedia.org/wiki/Category:Character_encoding

https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings

Control Characters¶

Wikipedia: https://en.wikipedia.org/wiki/Control_character

ASCII Control Characters

https://en.wikipedia.org/wiki/Control_character#In_ASCII
Unicode Control Characters

https://en.wikipedia.org/wiki/Unicode_control_characters

Warning

Control characters are often significant.

Common security errors involving control characters:

https://cwe.mitre.org/data/definitions/74.html

CWE-74: Improper Neutralization of Special Elements in Output Used by a Downstream Component (‘Injection’)
- https://cwe.mitre.org/data/definitions/93.html
  
  CWE-93: Improper Neutralization of CRLF Sequences (‘CRLF Injection’)
```
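x = "line1_start"
x2 = "thing\r\n\0line1_end"  # untrusted value with embedded CR, LF, and NUL
x = x + x2
x = x + "line2...line2_end\n"
# ! error: the embedded "\r\n" in x2 is treated as a record boundary
records = x.splitlines()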
```
https://cwe.mitre.org/data/definitions/140.html

CWE-140: Improper Neutralization of Delimiters
- https://cwe.mitre.org/data/definitions/141.html
  
  CWE-141: Improper Neutralization of Parameter/Argument Delimiters
- https://cwe.mitre.org/data/definitions/142.html
  
  CWE-142: Improper Neutralization of Value Delimiters
- https://cwe.mitre.org/data/definitions/143.html
  
  CWE-143: Improper Neutralization of Record Delimiters
- https://cwe.mitre.org/data/definitions/144.html
  
  CWE-144: Improper Neutralization of Line Delimiters
- https://cwe.mitre.org/data/definitions/145.html
  
  CWE-145: Improper Neutralization of Section Delimiters
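
The delimiter weaknesses listed above can be sketched in Python. This is a minimal sketch, assuming that rejecting any control or line-separator character is acceptable for the application; the helper names `reject_control_characters` and `build_records` are hypothetical.

```python
import unicodedata

# Unicode general categories treated as unsafe in this sketch:
# Cc = control characters (CR, LF, NUL, ...), Zl/Zp = line/paragraph separators
UNSAFE_CATEGORIES = ("Cc", "Zl", "Zp")


def reject_control_characters(value):
    """Return value unchanged, or raise ValueError if it holds an unsafe character."""
    for char in value:
        if unicodedata.category(char) in UNSAFE_CATEGORIES:
            raise ValueError("unsafe character {!r} in value".format(char))
    return value


def build_records(fields):
    """Join validated fields into newline-delimited records."""
    return "".join(reject_control_characters(field) + "\n" for field in fields)


records = build_records(["line1", "line2"])
```

Rejecting input is only one option; escaping or encoding the delimiter is another, depending on what the downstream component expects.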

Escape Sequences¶

Wikipedia: https://en.wikipedia.org/wiki/Escape_sequence

https://en.wikipedia.org/wiki/Escape_sequences_in_C#Table_of_escape_sequences
https://en.wikipedia.org/wiki/CDATA
- https://en.wikipedia.org/wiki/CDATA#Nesting
XML, HTML & escape sequences:
```
& < > /> " <!-- --> <![CDATA[ ]]>
```

# HTML & Templates: <p id="{{attr}}">text</p> # attr='here"s one'
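
A minimal sketch of escaping a value before it is placed in an HTML attribute, using the Python 3 standard library `html.escape`. The variable names are illustrative only; a template engine with autoescaping would normally do this step.

```python
import html

attr = 'here"s one'

# quote=True (the default) also escapes " and ', which matters inside attributes
safe_attr = html.escape(attr, quote=True)
element = '<p id="{}">text</p>'.format(safe_attr)
print(element)  # <p id="here&quot;s one">text</p>
```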

Python escape sequences:

s =   "Here's one"
s =   'Here\'s one'
s = '''Here's one'''
s =   'Here\N{APOSTROPHE}s one'
s =   'Here'"'s"' one'

Bash escape sequences:

s1="$Here's one"
s1="${Here}'s one"
s2='${Here}\'s one'  # ! error
s2='${Here}'"'s"' one'
s3=""$Here"'s one"
s3=""${Here}"'s one"

ASCII¶

Wikipedia: https://en.wikipedia.org/wiki/ASCII

ASCII (American Standard Code for Information Interchange) defines 128 characters.

Python:

from __future__ import print_function
for i in range(0,128):
    print("{0:<3d} {1!r} {1:s}.".format(i, chr(i)))

Unicode¶

Wikipedia: https://en.wikipedia.org/wiki/Unicode

Wikipedia: https://en.wikipedia.org/wiki/Unicode_symbols

https://en.wikipedia.org/wiki/Unicode_symbols#Symbol_block_list

Entering Unicode Symbols:

https://en.wikipedia.org/wiki/Unicode_input#Hexadecimal_code_input


∴ – Therefore – u+2234

X11: ctrl-shift-u 2234
Vim: ctrl-v u2234

Python:

Python 3 Unicode HOWTO: https://docs.python.org/3/howto/unicode.html
Python 2 Unicode HOWTO: https://docs.python.org/2/howto/unicode.html

c1 = u'∴'  # Python 2.x and 3.3+ (the u'' prefix is a SyntaxError in 3.0-3.2)
c2 =  '∴'  # Python 3.0+
c3 = '\N{THEREFORE}' # howto/unicode#the-string-type glyph name
u1 = unichr(0x2234)  # Python 2 only
u2 =    chr(0x2234)  # Python 3.0+
from builtins import chr  # Python 2 & 3 (Python 2 requires the "future" package)
u3 =    chr(0x2234)       # Python 2 & 3
u4 =    chr(8756)    # int(hex(8756)[2:], 16) == 8756 (0x2234)
chars = [c1, c2, u1, u2, u3, u4]
from operator import eq
assert all((eq(x, chars[0]) for x in chars))

Python and UTF-8:

Python 2 Codecs docs: https://docs.python.org/2/library/codecs.html
https://pymotw.com/2/codecs/

e.g. JSON with UTF-8:

# Read an assumed UTF-8 encoded JSON file with Python 2+, 3+
import codecs
import json
with codecs.open('filename.json', encoding='utf8') as file_:
    text = file_.read()
data = json.loads(text)  # parse the decoded text into Python objects

Unicode encodings:

UTF-1
UTF-5
UTF-6
UTF-8
UTF-9, UTF-18
UTF-16
UTF-32

UTF-8¶

Wikipedia: https://en.wikipedia.org/wiki/UTF-8

UTF-8 is a variable-width Unicode character encoding which can represent every Unicode code point with one to four 8-bit code units.

https://en.wikipedia.org/wiki/UTF-8#Examples
As of 2015, UTF-8 is the most common character encoding on the web.
- HTML charset meta attribute:
  
  <meta charset="UTF-8">
- XML Header:
  
  <?xml version="1.0" encoding="UTF-8"?>
- HTTP Header:
  
  content-type: text/html; charset=UTF-8
Why use UTF-8? https://www.w3.org/International/questions/qa-choosing-encodings#useunicode
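
A short sketch (Python 3 assumed) of how one code point maps to UTF-8 code units, reusing the ∴ (U+2234) symbol from the Unicode section:

```python
therefore = "\u2234"  # ∴ THEREFORE

encoded = therefore.encode("utf-8")
print(len(therefore), len(encoded))  # 1 3  (one code point, three bytes)
print(encoded)                       # b'\xe2\x88\xb4'

# Decoding with the same encoding round-trips the text
assert encoded.decode("utf-8") == therefore

# ASCII characters are encoded as a single, identical byte
assert "A".encode("utf-8") == b"A"
```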

Logic, Reasoning, and Inference¶

https://en.wikipedia.org/wiki/Epistemology

Logic
Reasoning
- Inference
  - Entailment

Logic ¶

Wikipedia: https://en.wikipedia.org/wiki/Logic

WikipediaCategory: https://en.wikipedia.org/wiki/Category:Logic

See:

Inference

{ True, False, Unknown }

{ T, F, NULL }  # SQL
{ T, F, None }  # Python
{ T, F, nil }   # Ruby
{ 1, 0, -1 }    #

Fuzzy Logic ¶

Wikipedia: https://en.wikipedia.org/wiki/Fuzzy_logic

Probabilistic Logic ¶

Wikipedia: https://en.wikipedia.org/wiki/Probabilistic_logic

Propositional Calculus ¶

Wikipedia: https://en.wikipedia.org/wiki/Propositional_calculus
WikipediaCategory: https://en.wikipedia.org/wiki/Category:Propositional_calculus
WikipediaCategory: https://en.wikipedia.org/wiki/Category:Theorems_in_propositional_logic

Premise P
Conclusion Q

Modus ponens ¶

Wikipedia: https://en.wikipedia.org/wiki/Modus_ponens

P -> Q – Premise 1 P1 P_1 (“P sub 1”)

P – Premise 2 P2 P_2 (“P sub 2”)

∴ Q – Conclusion Q Q_0 (“Q sub 0”)
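
Modus ponens can be checked by enumerating a truth table. A minimal sketch, where the helper name `implies` is an assumption introduced for illustration:

```python
from itertools import product


def implies(p, q):
    """Material implication: P -> Q is false only when P is true and Q is false."""
    return (not p) or q


# The argument is valid if Q holds in every row where both premises hold.
valid = all(
    q
    for p, q in product([True, False], repeat=2)
    if implies(p, q) and p
)
print(valid)  # True
```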

Predicate Logic ¶

Wikipedia: https://en.wikipedia.org/wiki/Predicate_logic

Universe of discourse
Predicate
- ∃ – There exists – Existential quantifier
- ∀ – For all – Universal quantifier

Existential quantification ¶

Wikipedia: https://en.wikipedia.org/wiki/Existential_quantification

∃ – “There exists” is the Existential quantifier symbol.
An existential quantifier is true (“holds true”) if there is one (or more) example in which the condition holds true.
An existential quantifier is satisfied by one (or more) examples.

Universal quantification ¶

Wikipedia: https://en.wikipedia.org/wiki/Universal_quantification

∀ – “For all” is the Universal quantifier symbol.
A universal quantification holds true only if the condition holds for every member of the universe of discourse.
- A universal quantification is disproven by one counterexample where the condition does not hold true.
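
Over a finite universe of discourse, the two quantifiers correspond to the Python built-ins `any` and `all`. A small sketch; the list `numbers` and predicate `is_even` are illustrative assumptions:

```python
numbers = [2, 4, 6, 7]  # a finite universe of discourse


def is_even(n):
    """Predicate P(n): n is even."""
    return n % 2 == 0


exists_even = any(is_even(n) for n in numbers)  # ∃n P(n): one example suffices
all_even = all(is_even(n) for n in numbers)     # ∀n P(n): every n must satisfy P

# A single counterexample disproves the universal statement
counterexamples = [n for n in numbers if not is_even(n)]
print(exists_even, all_even, counterexamples)  # True False [7]
```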

Hoare Logic ¶

Wikipedia: https://en.wikipedia.org/wiki/Hoare_logic

precondition P

command C

postcondition Q
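
A Hoare triple is commonly written {P} C {Q}. As a rough sketch, runtime assertions can stand in for the precondition and postcondition; this checks individual executions only and is not a proof. The function `increment` is a hypothetical example.

```python
def increment(x):
    assert x >= 0   # precondition P
    x = x + 1       # command C
    assert x >= 1   # postcondition Q
    return x


result = increment(0)
```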

See:

Propositional Calculus, Predicate Logic
Given-When-Then

First-order Logic ¶

Wikipedia: https://en.wikipedia.org/wiki/First-order_logic

First-order logic (FOL)

Terms
- Variables
  - x, y, z
  - x, x_0 (“x subscript 0”, “x sub 0”)
- Functions
  - f(x) – function symbol (arity 1)
  - a – constant symbol (arity 0) ( a() )

Formulas (“formulae”)
- Equality
  - = – equality
- Logical Connectives (“unary”, “binary”, sequence/tuple/list)
  - ¬ – ~, ! – negation (unary)
  - ...
  - ∧ – ^, &&, and – conjunction
  - ∨ – v, ||, or – disjunction
  - → – ->, ⊃ – implication
  - ↔ – <->, ≡ – biconditional
  - ...
  - XOR
  - NAND
- Grouping Operators
  - Parentheses ( )
  - Brackets < >
- Relations
  - P(x) – predicate symbol (n_args=1, arity 1, valence 1)
  - R(x) – relation symbol (n_args=1, arity 1, valence 1)
  - Q(x,y) – binary predicate/relation symbol (n_args=2, ...)
- Quantifier Symbols “universe relation”
  - ∃
  - ∀
- ... https://en.wikipedia.org/wiki/First-order_logic

Description Logic ¶

Wikipedia: https://en.wikipedia.org/wiki/Description_logic

Description Logic (DL; DLP (Description Logic Programming))

Knowledge Base = TBox + ABox

https://en.wikipedia.org/wiki/TBox (Schema: Class/Property Ontology)
https://en.wikipedia.org/wiki/ABox (Facts / Instances)

See:

OWL, Entailment
Semantic Web
N3 for => implies

Reasoning ¶

https://en.wikipedia.org/wiki/Deductive_reasoning

https://en.wikipedia.org/wiki/Category:Reasoning

https://en.wikipedia.org/wiki/Semantic_reasoner

See: Description Logic

Inference ¶

Inference: https://en.wikipedia.org/wiki/Inference

Entailment ¶

Wikipedia: https://en.wikipedia.org/wiki/Entailment

http://www.w3.org/TR/owl2-profiles/#Introduction

See: Data Science

Data Engineering¶

Data Engineering is about the 5 Ws (who, what, when, where, why) and how data are stored.

Who:   schema:author         @westurner ;
What:  schema:name           “WRD R&D Documentation”@en ;
When:  schema:codeRepository <https://github.com/wrdrd/docs/commits/master> ;
Where: schema:codeRepository <https://github.com/wrdrd/docs> ;
Why:   schema:description    “Documentation purposes”@en ;
How:   schema:programmingLanguage :ReStructuredText ;
How:   schema:runtimePlatform [ :Python, :CPython, :Sphinx ] ;

File Structures
- Git File Structures
- Torrent file structure
File Locking
Data Structures
- Arrays
- Matrices
- Lists
- Graphs
  - NetworkX
  - DFS
  - BFS
  - Topological Sorting
- Trees
Compression Algorithms
- bzip2
- gzip
- tar
- zip
Hash Functions
- CRC
- MD5
- SHA
Filesystems
- RAID
- MBR
- GPT
- LVM
- btrfs
- ext
- FAT
- ISO9660
- HFS+
- NTFS
- FUSE
  - SSHFS
Network Filesystems
- Ceph
- CIFS
- DDFS
- GlusterFS
- HDFS
- NFS
- S3
- Swift
- SMB
- WebDAV
Databases
- Object Relational Mapping
- Relation Algebra
- Relational Algebra
- Relational Databases
  - SQL
  - Drizzle
  - MySQL
  - PostgreSQL
  - SQLite
  - Virtuoso
- NoSQL Databases
- Graph Databases
  - Blazegraph
  - Blueprints
  - Gremlin
  - Neo4j
- RDF Triplestores
- Distributed Databases
  - Accumulo
  - BigTable
  - Apache Beam
  - Cassandra
  - Hadoop
  - HBase
  - Hive
  - Parquet
  - Presto
  - Spark
    - GraphX
Distributed Algorithms
Distributed Computing Protocols
- CORBA
- Message Passing
- ESB
- MPI
- XML-RPC
- JSON-RPC
- Avro
- Protocol Buffers
- Thrift
- SOA
  - WS-*
  - WSDL
- JSON-WSP
- ROA
  - REST
- WAMP
Data Grid
Search Engine Indexing
- ElasticSearch
- Haystack
- Lucene
- Nutch
- Solr
- Whoosh
- Xapian
- Information Retrieval

File Structures ¶

https://en.wikipedia.org/wiki/File_format

https://en.wikipedia.org/wiki/Record_(computer_science)

https://en.wikipedia.org/wiki/Field_(computer_science)

https://en.wikipedia.org/wiki/Index#Computer_science

tar and zip are file structures that have a manifest and a payload
- Filesystems often have redundant manifests (and/or deduplication according to a hash table manifest with an interface like a DHT)
Web Standards and Semantic Web Standards which define file structures (and stream protocols):
- XML
- RDF (RDF/XML, Turtle, N3, RDFa, JSON-LD)
- JSON (JSON-LD)
- HTTP

Git File Structures ¶

Git specifies a number of file structures: Git Objects, Git References, and Git Packfiles.

Git implements something like on-disk shared snapshot objects with commits, branching, merging, and multi-protocol push/pull semantics: https://en.wikipedia.org/wiki/Shared_snapshot_objects

Git Object ¶

Docs: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects

Git Reference ¶

Docs: https://git-scm.com/book/en/v2/Git-Internals-Git-References

Git Packfile ¶

Docs: https://git-scm.com/book/en/v2/Git-Internals-Packfiles

“Git is a content-addressable filesystem”

bup ¶

Homepage: https://bup.github.io/
Source: git https://github.com/bup/bup
Docs: https://github.com/bup/bup/blob/master/README.md
Docs: https://bup.github.io/man.html
Docs: https://github.com/bup/bup/blob/master/DESIGN

Bup (backup) is a backup system based on git packfiles and rolling checksums.

[bup is a very] efficient backup system based on the Git Packfile format, providing fast incremental saves and global deduplication (among and within files, including virtual machine images).

Torrent file structure ¶

A bittorrent torrent file is a bencoded manifest of tracker, DHT, and web seed URIs; and segment checksum hashes.

Like MPEG-DASH and HTTP Live Streaming, BitTorrent transfers files as segments; peers exchange segments over the BitTorrent peer protocol, and web seeds serve them over HTTP.

See: BitTorrent, Named Data Networking, Web Distribution

File Locking ¶

Wikipedia: https://en.wikipedia.org/wiki/File_locking

File locking is one strategy for synchronization with concurrency and parallelism.

An auxiliary <filename>.lock file is still susceptible to race conditions.
C file locking functions: fcntl, lockf, flock
Python file locking functions: fcntl.fcntl, fcntl.lockf, fcntl.flock: https://docs.python.org/2/library/fcntl.html
To lock a file for all processes with Linux requires a mandatory file locking mount option (mount -o mand) and, per file, the setgid bit set with the group-execute bit cleared (chmod g+s,g-x).
To lock a file (or a range / record of a file) for all processes with Windows requires no additional work beyond win32con.LOCKFILE_EXCLUSIVE_LOCK, win32file.LockFileEx, and win32file.UnlockFileEx.
CWE-667: Improper Locking: https://cwe.mitre.org/data/definitions/667.html#Relationships
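
A minimal sketch of advisory locking with `fcntl.flock` (POSIX only). The lock only coordinates processes that also call `flock` on the same file; the helper name `append_locked` and the file name `example.log` are assumptions for illustration.

```python
import fcntl
import os
import tempfile


def append_locked(path, text):
    """Append text to path while holding an exclusive advisory lock."""
    with open(path, "a") as file_:
        fcntl.flock(file_, fcntl.LOCK_EX)  # blocks until the lock is acquired
        try:
            file_.write(text)
            file_.flush()
        finally:
            fcntl.flock(file_, fcntl.LOCK_UN)  # release before closing


path = os.path.join(tempfile.mkdtemp(), "example.log")
append_locked(path, "first\n")
append_locked(path, "second\n")
```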

Data Structures ¶

Wikipedia: https://en.wikipedia.org/wiki/Data_structure
WikipediaCategory: https://en.wikipedia.org/wiki/Category:Data_structures
Docs: https://en.wikipedia.org/wiki/List_of_data_structures

http://rosettacode.org/wiki/Category:Programming_Tasks
- http://rosettacode.org/wiki/Greatest_common_divisor
- http://rosettacode.org/wiki/Go_Fish

Arrays ¶

Wikipedia: https://en.wikipedia.org/wiki/Array_data_structure

Docs: https://en.wikipedia.org/wiki/List_of_data_structures#Arrays

An array is a data structure for unidimensional data.

Arrays must be resized when data grows beyond the initial shape of the array.
Sparse arrays are sparsely allocated.
A multidimensional array is said to be a matrix.

Matrices ¶

Wikipedia: https://en.wikipedia.org/wiki/Matrix_(computer_science)

A matrix is a data structure for multidimensional data; a multidimensional array.

Lists ¶

Wikipedia: https://en.wikipedia.org/wiki/Linked_list

Docs: https://en.wikipedia.org/wiki/List_of_data_structures#Lists

A list is a data structure with nodes that link to a next and/or previous node.

Graphs ¶

Wikipedia: https://en.wikipedia.org/wiki/Graph_(abstract_data_type)
Wikipedia: https://en.wikipedia.org/wiki/Graph_(mathematics)
Wikipedia: https://en.wikipedia.org/wiki/Graph_theory
Docs: https://en.wikipedia.org/wiki/Conceptual_graph
WikipediaCategory: https://en.wikipedia.org/wiki/Category:Graphs
WikipediaCategory: https://en.wikipedia.org/wiki/Category:Graph_data_structures
WikipediaCategory: https://en.wikipedia.org/wiki/Category:Graph_theory

A graph is a system of nodes connected by edges; an abstract data type for which there are a number of suitable data structures.

A node has edges.
An edge connects nodes.
Edges of directed graphs flow in only one direction, so a two-way relationship requires two edges, each with separate attributes (e.g. ‘magnitude’, ‘scale’).

Wikipedia: https://en.wikipedia.org/wiki/Directed_graph
Edges of an undirected graph connect nodes in both directions (with the same attributes).

Wikipedia: https://en.wikipedia.org/wiki/Graph_(mathematics)#Undirected_graph
Graphs and Trees are traversed (or walked); according to a given algorithm (e.g. DFS, BFS).
Graph nodes can be listed in many different orders (or with a given ordering):
- Preorder
- Inorder
- Postorder
- Level-order
There are many data structure representations for Graphs.
There are many data serialization/marshalling formats for graphs:
- Graph edge lists can be stored as adjacency matrices.
- NetworkX supports a number of graph storage formats.
- RDF is a standard semantic web Linked Data format for Graphs.
- JSON-LD is a standard semantic web Linked Data format for Graphs.
There are many Graph Databases and RDF Triplestores for storing graphs.
A cartesian product has an interesting graph representation. (See Compression Algorithms)

NetworkX ¶

Wikipedia: https://en.wikipedia.org/wiki/NetworkX
Homepage: https://networkx.github.io/
Source: git https://github.com/networkx/networkx
Docs: https://networkx.readthedocs.io/en/latest/
Docs: https://networkx.readthedocs.io/en/latest/tutorial/
Docs: https://networkx.readthedocs.io/en/latest/reference/classes.html
Docs: https://networkx.readthedocs.io/en/latest/reference/algorithms.html

NetworkX is an Open Source graph algorithms library written in Python.

DFS ¶

Wikipedia: https://en.wikipedia.org/wiki/Depth-first_search

DFS (Depth-first search) is a graph traversal algorithm.

# Given a tree:
1
  1.1
  1.2
2
  2.1
  2.2

# DFS (preorder):
[1, 1.1, 1.2, 2, 2.1, 2.2]

See also: Bulk Synchronous Parallel, Firefly Algorithm
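
Depth-first and breadth-first traversal of the tree above can be sketched as follows. The `"root"` node is a hypothetical parent added so that the two top-level nodes form a single tree; a general graph (with cycles) would also need a visited set.

```python
from collections import deque

tree = {
    "root": ["1", "2"],  # hypothetical root joining the two top-level nodes
    "1": ["1.1", "1.2"],
    "2": ["2.1", "2.2"],
}


def dfs_preorder(graph, node):
    """Visit a node, then recurse into each child before moving to siblings."""
    order = [node]
    for child in graph.get(node, []):
        order.extend(dfs_preorder(graph, child))
    return order


def bfs(graph, start):
    """Visit nodes level by level using a FIFO queue."""
    order, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(graph.get(node, []))
    return order


print(dfs_preorder(tree, "root")[1:])  # ['1', '1.1', '1.2', '2', '2.1', '2.2']
print(bfs(tree, "root")[1:])           # ['1', '2', '1.1', '1.2', '2.1', '2.2']
```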

Topological Sorting ¶

Wikipedia: https://en.wikipedia.org/wiki/Topological_sorting

A DAG (directed acyclic graph) has a topological sorting, or is topologically sorted.

The unix tsort utility does a topological sorting of a space and newline delimited list of edge labels:

$ tsort --help
Usage: tsort [OPTION] [FILE]
Write totally ordered list consistent with the partial ordering in FILE.
With no FILE, or when FILE is -, read standard input.

    --help     display this help and exit
    --version  output version information and exit

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
For complete documentation, run: info coreutils 'tsort invocation'

$ echo -e '1 2\n2 3\n3 4\n2 a' | tsort
1
2
a
3
4

Installing a set of packages with dependencies is a topological sorting problem, with additional constraints such as version and platform (solvable with a SAT constraint satisfaction solver; see conda (pypi:pycosat)).
A topological sorting can identify the “root” of a directed acyclic graph.
- Information gain can be useful for less discrete problems.
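
A minimal sketch of topological sorting with Kahn's algorithm, using the same edge list as the tsort example. This is not how tsort itself is implemented; a DAG can have several valid orderings, so the result may differ from the tsort output while still respecting every edge.

```python
from collections import defaultdict, deque


def topological_sort(edges):
    """Return nodes ordered so that every edge (source, target) points forward."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    for source, target in edges:
        successors[source].append(target)
        indegree[target] += 1
        indegree[source] += 0  # make sure the source node is registered
    ready = deque(sorted(node for node, degree in indegree.items() if degree == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for successor in successors[node]:
            indegree[successor] -= 1
            if indegree[successor] == 0:
                ready.append(successor)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle; no topological ordering exists")
    return order


edges = [("1", "2"), ("2", "3"), ("3", "4"), ("2", "a")]
order = topological_sort(edges)
```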

Trees ¶

Wikipedia: https://en.wikipedia.org/wiki/Tree_data_structure

Docs: http://rosettacode.org/wiki/Tree_traversal

A tree is a connected graph without cycles; a rooted tree can be treated as a directed graph with edges from parent to child.

A tree is said to have branches and leaves; or just nodes.

There are many types of and applications for trees:

Compression Algorithms ¶

bzip2 ¶

Wikipedia: https://en.wikipedia.org/wiki/Bzip2
File Extension: .bz2
Homepage: http://bzip.org/

bzip2 is an Open Source lossless compression algorithm based upon the Burrows-Wheeler transform.

bzip2 is usually slower than gzip or zip, but more space efficient.

gzip ¶

Wikipedia: https://en.wikipedia.org/wiki/Gzip
Homepage: https://www.gnu.org/software/gzip/
File Extension: .gz
Source: http://ftp.gnu.org/gnu/gzip/
Docs: https://www.gnu.org/software/gzip/manual/
Docs: https://www.gnu.org/software/gzip/manual/gzip.html

gzip is a compression format and utility based on DEFLATE (LZ77 and Huffman coding).

gzip is similar to zip, in that both are based upon DEFLATE
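
A small sketch comparing gzip and bzip2 on the same input with the Python standard library. The sample data is an arbitrary assumption, and relative sizes and speeds depend heavily on the input, so no particular output is claimed.

```python
import bz2
import gzip

data = b"knowledge engineering " * 1000  # highly repetitive sample input

gz = gzip.compress(data)
bz = bz2.compress(data)

# Both algorithms are lossless: decompression restores the exact input
assert gzip.decompress(gz) == data
assert bz2.decompress(bz) == data

print(len(data), len(gz), len(bz))
```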

tar ¶

Wikipedia: https://en.wikipedia.org/wiki/Tar_(computing)

File Extension: .tar

tar is a file archiving format that concatenates a set of files into one file; each file's data is preceded by a header record holding its path and attributes.

TAR = ( header record + file data ) for each file
.tar.gz is tar + gzip
.tar.bz2 is tar + bzip2

TAR and gzip or bzip2 can be streamed over SSH:

# https://unix.stackexchange.com/a/95994
tar czf - . | ssh remote "( cd ~/ ; cat > file.tar.gz )"
tar cjf - . | ssh remote "( cd ~/ ; cat > file.tar.bz2 )"
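
The same tar + gzip combination can be sketched with the Python `tarfile` module, writing the archive to an in-memory stream instead of a file. The member name `hello.txt` and its payload are assumptions for illustration.

```python
import io
import tarfile

payload = b"hello\n"
buffer = io.BytesIO()

# "w:gz" writes a gzip-compressed tar stream (equivalent to .tar.gz)
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    info = tarfile.TarInfo(name="hello.txt")  # header record: path and attributes
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    names = archive.getnames()
    content = archive.extractfile("hello.txt").read()
```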

zip ¶

Wikipedia: https://en.wikipedia.org/wiki/Zip_(file_format)

zip is a lossless file archive and compression format.

Hash Functions ¶

Wikipedia: https://en.wikipedia.org/wiki/Hash_function

Wikipedia: https://en.wikipedia.org/wiki/Cryptographic_hash_function

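Hash functions map blocks or whole files to short, fixed-size values (hashes, digests, or checksums) which are used to identify data and to verify data Integrity. Cryptographic hash functions are additionally designed to be one-way and collision-resistant, but no hash function guarantees unique outputs.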

A hash is the output of a hash function.
In Python, dict keys must be hashable (must have a __hash__ method).
In Java, Scala, and many other languages dicts are called HashMaps.
MD5 is a checksum algorithm.
SHA is a group of checksum algorithms.
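
A minimal sketch of computing digests with the Python `hashlib` module; the input bytes are arbitrary sample data.

```python
import hashlib

data = b"hello world\n"

md5_hex = hashlib.md5(data).hexdigest()        # 128-bit digest, 32 hex characters
sha256_hex = hashlib.sha256(data).hexdigest()  # 256-bit digest, 64 hex characters

# Changing a single byte of the input produces a different digest
changed_hex = hashlib.sha256(b"hello world!").hexdigest()

print(md5_hex)
print(sha256_hex)
```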

CRC ¶

Wikipedia: https://en.wikipedia.org/wiki/Cyclic_redundancy_check

A CRC (Cyclic Redundancy Check) is a hash function for error detection: a short check value is computed from the data, stored or sent with it, and recomputed on read.

Hard Drives and SSDs implement CRCs.
Ethernet implements CRCs.
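
A short sketch of a CRC-32 check value using the Python `zlib` module. The input `123456789` is the conventional test message for CRC algorithms.

```python
import zlib

message = b"123456789"
check = zlib.crc32(message) & 0xFFFFFFFF  # mask to an unsigned 32-bit value
print(hex(check))  # 0xcbf43926

# A corrupted message yields a different check value, so the error is detected
corrupted = b"123456780"
assert zlib.crc32(corrupted) & 0xFFFFFFFF != check
```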

MD5 ¶

Wikipedia: https://en.wikipedia.org/wiki/MD5

MD5 is a 128-bit hash function which is now broken, and deprecated in favor of SHA-2 or better.

md5
md5sums

SHA ¶

Wikipedia: https://en.wikipedia.org/wiki/Secure_Hash_Algorithm

SHA-0 – 160 bit (retracted 1993)
SHA-1 — 160 bit (deprecated 2010)
SHA-2 — sha-256, sha-512
SHA-3 (2012)

shasum
shasum -a 1
shasum -a 224
shasum -a 256
shasum -a 384
shasum -a 512
shasum -a 512224
shasum -a 512256

Filesystems ¶

Wikipedia: https://en.wikipedia.org/wiki/File_system

Filesystems (file systems) determine how files are represented in a persistent physical medium.

On-disk filesystems determine where and how redundantly data is stored
On-disk filesystems: ext, btrfs, FAT, NTFS, HFS+
Network Filesystems link disk storage pools with other resources (e.g. NFS, Ceph, GlusterFS)

RAID ¶

Wikipedia: https://en.wikipedia.org/wiki/RAID

RAID (redundant array of independent disks) is a set of configurations for combining Hard Drives and SSDs with striping, mirroring, and/or parity.

RAID 0 -- striping,        -,             no parity ... throughput
RAID 1 -- no striping,  mirroring,        no parity ...
RAID 2 -- bit striping,    -,    Hamming-code parity ... legacy
RAID 3 -- byte striping,   -,      dedicated parity ... uncommon
RAID 4 -- block striping,  -,      dedicated parity
RAID 5 -- block striping,  -,    distributed parity ... min. 3; n-1 rebuild
RAID 6 -- block striping,  -, 2x distributed parity
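
Single parity (as in RAID 4 and RAID 5) is commonly an XOR across the data blocks of a stripe, which allows any one missing block to be rebuilt from the rest. A toy sketch with in-memory byte strings standing in for disk blocks; the helper name `xor_blocks` is an assumption.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for index, byte in enumerate(block):
            result[index] ^= byte
    return bytes(result)


data_blocks = [b"AAAA", b"BBBB", b"CCCC"]  # one stripe across three data disks
parity = xor_blocks(data_blocks)           # stored on a fourth disk

# Simulate losing the second disk and rebuilding its block from the others
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
assert rebuilt == data_blocks[1]
```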

RAID Implementations:

RAID may be implemented by a physical controller with multiple drive connectors.
RAID may be implemented as a BIOS setting.
RAID may be implemented with software e.g. LVM, btrfs.
https://en.wikipedia.org/wiki/RAID#Software-based
https://en.wikipedia.org/wiki/RAID#Firmware-_and_driver-based (“fake RAID”)
Data Scrubbing

Data scrubbing is a technique for checking for inconsistencies between redundant copies of data

Data scrubbing is routinely part of RAID (with mirrors and/or parity bits).

https://en.wikipedia.org/wiki/Data_scrubbing

MBR ¶

Wikipedia: https://en.wikipedia.org/wiki/Master_boot_record

MBR (Master Boot Record) is a boot record format and a disk partitioning scheme.

DOS and Windows use MBR partition tables.
Many/most UNIX variants support MBR partition tables.
Linux supports MBR partition tables.
Most PCs since 1983 boot from MBR partition tables.
When a PC boots, it reads the MBR on the first configured drive in order to determine where to find the bootloader.

GPT ¶

Wikipedia: https://en.wikipedia.org/wiki/GUID_Partition_Table

GPT (GUID Partition Table) is a boot record format and a disk partitioning scheme wherein partitions are assigned GUIDs (Globally Unique Identifiers).

OSX uses GPT partition tables.
Linux supports GPT partition tables.
https://en.wikipedia.org/wiki/GUID_Partition_Table#UNIX_and_Unix-like_operating_systems

LVM ¶

Wikipedia: https://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)
Homepage: https://www.sourceware.org/lvm2/
Source: ftp://sources.redhat.com/pub/lvm2/
Docs: https://www.sourceware.org/dm/
Docs: http://www.tldp.org/HOWTO/LVM-HOWTO/index.html
Docs: http://www.tldp.org/HOWTO/LVM-HOWTO/anatomy.html

LVM (Logical Volume Manager) is an Open Source software disk abstraction layer with snapshotting, copy-on-write, online resize and allocation and a number of additional features.

In LVM, there are Volume Groups (VG), Physical Volumes (PV), and Logical Volumes (LV).
LVM can do striping and high-availability software RAID.
LVM and device-mapper are now part of the Linux kernel tree (the LVM linux kernel modules are built and included with most distributions’ default kernel build).
LVM Logical Volumes can be resized online (without e.g. rebooting to busybox or a LiveCD); but many Filesystems support only online grow (and not online shrink).
There is feature overlap between LVM and btrfs (pooling, snapshotting, copy-on-write).

btrfs ¶

Wikipedia: https://en.wikipedia.org/wiki/Btrfs
Homepage: https://btrfs.wiki.kernel.org/index.php/Main_Page
Source: https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories
Source: git git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
Docs: https://btrfs.wiki.kernel.org/index.php/Getting_started#Basic_Filesystem_Commands
Docs: https://btrfs.wiki.kernel.org/index.php/Problem_FAQ
Docs: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-btrfs.html
Docs: https://wiki.archlinux.org/index.php/Btrfs
Docs: https://help.ubuntu.com/community/btrfs

btrfs (B-tree filesystem) is an Open Source pooling, snapshotting, checksumming, deduplicating, union mounting copy-on-write on-disk Linux filesystem.

ext ¶

Wikipedia: https://en.wikipedia.org/wiki/Ext2
Wikipedia: https://en.wikipedia.org/wiki/Ext3
Wikipedia: https://en.wikipedia.org/wiki/Ext4

ext2, ext3, and ext4 are the ext (extended filesystem) Open Source on-disk filesystems.

ext filesystems are the default filesystems of many Linux distributions.
Windows machines can access ext2, ext3, and ext4 filesystems with ext2explore and ext2fsd.
OSX machines can access ext2, ext3, and ext4 filesystems with OSXFuse and FUSE-EXT2.

FAT ¶

Wikipedia: https://en.wikipedia.org/wiki/File_Allocation_Table

FAT is a group of on-disk filesystem standards.

FAT is used on cross-platform USB drives.
FAT is found on older Windows and DOS machines.
FAT12, FAT16, and FAT32 are all FAT filesystem standards.
FAT32 has a maximum filesize of 4GB and a maximum volume size of 2 TB.
Windows machines can read and write FAT partitions.
OSX machines can read and write FAT partitions.
Linux machines can read and write FAT partitions.

ISO9660 ¶

Wikipedia: https://en.wikipedia.org/wiki/ISO_9660

FileExt: .iso

ISO9660 is an ISO standard filesystem for optical disc media and disc images; the El Torito extension specifies how to boot from such an image.

Many Operating System distributions are distributed as ISO9660 .iso files.

ISO9660 and Linux:

An ISO9660 ISO can be loop mounted:

mount -o loop,ro -t iso9660 ./path/to/file.iso /mnt/cdrom

An ISO9660 CD can be mounted:

mount -o ro -t iso9660 /dev/cdrom /mnt/cdrom

Most CD/DVD burning utilities support ISO9660 .iso files.
ISO9660 is useful in that it specifies how to encode the boot sector (El Torito) and partition layout.
Nowadays, ISO9660 .iso files are often converted to raw drive images and written to bootable USB Mass Storage devices (e.g. to write an install / recovery disk for Debian, Ubuntu, Fedora, Windows).

HFS+¶

Wikipedia: https://en.wikipedia.org/wiki/HFS_Plus

HFS+ (Hierarchical File System Plus), or Mac OS Extended, is the filesystem for Mac OS 8.1+ and OSX.

HFS+ is required for OSX and Time Machine.

http://www.cnet.com/how-to/the-best-ways-to-format-an-external-drive-for-windows-and-mac/
Windows machines can access HFS+ partitions with: HFSExplorer (free, Java), Paragon HFS+ for Windows, or MacDrive

http://www.makeuseof.com/tag/4-ways-read-mac-formatted-drive-windows/
Linux machines can access HFS+ partitions with hfsprogs (apt-get install hfsprogs, yum install hfsprogs).

NTFS ¶

Wikipedia: https://en.wikipedia.org/wiki/NTFS

NTFS is a proprietary journaling filesystem.

Windows machines since Windows NT 3.1 and Windows XP default to NTFS filesystems.
Non-Windows machines can access NTFS partitions through NTFS-3G: https://en.wikipedia.org/wiki/NTFS-3G

FUSE ¶

Wikipedia: https://en.wikipedia.org/wiki/Filesystem_in_Userspace
Homepage: http://fuse.sourceforge.net/
Download: http://sourceforge.net/projects/fuse/files/fuse-2.X/
Source: git http://git.code.sf.net/p/fuse/fuse
Docs: http://fuse.sourceforge.net/doxygen/index.html
Docs: http://sourceforge.net/p/fuse/wiki/FileSystems/
Docs: http://sourceforge.net/p/fuse/wiki/LanguageBindings/
Docs: http://sourceforge.net/p/fuse/wiki/OperatingSystems/

FUSE (Filesystem in Userspace) is an API for implementing filesystems as userspace programs rather than as kernel modules.

FUSE support is included in the Linux kernel since 2.6.14.
FUSE is available for most POSIX platforms.

Interesting FUSE implementations:

PyFilesystem is a Python language api interface which supports FUSE: http://docs.pyfilesystem.org/en/latest/
There are FUSE bindings for Hadoop HDFS.
Ceph can be mounted with/over/through FUSE.
GlusterFS can be mounted with/over/through FUSE.
NTFS-3G mounts volumes with FUSE.
virtualbox-fuse supports mounting of virtualbox VDI images with FUSE.
SSHFS, GitFS, GmailFS, GdriveFS, WikipediaFS and Gnome GVFS are all FUSE filesystems.

SSHFS ¶

Wikipedia: https://en.wikipedia.org/wiki/SSHFS
Homepage: http://fuse.sourceforge.net/sshfs.html
Download: http://sourceforge.net/projects/fuse/files/sshfs-fuse/
Source: git http://git.code.sf.net/p/fuse/sshfs
Docs: https://wiki.archlinux.org/index.php/Sshfs
Docs: https://help.ubuntu.com/community/SSHFS
Docs: https://github.com/osxfuse/osxfuse/wiki/SSHFS

SSHFS is a FUSE filesystem for mounting remote directories over SSH.

Network Filesystems ¶

Wikipedia: https://en.wikipedia.org/wiki/Network_filesystem

Ceph ¶

Wikipedia: https://en.wikipedia.org/wiki/Ceph_(software)
Homepage: http://ceph.com/
Download: http://ceph.com/resources/downloads/
Source: git https://github.com/ceph/ceph
Docs: http://ceph.com/docs/master/
Docs: http://ceph.com/docs/master/rados/
Docs: http://ceph.com/docs/master/radosgw/
Docs: http://ceph.com/docs/master/radosgw/s3/
Docs: http://ceph.com/docs/master/radosgw/swift/
Docs: http://ceph.com/docs/master/radosgw/keystone/
Docs: http://ceph.com/docs/master/rbd/rbd-openstack/

Ceph is an Open Source network filesystem (a distributed database for files with attributes like owner, group, permissions) written in C++ and Perl which runs over top of one or more on-disk filesystems.

Ceph Block Device (rbd) – striping, caching, snapshots, copy-on-write, kvm, libvirt, OpenStack Cinder block storage
Ceph Filesystem (cephfs) – POSIX filesystem with FUSE, NFS, CIFS, and HDFS APIs
Ceph Object Gateway (radosgw) – RESTful API, Amazon AWS S3 API, OpenStack Swift API, OpenStack Keystone authentication

CIFS ¶

CIFS (Common Internet File System) is a centralized network filesystem protocol.

Samba smbd is one implementation of a CIFS network file server.

DDFS ¶

DDFS (Disco Distributed File System) is a distributed network filesystem written in Python and C.

DDFS is like a python implementation of HDFS (which is written in Java).

GlusterFS ¶

Wikipedia: https://en.wikipedia.org/wiki/GlusterFS
Homepage: http://www.gluster.org/
Project: https://forge.gluster.org/glusterfs-core
Source: git https://git.forge.gluster.org/glusterfs-core/glusterfs.git
Docs: https://gluster.readthedocs.io/en/latest/
Docs: https://gluster.readthedocs.io/en/latest/Quick-Start-Guide/Quickstart/
Docs: https://gluster.readthedocs.io/en/latest/Install-Guide/Setup_virt/
Docs: https://gluster.readthedocs.io/en/latest/Install-Guide/Setup_Bare_metal/
Docs: https://gluster.readthedocs.io/en/latest/Install-Guide/Setup_aws/
Docs: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/GlusterFS%20Cinder/
Tcp ports: 111, 24007, 24008, 24009, 24010, 24011, 38465:38469

GlusterFS is an Open Source network filesystem (a distributed database for files with attributes like owner, group, permissions) which runs over top of one or more on-disk filesystems.

GlusterFS can serve volumes for OpenStack Cinder block storage

HDFS ¶

Wikipedia: https://en.wikipedia.org/wiki/Apache_Hadoop#HDFS

HDFS (Hadoop Distributed File System) is an Open Source distributed network filesystem.

HDFS runs code next to data; rather than streaming data through code across the network.
HDFS is especially suitable for MapReduce-style distributed computation.
Apache Hadoop works with files stored over HDFS, FTP, S3, WASB (Azure)
There are HDFS APIs for many languages: Java, Scala, Go, Python, Ruby, Perl, Haskell, C++
Mesos can manage distributed HDFS grids.
ElasticSearch
It’s possible to configure a Jenkins Continuous Integration cluster as a Hadoop cluster.
Many databases support storage over HDFS (HBase, Cassandra, Accumulo, Spark)
Ceph can now serve files over HDFS.
HDFS can be mounted as a FUSE filesystem (e.g. with Linux).
HDFS can be accessed from the commandline with the Hadoop FS shell: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
HDFS can be browsed with hdfs-du: https://github.com/twitter/hdfs-du

NFS ¶

Wikipedia: https://en.wikipedia.org/wiki/NFS

NFS (Network File System) is a centralized network filesystem protocol defined by open standards (IETF RFCs).

S3 ¶

Amazon AWS S3
OpenStack Swift
Ceph
GlusterFS

Swift ¶

OpenStack Swift
Ceph
GlusterFS

SMB ¶

Wikipedia: https://en.wikipedia.org/wiki/Server_Message_Block

SMB (Server Message Block) is a centralized network filesystem.

CIFS is a dialect of SMB (SMB 1.0); it has since been superseded by SMB 2 and SMB 3.

WebDAV ¶

Wikipedia: https://en.wikipedia.org/wiki/WebDAV
Standard: https://tools.ietf.org/html/rfc2518
Standard: https://tools.ietf.org/html/rfc4918

WebDAV (Web Distributed Authoring and Versioning) is a network filesystem protocol built with HTTP.

WebDAV adds a number of HTTP methods to those defined by HTTP itself:
- PROPFIND (ls, stat, getfacl)
- PROPPATCH (touch, setfacl)
- MKCOL (mkdir)
- COPY (cp)
- MOVE (mv)
- LOCK (file locking)
- UNLOCK (release a lock)
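
As a rough illustration of how these methods ride on ordinary HTTP, the following sketch builds (but does not send) a PROPFIND request with Python's standard urllib. The server URL and the two requested properties are placeholders chosen for illustration, not a real endpoint.

```python
import urllib.request

# Hypothetical WebDAV collection URL (an assumption for illustration).
URL = "https://dav.example.org/files/"

# Minimal PROPFIND body asking for two standard DAV: properties.
BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<D:propfind xmlns:D="DAV:">'
    "<D:prop><D:getcontentlength/><D:getlastmodified/></D:prop>"
    "</D:propfind>"
).encode("utf-8")


def build_propfind(url, depth="1"):
    """Build a PROPFIND request; Depth: 1 covers a collection and its direct members."""
    return urllib.request.Request(
        url,
        data=BODY,
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": 'application/xml; charset="utf-8"'},
    )


request = build_propfind(URL)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; a WebDAV server would normally answer with a 207 Multi-Status XML document.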

Databases ¶

Wikipedia: https://en.wikipedia.org/wiki/Database

Object Relational Mapping ¶

Wikipedia: https://en.wikipedia.org/wiki/Object-relational_mapping

https://en.wikipedia.org/wiki/Object-relational_impedance_mismatch

https://en.wikipedia.org/wiki/List_of_object-relational_mapping_software

Relation Algebra ¶

Wikipedia: https://en.wikipedia.org/wiki/Relation_algebra

https://en.wikipedia.org/wiki/Relation_algebra#Expressing_properties_of_binary_relations_in_RA

See: Relational Algebra

Relational Algebra ¶

Wikipedia: https://en.wikipedia.org/wiki/Relational_algebra

See: Relation Algebra, Relational Databases

Relational Databases ¶

Wikipedia: https://en.wikipedia.org/wiki/Relational_database

https://en.wikipedia.org/wiki/Relational_model

Relational Algebra

https://en.wikipedia.org/wiki/Database_normalization

https://en.wikipedia.org/wiki/Relational_database_management_system

What doesn’t SQL do?

SQL ¶

Wikipedia: https://en.wikipedia.org/wiki/SQL

https://en.wikipedia.org/wiki/Null_(SQL)#Comparisons_with_NULL_and_the_three-valued_logic_.283VL.29
https://en.wikipedia.org/wiki/Join_(SQL)
https://en.wikipedia.org/wiki/SQL_injection
http://cwe.mitre.org/top25/#CWE-89 (#1 Most Prevalent Dangerous Security Error (2011))

See: Object Relational Mapping

Drizzle ¶

Wikipedia: https://en.wikipedia.org/wiki/Drizzle_(database_server)
Homepage: http://www.drizzle.org/
Project: https://launchpad.net/drizzle
Download: http://www.drizzle.org/content/download
Source: bzr lp:drizzle
Docs: http://www.drizzle.org/content/documentation
Docs: http://docs.drizzle.org/

Drizzle is an Open Source relational database “for the cloud” which was forked from MySQL 6.0.

Drizzle stores all data as UTF-8.
Drizzle has a minimal core and a plugin API.

MySQL ¶

Wikipedia: https://en.wikipedia.org/wiki/MySQL
Homepage: https://www.mysql.com/
Download: https://dev.mysql.com/downloads/mysql/
Source: git https://github.com/mysql/mysql-server
Docs: https://dev.mysql.com/doc/

MySQL Community Edition is an Open Source relational database.

PostgreSQL ¶

Wikipedia: https://en.wikipedia.org/wiki/PostgreSQL
Homepage: http://www.postgresql.org/
Download: http://www.postgresql.org/download/
Source: git http://git.postgresql.org/git/postgresql.git
Docs: http://www.postgresql.org/docs/
Docs: http://www.postgresql.org/docs/9.4/static/index.html
Docs: http://www.postgresql.org/docs/9.4/static/sql.html

PostgreSQL is an Open Source relational database.

PostgreSQL has native support for storing and querying JSON.
PostgreSQL has support for geographical queries (PostGIS).

SQLite ¶

Wikipedia: https://en.wikipedia.org/wiki/SQLite
Homepage: https://www.sqlite.org/
Download: https://www.sqlite.org/download.html
Source:
Docs: https://www.sqlite.org/docs.html
Docs: https://www.sqlite.org/different.html
Docs: https://www.sqlite.org/threadsafe.html
Docs: https://www.sqlite.org/uri.html
FileExt: .sqlite

SQLite is a serverless Open Source relational database which stores all data in one file.

Python's standard library includes the sqlite3 module for working with SQLite databases.
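
A minimal sketch of SQLite from the Python standard library, which also shows the parameter binding that guards against the SQL injection errors linked in the SQL section. The table, column, and function names are made up for illustration.

```python
import sqlite3

# ":memory:" creates a temporary database instead of a .sqlite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user(conn, name):
    """Look up users by name; the ? placeholder binds `name` as data, not as SQL."""
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


rows = find_user(conn, "alice")
# A hostile value is compared literally instead of being executed as SQL.
hostile = find_user(conn, "alice' OR '1'='1")
print(rows, hostile)
```

Building the same query with string concatenation would let the second value change the meaning of the statement.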

Virtuoso ¶

Wikipedia: https://en.wikipedia.org/wiki/Virtuoso_Universal_Server
Homepage: http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/
Source: git https://github.com/openlink/virtuoso-opensource
Docs: http://docs.openlinksw.com/virtuoso/
Docs: http://docs.openlinksw.com/virtuoso/sqlreference.html
Docs: http://docs.openlinksw.com/virtuoso/rdfandsparql.html
Docs: http://docs.openlinksw.com/virtuoso/rdfsparql.html
Docs: http://docs.openlinksw.com/virtuoso/rdfsparqlrule.html
Docs: http://docs.openlinksw.com/virtuoso/rdfgraphsecurity.html
Docs: http://docs.openlinksw.com/virtuoso/virtuososponger.html

Virtuoso Open Source edition is a multi-paradigm relational database / XML document database / RDF triplestore.

Relational Tables Data Management (Columnar or Column-Store SQL RDBMS)

Relational Property Graphs Data Management (SPARQL RDF based Quad Store)

Content Management (HTML, TEXT, Turtle, RDF/XML, JSON, JSON-LD, XML)

Web and other Document File Services (Web Document or File Server)

Five-Star Linked Open Data Deployment (RDF-based Linked Data Server)

Web Application Server (SOAP or RESTful interaction modes).

Virtuoso supports ODBC, JDBC, and DB-API relational database access.
Virtuoso powers DBpedia.

NoSQL Databases ¶

Wikipedia: https://en.wikipedia.org/wiki/NoSQL

https://en.wikipedia.org/wiki/Keyspace_(distributed_data_store)

https://en.wikipedia.org/wiki/Column_(data_store)

Graph Databases ¶

Wikipedia: https://en.wikipedia.org/wiki/Graph_database

https://en.wikipedia.org/wiki/Graph_database#Graph_database_projects

Graph Queries

Blazegraph ¶

Homepage: http://www.blazegraph.com/
Download: http://www.blazegraph.com/download
Src: git git://git.code.sf.net/p/bigdata/git
Docs: http://www.blazegraph.com/learn
Docs: http://www.blazegraph.com/inference
Docs: http://www.blazegraph.com/blueprints
Docs: http://www.blazegraph.com/sesame
Docs: http://www.blazegraph.com/develop
Docs: http://www.blazegraph.com/docs/api/
Docs: https://wiki.blazegraph.com/wiki/index.php/Main_Page

Blazegraph is an Open Source graph database written in Java with support for Gremlin, Blueprints, RDF, RDFS and OWL inferencing, SPARQL.

Blazegraph was formerly known as Bigdata.
Blazegraph 1.5.2 supports Solr (e.g. TF-IDF) indexing.
Blazegraph will power the Wikidata Query Service (RDF, SPARQL):

https://lists.wikimedia.org/pipermail/wikidata-tech/2015-March/000740.html
MapGraph is a set of GPU-accelerations for graph processing.

Blueprints ¶

Wikipedia:
Homepage:
Src: git https://github.com/tinkerpop/blueprints
Docs: https://github.com/tinkerpop/blueprints/wiki

Blueprints is an Open Source graph database API (and reference graph data model).

Blueprints is a collection of interfaces, implementations, ouplementations, and test suites for the property graph data model.

Blueprints is analogous to the JDBC, but for graph databases. As such, it provides a common set of interfaces to allow developers to plug-and-play their graph database backend.

Moreover, software written atop Blueprints works over all Blueprints-enabled graph databases.

Within the TinkerPop software stack, Blueprints serves as the foundational technology for:

Pipes: A lazy, data flow framework

Gremlin: A graph traversal language

Frames: An object-to-graph mapper

Furnace: A graph algorithms package

Rexster: A graph server

There are many Blueprints API implementations (e.g. Rexster, Neo4j, Blazegraph, Accumulo).

Gremlin ¶

Wikipedia: https://en.wikipedia.org/wiki/Gremlin_(programming_language)
Src: git https://github.com/tinkerpop/gremlin
Docs: https://github.com/tinkerpop/gremlin/wiki

Gremlin is an Open Source domain-specific language for traversing property graphs.

Gremlin works with databases that implement the Blueprints graph database API.
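
To make the property graph model concrete, here is a toy in-memory sketch in Python. It is not the Blueprints or Gremlin API; the class and method names are hypothetical and only mimic the idea of a traversal that follows labelled edges and then reads a property.

```python
class PropertyGraph:
    """Toy property graph: vertices and edges both carry key/value properties."""

    def __init__(self):
        self.vertices = {}  # vertex id -> dict of properties
        self.edges = []     # (out_vertex_id, label, in_vertex_id, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, out_v, label, in_v, **props):
        self.edges.append((out_v, label, in_v, props))

    def out(self, vid, label):
        """Return ids of vertices reached by outgoing edges with this label."""
        return [i for (o, l, i, _) in self.edges if o == vid and l == label]


g = PropertyGraph()
g.add_vertex(1, name="marko")
g.add_vertex(2, name="vadas")
g.add_vertex(4, name="josh")
g.add_edge(1, "knows", 2, weight=0.5)
g.add_edge(1, "knows", 4, weight=1.0)

# "Start at vertex 1, follow outgoing 'knows' edges, read the name property."
names = [g.vertices[v]["name"] for v in g.out(1, "knows")]
print(names)
```

A traversal language such as Gremlin expresses the last step declaratively and leaves edge lookup to the database's indexes.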

Neo4j ¶

Wikipedia: https://en.wikipedia.org/wiki/Neo4j
Homepage: http://neo4j.com/
Download: http://neo4j.com/download/
Src: git https://github.com/neo4j/neo4j
Docs: http://neo4j.com/developer/get-started/
Docs: http://neo4j.com/docs/
Docs: http://neo4j.com/docs/2.2.3/
Docs: http://neo4j.com/developer/cypher/
Docs: http://neo4j.com/docs/stable/cypher-refcard/
Docs: https://en.wikipedia.org/wiki/Cypher_Query_Language
Docs: http://neo4j.com/open-source-project/

Neo4j is an Open Source HA graph database written in Java.

Neo4j implements the Paxos distributed algorithm for HA (high availability).
Neo4j can integrate with Spark and ElasticSearch.
Neo4j is widely deployed in production environments.
There is a Blueprints API implementation for Neo4j:

https://github.com/tinkerpop/blueprints/wiki/Neo4j-Implementation

RDF Triplestores ¶

Wikipedia: https://en.wikipedia.org/wiki/Triplestore

https://en.wikipedia.org/wiki/List_of_subject-predicate-object_databases

Graph Pattern Query Results

SPARQL
https://en.wikipedia.org/wiki/Redland_RDF_Application_Framework
- http://librdf.org/notes/contexts.html
https://en.wikipedia.org/wiki/Jena_(framework)
SAIL (Storage and Inferencing Layer) API
https://en.wikipedia.org/wiki/CubicWeb
RDFLib
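
A graph pattern query can be pictured as matching (subject, predicate, object) patterns with variables against stored triples. The sketch below does this in plain Python for a single pattern; the data and the `match_pattern` helper are illustrative assumptions, and a real triplestore would index triples and join several patterns.

```python
TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
]


def match_pattern(triples, pattern):
    """Return one binding dict per triple that matches; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


known = match_pattern(TRIPLES, ("ex:alice", "foaf:knows", "?x"))
names = match_pattern(TRIPLES, ("?who", "foaf:name", "?name"))
print(known, names)
```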

rdfs:seeAlso

Linked Data
Semantic Web
Semantic Web Standards
Semantic Web Tools

Distributed Databases ¶

Wikipedia: https://en.wikipedia.org/wiki/Distributed_database

Wikipedia: https://en.wikipedia.org/wiki/Distributed_data_store

See: Distributed Algorithms

Accumulo ¶

Wikipedia:
Homepage: https://accumulo.apache.org/
Download: https://accumulo.apache.org/downloads/
Source: git https://github.com/apache/accumulo
Docs: https://accumulo.apache.org/1.7/accumulo_user_manual.html
Docs: https://accumulo.apache.org/1.7/accumulo_user_manual.html#_accumulo_design
Twitter: https://twitter.com/apacheaccumulo

Apache Accumulo is an Open Source distributed database key/value store written in Java based on BigTable which adds realtime queries, streaming iterators, row-level ACLs and a number of additional features.

Accumulo supports MapReduce-style computation.
Accumulo supports streaming iterator computation.
Accumulo supports HDFS.
Accumulo implements a programmatic Java query API.

BigTable ¶

Wikipedia: https://en.wikipedia.org/wiki/BigTable

Docs: http://research.google.com/archive/bigtable.html

Google BigTable is an open reference design for a distributed key/value column store and a proprietary production database system.

BigTable functionality overlaps with that of the newer Pregel and Spanner distributed databases.
Cloud BigTable is a PaaS / SaaS service with Java integration through an adaptation of HBase API.

Apache Beam ¶

Homepage: https://beam.apache.org/
Src: git://git.apache.org/beam.git
Src: https://github.com/apache/beam
Docs: https://beam.apache.org/documentation/

Apache Beam is an open source batch and streaming parallel data processing framework with support for Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.

Cassandra ¶

Wikipedia: https://en.wikipedia.org/wiki/Apache_Cassandra
Homepage: https://cassandra.apache.org/
Download: https://cassandra.apache.org/download/
Source: git https://github.com/apache/cassandra
Docs: https://wiki.apache.org/cassandra/FrontPage
Docs: https://wiki.apache.org/cassandra/GettingStarted
Docs: http://docs.datastax.com/en/latest-dsc/
Docs: http://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureIntro_c.html

Apache Cassandra is an Open Source distributed key/value super column store written in Java.

Cassandra is similar to Amazon AWS Dynamo and BigTable.
Cassandra supports MapReduce-style computation.
Cassandra supports HDFS.
Facebook is one primary supporter of Cassandra development.

Hadoop ¶

Wikipedia: https://en.wikipedia.org/wiki/Apache_Hadoop
Homepage: https://hadoop.apache.org/
Download: https://hadoop.apache.org/releases.html
Source: git git://git.apache.org/hadoop.git
Source: git https://github.com/apache/hadoop
Docs: http://hadoop.apache.org/docs/current/
Docs: http://hadoop.apache.org/docs/stable/

Apache Hadoop is a collection of Open Source distributed computing components; particularly for MapReduce-style computation over Hadoop HDFS distributed filesystem.

HBase ¶

Wikipedia: https://en.wikipedia.org/wiki/Apache_HBase
Homepage: https://hbase.apache.org/
Download: https://www.apache.org/dyn/closer.cgi/hbase/
Source: git git://git.apache.org/hbase.git
Source: git https://github.com/apache/hbase
Docs: https://hbase.apache.org/book.html
Docs: https://hbase.apache.org/book.html#conceptual.view

Apache HBase is an Open Source distributed key/value super column store based on BigTable written in Java that does MapReduce-style computation over Hadoop HDFS.

HBase has a Java API, a RESTful API, an avro API, and a Thrift API

Hive ¶

Wikipedia: https://en.wikipedia.org/wiki/Apache_Hive
Homepage: https://hive.apache.org/
Download: https://hive.apache.org/downloads.html
Docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual
Docs: https://hive.apache.org/javadocs/r1.2.1/api/index.html
Docs: https://cwiki.apache.org/confluence/display/Hive/Home

Apache Hive is an Open Source data warehousing platform written in Java.

Hive can read data from HDFS and S3.
Hive supports Avro and Parquet.
HiveQL is a SQL-like language.

Parquet ¶

Homepage: https://parquet.apache.org/
Download: https://parquet.apache.org/downloads/
Source: git git://git.apache.org/incubator-parquet-mr.git
Source: git https://github.com/apache/parquet-mr
Standard: https://github.com/apache/parquet-format
Docs: https://parquet.apache.org/documentation/latest/

Apache Parquet is an Open Source columnar storage format for Distributed Databases.

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.

Parquet file metadata is encoded with Thrift.
See also: CSV, CSVW

Presto ¶

Homepage: https://prestodb.io/
Source: git https://github.com/facebook/presto
Docs: https://prestodb.io/docs/current/

Presto is an Open Source distributed query engine designed to query multiple datastores at once.

Presto has connectors for Cassandra, Hive, JMX, Kafka, MySQL, and PostgreSQL.
Presto does not yet support SPARQL.
Presto does not yet support SPARQL federated query.

Spark ¶

Wikipedia: https://en.wikipedia.org/wiki/Apache_Spark
Homepage: https://spark.apache.org/
Download: https://spark.apache.org/downloads.html
Source: git git://git.apache.org/spark.git
Source: git https://github.com/apache/spark
Docs: https://spark.apache.org/documentation.html
Docs: https://spark.apache.org/docs/latest/
Docs: https://spark.apache.org/docs/latest/cluster-overview.html
Docs: https://spark.apache.org/docs/latest/quick-start.html

Apache Spark is an Open Source distributed computation platform.

Spark can keep working data in memory; the project has reported speedups of up to 100x over Hadoop MapReduce for some in-memory workloads.
Spark can work with data in/over/through HDFS, Cassandra, OpenStack Swift, Amazon AWS S3, and the local filesystem.
Spark can be provisioned by YARN or Mesos.
Spark has Java, Scala, Python, and R language APIs.
Spark set a world sorting benchmark record in 2014: https://spark.apache.org/news/spark-wins-daytona-gray-sort-100tb-benchmark.html

GraphX ¶

Wikipedia: https://en.wikipedia.org/wiki/Apache_Spark#GraphX
Homepage: https://spark.apache.org/graphx/
Docs: https://spark.apache.org/docs/latest/graphx-programming-guide.html

GraphX is an Open Source graph query framework built with Spark.

Distributed Algorithms ¶

Wikipedia: https://en.wikipedia.org/wiki/Distributed_algorithm

WikipediaCategory: https://en.wikipedia.org/wiki/Category:Distributed_algorithms

Distributed Databases and distributed Information Systems implement Distributed Algorithms designed to solve for Confidentiality, Integrity, and Availability.

As separate records / statements to be yield-ed or emitted:

Distributed Databases implement Distributed Algorithms.
Distributed Information Systems implement Distributed Algorithms.

Distributed Computing Problems ¶

Wikipedia: https://en.wikipedia.org/wiki/Distributed_computing

WikipediaCategory: https://en.wikipedia.org/wiki/Category:Distributed_computing_problems

Non-blocking algorithm ¶

Wikipedia: https://en.wikipedia.org/wiki/Non-blocking_algorithm

DHT ¶

Wikipedia: https://en.wikipedia.org/wiki/Distributed_hash_table

A DHT (Distributed Hash Table) is a distributed key/value store: values are stored under a consistent hash (for example, a file checksum) and are looked up by exact match on that key.

At an API level, a DHT is a key/value store.
DNS is comparable to a DHT in that it is a distributed key/value lookup, although it partitions keys by a delegated name hierarchy rather than by hash.
Many Distributed Databases implement some structure similar to a DHT (a replicated keystore), often alongside things like Bloom filters (for fast search)
- Cassandra, Ceph, GlusterFS
Browsers that maintain a local cache could implement a DHT (e.g. with WebSocket or WebRTC)
- webtorrent (Javascript, Node.js, WebRTC)

BitTorrent magnet URIs (URNs) contain a key, which is a checksum of a manifest, which can be retrieved from a DHT:

# <a href="magnet:?xt=urn:btih:IJBDPDSBT4QZLBIJ6NX7LITSZHZQ7F5I">.</a>
# key_uri = "IJBDPDSBT4QZLBIJ6NX7LITSZHZQ7F5I"
dht = DHT(); value = dht.get(key_uri)

Named Data Networking is also essentially a cached DHT.
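
The `DHT()` object above is pseudocode. As a hedged illustration of the idea, the toy class below places nodes on a hash ring and routes each key to the first node clockwise from the key's hash (consistent hashing). The class name, node names, and stored value are assumptions; the BitTorrent DHT itself is Kademlia-based and works differently in detail.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a point on the ring (a 160-bit integer via SHA-1)."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


class ToyDHT:
    """In-process stand-in for a DHT: every 'node' is just a local dict."""

    def __init__(self, node_names):
        self.ring = sorted((_hash(name), name) for name in node_names)
        self.stores = {name: {} for name in node_names}

    def node_for(self, key):
        """Return the first node clockwise from the key's position on the ring."""
        points = [point for point, _ in self.ring]
        index = bisect.bisect_left(points, _hash(key)) % len(self.ring)
        return self.ring[index][1]

    def put(self, key, value):
        self.stores[self.node_for(key)][key] = value

    def get(self, key):
        return self.stores[self.node_for(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
key_uri = "IJBDPDSBT4QZLBIJ6NX7LITSZHZQ7F5I"
dht.put(key_uri, "manifest placeholder")
value = dht.get(key_uri)
print(dht.node_for(key_uri), value)
```

Because only the hash decides placement, any participant can compute which node is responsible for a key without a central directory.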

MapReduce ¶

Wikipedia: https://en.wikipedia.org/wiki/MapReduce

MapReduce is a distributed algorithm for distributed computation.

BigTable, Hadoop, HDFS, Disco, DDFS all support MapReduce-style computation.
See also: bashreduce
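
The map, shuffle, and reduce phases can be sketched on a single machine with a word count. This is only an illustration of the programming model with made-up function names; a framework such as Hadoop distributes the same phases across many workers.

```python
from collections import defaultdict


def map_phase(document):
    """Emit (word, 1) pairs for one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["to be or not to be", "to do"])
print(counts)
```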

Paxos ¶

Wikipedia: https://en.wikipedia.org/wiki/Paxos_(computer_science)

https://en.wikipedia.org/wiki/Paxos_(computer_science)#Production_use_of_Paxos
- BigTable, Spanner, Megastore
- Ceph
- Neo4j

Raft ¶

Wikipedia: https://en.wikipedia.org/wiki/Raft_(computer_science)

Homepage: https://raft.github.io/

https://en.wikipedia.org/wiki/Raft_(computer_science)#Basics
- Leader / Candidate / Follower
- Heartbeat (Leader -> Followers [-> Candidates])
- etcd (CoreOS, Kubernetes, configuration management)
- skydns

Bulk Synchronous Parallel ¶

Wikipedia: https://en.wikipedia.org/wiki/Bulk_synchronous_parallel

Bulk Synchronous Parallel (BSP) is a distributed algorithm for distributed computation.

Google Pregel, Apache Giraph, and Apache Spark are built for a Bulk Synchronous Parallel model.
MapReduce can be expressed very concisely in terms of BSP.

Distributed Computing Protocols ¶

https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
Programming language implementations:
- https://en.wikipedia.org/wiki/Java_Remote_Method_Invocation
- https://twisted.readthedocs.io/en/latest/core/howto/pb-usage.html
WS-*
REST (RESTful HTTP API)
Protocol Buffers
Thrift
Avro
msgpack
WebSocket
WebRTC
JSON-WSP
LDP (Turtle or JSON-LD RDF over HTTP)
REST
WAMP
https://en.wikipedia.org/wiki/List_of_web_service_protocols

CORBA ¶

Wikipedia: https://en.wikipedia.org/wiki/Common_Object_Request_Broker_Architecture

CORBA (Common Object Request Broker Architecture) is a distributed computing protocol now defined by OMG with implementations in many languages.

CORBA is a distributed object-oriented protocol for platform-neutral distributed computing.
CORBA objects are marshalled and serialized according to an IDL (Interface Definition Language) with a limited set of datatypes (see also XSD, Distributed Computing Protocols: Protocol Buffers, Thrift, Avro, msgpack, JSON-LD)
CORBA ORBs (Object Request Brokers) route requests for objects (see also ESB)
CORBA objects are either in local address space (see also file:// / /dev/mem) or remote address space (see also dereferenceable HTTP, HTTPS URLs)
CORBA objects can be looked up by reference (by URL, or NameService (see also DNS))
“CORBA Objects are passed by reference, while data (integers, doubles, structs, enums, etc.) are passed by value” – https://en.wikipedia.org/wiki/Common_Object_Request_Broker_Architecture#Features

Message Passing ¶

Wikipedia: https://en.wikipedia.org/wiki/Message_passing
https://en.wikipedia.org/wiki/Messaging_pattern
https://en.wikipedia.org/wiki/Message_passing_in_computer_clusters
https://en.wikipedia.org/wiki/Active_message

ESB ¶

Wikipedia: https://en.wikipedia.org/wiki/Enterprise_service_bus

An ESB (Enterprise Service Bus) is a centralized distributed computing component which relays (or brokers) messages with or as a message queue (MQ).

ESB is generally the name for a message queue / task worker pattern in the SOA (particularly Java).
ESBs host service endpoints for message producers and consumers.
ESBs can also maintain state, or logging.
ESB services can often be described with e.g. WSDL and/or JSON-WSP.
https://en.wikipedia.org/wiki/Category:Message-oriented_middleware

MPI ¶

Wikipedia: https://en.wikipedia.org/wiki/Message_Passing_Interface

MPI (Message Passing Interface) is a distributed computing protocol for structured data interchange with implementations in many languages.

Many supercomputing applications are built with MPI.
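MPI implementations exchange typed binary buffers, often over low-latency interconnects, which is typically much faster than exchanging JSON text.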
IPython ipyparallel supports MPI: https://ipyparallel.readthedocs.io/en/latest/

XML-RPC ¶

Wikipedia: https://en.wikipedia.org/wiki/XML-RPC

XML Remote Procedure Call defines method names with parameters and values for making function calls with XML.

Python xmlrpclib: https://docs.python.org/2/library/xmlrpclib.html

https://docs.python.org/3/library/xmlrpc.client.html

https://docs.python.org/3/library/xmlrpc.server.html
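
The XML that an XML-RPC call puts on the wire can be inspected without a server by using the marshalling helpers in Python 3's xmlrpc.client. The method name `add` and its arguments are made up for illustration.

```python
import xmlrpc.client

# Marshal a call to a hypothetical remote method add(2, 3).
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")

# What a server would do with the request body: recover params and method name.
params, method = xmlrpc.client.loads(request_xml)

# Marshal and unmarshal a response carrying the single return value 5.
response_xml = xmlrpc.client.dumps((5,), methodresponse=True)
result, _ = xmlrpc.client.loads(response_xml)

print(request_xml)
print(method, params, result)
```

In normal use `xmlrpc.client.ServerProxy` performs these steps and sends the XML in an HTTP POST.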

JSON-RPC ¶

Wikipedia: https://en.wikipedia.org/wiki/JSON-RPC

Specification: http://www.jsonrpc.org/specification

Avro ¶

Wikipedia: https://en.wikipedia.org/wiki/Apache_Avro
Homepage: https://avro.apache.org/
Standard: https://avro.apache.org/docs/current/spec.html
Standard: https://avro.apache.org/docs/current/trevni/spec.html
Download: https://avro.apache.org/releases.html#Download
Docs: https://avro.apache.org/docs/current/
Docs: https://avro.apache.org/docs/current/gettingstartedjava.html
Docs: https://avro.apache.org/docs/current/api/java/
Docs: https://avro.apache.org/docs/current/gettingstartedpython.html
Docs: https://avro.apache.org/docs/current/api/c/
Docs: https://avro.apache.org/docs/current/api/cpp/html/
Docs: https://avro.apache.org/docs/current/api/csharp/

Apache Avro is a data serialization and RPC framework with implementations in many languages.

Avro schemas are defined in JSON.
Avro is similar to Protocol Buffers and Thrift, but does not require code generation.
Avro container files store the schema alongside the data.
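
Because Avro schemas are plain JSON, they can be read with any JSON parser. The sketch below loads an example record schema with Python's json module; the record and field names are illustrative, and the code does not validate the schema or encode data (an Avro library would be needed for that).

```python
import json

# An example Avro record schema; names are for illustration only.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]}
  ]
}
"""

schema = json.loads(SCHEMA_JSON)
field_names = [field["name"] for field in schema["fields"]]
print(schema["name"], field_names)
```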

seeAlso:

JSON-LD maps to RDF
5stardata

Protocol Buffers ¶

Homepage: https://developers.google.com/protocol-buffers/
Src: https://github.com/google/protobuf
Docs: https://developers.google.com/protocol-buffers/docs/overview

Protocol Buffers (PB) is a standard for structured data interchange.

Protocol Buffers use a compact binary encoding, which is typically smaller and faster to parse than JSON.

Thrift ¶

Wikipedia: https://en.wikipedia.org/wiki/Apache_Thrift
Homepage: https://thrift.apache.org
Src: http://github.com/apache/thrift
Docs: https://thrift.apache.org/docs/
Docs: https://thrift.apache.org/docs/idl

Thrift is a standard for structured data interchange in the style of Protocol Buffers.

Thrift's binary protocols are typically faster to serialize and parse than JSON.

SOA ¶

Wikipedia: https://en.wikipedia.org/wiki/Service-oriented_architecture

SOA (Service Oriented Architecture) is a collection of Web Standards (e.g. WS-*) and architectural patterns for distributed computing.

WS-*¶

Wikipedia: https://en.wikipedia.org/wiki/List_of_web_service_specifications

There are many web service specifications; many of their names start with WS-.

https://en.wikipedia.org/wiki/List_of_web_service_specifications
Many/most WS-* standards specify XML.
Some WS-* standards also specify JSON.

WSDL ¶

Wikipedia: https://en.wikipedia.org/wiki/Web_Services_Description_Language

WSDL (Web Services Description Language) is a web standard for describing web services and the schema of their inputs and outputs.

JSON-WSP ¶

Wikipedia: https://en.wikipedia.org/wiki/JSON-WSP

JSON-WSP (JSON Web-Service Protocol) is a web standard protocol for describing services and request and response objects.

JSON-WSP is similar in function to WSDL and CORBA IDL.

ROA ¶

Wikipedia: https://en.wikipedia.org/wiki/Resource-oriented_architecture

REST ¶

Wikipedia: https://en.wikipedia.org/wiki/Representational_state_transfer

Awesome: https://github.com/marmelab/awesome-rest

REST (Representational State Transfer) is a pattern for interacting with web resources using regular HTTP methods like GET, POST, PUT, and DELETE.

A REST API is known as a RESTful API.
A REST implementation maps Create, Read, Update, Delete (CRUD) methods for URI-named collections of resources onto HTTP verbs like GET, POST, PATCH.
Sometimes, a REST implementation accepts a URL parameter like ?method=PUT e.g. for Javascript implementations on browsers which only support e.g. GET and POST.
There are many software libraries for implementing REST API Servers:
- Java, JS: Restlet:
  
  Wikipedia: https://en.wikipedia.org/wiki/Restlet
  
  Src: https://github.com/restlet
- Ruby: Grape:
  
  Src: https://github.com/ruby-grape/grape
- Python: Django REST Framework:
  
  Src: https://github.com/tomchristie/django-rest-framework
There are many software libraries for implementing REST API Clients:
- Python REST API client libraries:
  - requests:
    
    Src:
    
    Docs: http://docs.python-requests.org/en/master/
    - httpie is a CLI utility written on top of requests:
      
      Src: https://github.com/jkbrzt/httpie
  - WebTest:
    
    Src: https://github.com/Pylons/webtest
    
    Docs: https://webtest.readthedocs.io/en/latest/
    - https://pypi.python.org/pypi/webtest-plus/ (requests-auth)
    - https://github.com/django-webtest/django-webtest
  - Docs: https://westurner.github.io/wiki/awesome-python-testing#web-applications
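
The mapping of CRUD operations onto HTTP verbs described above can be sketched without any web framework. The `Collection` class below is a hypothetical in-memory stand-in for a URI-named collection (for example /notes); it returns status codes and payloads instead of real HTTP responses.

```python
class Collection:
    """In-memory stand-in for a URI-named collection of resources."""

    def __init__(self):
        self.items = {}
        self.next_id = 1

    def handle(self, method, item_id=None, body=None):
        """Return (status_code, payload) for one request."""
        if item_id is None:
            if method == "POST":  # Create
                item_id, self.next_id = self.next_id, self.next_id + 1
                self.items[item_id] = dict(body)
                return 201, {"id": item_id, **self.items[item_id]}
            if method == "GET":  # Read (list the collection)
                return 200, [{"id": i, **b} for i, b in self.items.items()]
            return 405, None
        if item_id not in self.items:
            return 404, None
        if method == "GET":  # Read (one resource)
            return 200, {"id": item_id, **self.items[item_id]}
        if method == "PUT":  # Update (replace)
            self.items[item_id] = dict(body)
            return 200, {"id": item_id, **self.items[item_id]}
        if method == "PATCH":  # Update (partial)
            self.items[item_id].update(body)
            return 200, {"id": item_id, **self.items[item_id]}
        if method == "DELETE":  # Delete
            del self.items[item_id]
            return 204, None
        return 405, None


notes = Collection()
print(notes.handle("POST", body={"text": "buy milk"}))
print(notes.handle("PATCH", 1, {"done": True}))
print(notes.handle("DELETE", 1))
```

A real server would parse the method and path from the HTTP request line and serialize the payload (often as JSON); the libraries listed above handle that part.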

WAMP ¶

Wikipedia: https://en.wikipedia.org/wiki/Web_Application_Messaging_Protocol
Homepage: http://wamp-proto.org
Specification: https://tools.ietf.org/html/draft-oberstet-hybi-tavendo-wamp
Src: https://github.com/wamp-proto/wamp-proto
Docs: http://wamp-proto.org/why/
Docs: http://wamp-proto.org/faq/
Docs: http://wamp-proto.org/implementations/

WAMP (Web Application Messaging Protocol) defines Publish/Subscribe (PubSub) and Remote Procedure Call (RPC) over WebSocket, JSON, and URIs.

Using WAMP, you can have a browser-based UI, the embedded device and your backend talk to each other in real-time:

WAMP Router = Broker (PubSub topic broker) + Dealer (RPC)
WAMP can run over other transports than the preferred WebSocket, and with other serializations (e.g. msgpack) than JSON.
- JSON-LD
Implementations:
- http://wamp-proto.org/implementations/
- http://autobahn.ws/ (Python, JS, Cpp, Android, Test Suite)
  - http://autobahn.ws/#code
https://tools.ietf.org/html/draft-oberstet-hybi-tavendo-wamp#section-6.5

WAMP Message Codes and Direction

Data Grid ¶

Wikipedia: https://en.wikipedia.org/wiki/Data_grid

Search Engine Indexing ¶

Wikipedia: https://en.wikipedia.org/wiki/Search_engine_indexing

ElasticSearch ¶

Wikipedia: https://en.wikipedia.org/wiki/Elasticsearch
Homepage: https://www.elastic.co/products/elasticsearch
Download: https://www.elastic.co/downloads/elasticsearch
Source: git https://github.com/elastic/elasticsearch
Docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Docs: https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
DockerHub: https://registry.hub.docker.com/u/library/elasticsearch/

ElasticSearch is an Open Source realtime search server written in Java built on Apache Lucene with a RESTful API for indexing JSON documents.

ElasticSearch supports geographical (bounded) queries.
ElasticSearch can build better indexes for faster search response times when ElasticSearch Mappings are specified.
ElasticSearch mappings can be (manually) transformed to JSON-LD @context mappings: https://github.com/westurner/elasticsearchjsonld

Haystack ¶

Homepage: http://haystacksearch.org/
Source: git https://github.com/django-haystack/django-haystack
PyPI: https://pypi.python.org/pypi/django-haystack
Docs: https://django-haystack.readthedocs.io/en/latest/

Haystack is an Open Source Python Django API for a number of search services (e.g. Solr, ElasticSearch, Whoosh, Xapian).

Lucene ¶

Wikipedia: https://en.wikipedia.org/wiki/Lucene
Homepage: https://lucene.apache.org/
Download: https://lucene.apache.org/core/downloads.html
Source: svn http://svn.apache.org/repos/asf/lucene/dev/trunk
Docs: https://lucene.apache.org/core/
Docs: https://lucene.apache.org/core/5_2_0/

Apache Lucene is an Open Source search indexing library written in Java.

ElasticSearch, Nutch, and Solr are implemented on top of Lucene.

Nutch ¶

Wikipedia: https://en.wikipedia.org/wiki/Nutch
Homepage: https://nutch.apache.org/
Download: https://nutch.apache.org/downloads.html
Source: git git://git.apache.org/nutch.git
Source: git https://github.com/apache/nutch
Docs: https://nutch.apache.org/apidocs/apidocs-2.3/index.html
Docs: https://wiki.apache.org/nutch/
Docs: https://wiki.apache.org/nutch/#Tutorials

Apache Nutch is an Open Source distributed web crawler and search engine written in Java and implemented on top of Lucene.

Nutch has a pluggable storage and indexing API with support for e.g. Solr, ElasticSearch.

Solr ¶

Wikipedia:
Homepage: https://lucene.apache.org/solr/
Download: https://lucene.apache.org/solr/mirrors-solr-latest-redir.html
Docs: https://lucene.apache.org/solr/resources.html
Docs: https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/
Docs: https://wiki.apache.org/solr/

Apache Solr is an Open Source web search platform written in Java and implemented on top of Lucene.

Whoosh ¶

Homepage:
PyPI: https://pypi.python.org/pypi/Whoosh
Docs: https://pythonhosted.org/Whoosh/

Whoosh is an Open Source search indexing library written in Python.

Xapian ¶

Wikipedia: https://en.wikipedia.org/wiki/Xapian
Homepage: http://xapian.org/
Docs: http://xapian.org/docs/
Docs: http://xapian.org/docs/apidoc/html/inherits.html

Xapian is an Open Source search library written in C++ with bindings for many languages.

Information Retrieval ¶

Wikipedia: https://en.wikipedia.org/wiki/Information_retrieval

Docs: http://nlp.stanford.edu/IR-book/information-retrieval.html

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008.

http://nlp.stanford.edu/IR-book/

Time Standards¶

International Atomic Time (TAI)¶

Wikipedia: https://en.wikipedia.org/wiki/International_Atomic_Time

International Atomic Time (TAI, from the French Temps Atomique International) is an international standard for extremely precise timekeeping. It is the basis for UTC (civil time on Earth) and for Terrestrial Time (Earth and space).

Long Now Dates¶

Wikipedia: https://en.wikipedia.org/wiki/Long_Now_Foundation

Docs: https://en.wikipedia.org/wiki/Year_10,000_problem

 2015    # ISO8601 date
02015    # 5-digit Y10K date
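
A five-digit year can be produced with ordinary zero-padded formatting. The helper name below is made up for illustration.

```python
def long_now_year(year):
    """Format a non-negative year with at least five digits, e.g. 2015 -> '02015'."""
    if year < 0:
        raise ValueError("this sketch only handles non-negative years")
    return f"{year:05d}"


print(long_now_year(2015))
```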

Decimal Time¶

Wikipedia: https://en.wikipedia.org/wiki/Decimal_time

https://en.wikipedia.org/wiki/Decimal_time#Conversions
https://en.wikipedia.org/wiki/Decimal_time#Fractional_days
https://en.wikipedia.org/wiki/Leap_year (~365.25 days/yr)
https://en.wikipedia.org/wiki/Leap_second (rotation time ~= atomic time)

Unix Time¶

Wikipedia: https://en.wikipedia.org/wiki/Unix_time

Unix time is the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC) on Thursday, 1 January 1970 (1970-01-01T00:00:00Z), not counting leap seconds:

0                       # Unix time
1970-01-01T00:00:00Z    # ISO8601 timestamp

1435255816              # Unix time
2015-06-25T18:10:16Z    # ISO8601 timestamp
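
The two pairs above can be reproduced with the Python standard library. This is a sketch under the assumption that timestamps are whole seconds in UTC; the function names are hypothetical:

```python
import datetime


def unix_to_iso8601(seconds):
    """Convert Unix time (seconds since the epoch) to an ISO8601 UTC timestamp."""
    dt = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
    return dt.strftime('%Y-%m-%dT%H:%M:%SZ')


def iso8601_to_unix(timestamp):
    """Convert a 'YYYY-MM-DDTHH:MM:SSZ' UTC timestamp back to Unix time."""
    dt = datetime.datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%SZ')
    return int(dt.replace(tzinfo=datetime.timezone.utc).timestamp())


print(unix_to_iso8601(1435255816))
print(iso8601_to_unix('1970-01-01T00:00:00Z'))
```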

Note

Unix time does not count leap seconds.

https://en.wikipedia.org/wiki/Unix_time#Leap_seconds

See also: Swatch Internet Time (Beat Time)

Year Zero¶

Wikipedia: https://en.wikipedia.org/wiki/0_(year)

The Gregorian and Julian calendars, as used with Common Era (CE / AD) year numbering, do not include a year zero: 1 BCE is followed directly by 1 CE.
Astronomical year numbering includes a year zero.
Before Present dates do not specify a year zero, because they count years back from a reference date (by convention, 1 January 1950) rather than forward from a calendar epoch.

Astronomical year numbering¶

Wikipedia: https://en.wikipedia.org/wiki/Astronomical_year_numbering

Astronomical year numbering includes a year zero: year 0 is 1 BCE, year -1 is 2 BCE, and so on.

Tools with support for Astronomical year numbering:

AstroPy is a Python library that supports astronomical year numbering:

https://astropy.readthedocs.io/en/latest/time/

Before Present (BP)¶

Wikipedia: https://en.wikipedia.org/wiki/Before_Present

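Before Present (BP) dates count years backward from a reference date. In radiocarbon dating the reference is fixed by convention at 1 January 1950; informal phrases such as “2.6 million years ago” are relative to the current date (or the date of publication).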

Common Era (CE)¶

Wikipedia: https://en.wikipedia.org/wiki/Common_Era
Docs: https://en.wikipedia.org/wiki/Pax_Romana
Docs: Year Zero

BCE (Before Common Era) == BC
- https://en.wiktionary.org/wiki/BCE
- https://en.wiktionary.org/wiki/BC
CE (Common Era) == AD (Anno Domini)
- https://en.wiktionary.org/wiki/CE
- https://en.wiktionary.org/wiki/AD

Common Era and Year Zero:

5000 BCE == -5000 CE
   1 BCE ==    -1 CE
   0 BCE ==     0 CE
   0 CE  ==     0 BCE
   1 CE  ==     1 CE
2015 CE  ==  2015 CE

Note

Are these off by one?

Astronomical year numbering – Julian/Gregorian BCE dates must be converted: astronomical year = 1 - (BCE year).
Year Zero – they are off by one (“there is no year zero”), so the rows above that use a year 0 do not correspond to real calendar years.

Common Era and Python datetime calculations:

# Paleolithic Era (2.6m years ago -> 12000 years ago)
# "2.6m years ago" = (2.6m - (2015)) BCE = 2597985 BCE = -2597985 CE

2597985 BCE == -2597985 CE

### Python datetime w/ scientific notation string formatter
>>> import datetime
>>> year = datetime.datetime.now().year
>>> '{:.6e}'.format(2.6e6 - year)
'2.597985e+06'

### Python datetime supports dates >= 1 CE (datetime.MINYEAR == 1).
>>> datetime.date(1, 1, 1)
datetime.date(1, 1, 1)
>>> datetime.datetime(1, 1, 1)
datetime.datetime(1, 1, 1, 0, 0)

### Python pypi:arrow supports dates >= 1 CE.
### $ pip install arrow
>>> import arrow
>>> arrow.get(1, 1, 1)
<Arrow [0001-01-01T00:00:00+00:00]>

### astropy.time.Time supports dates before 1 CE (including *Year Zero*)
### https://astropy.readthedocs.io/en/latest/time/
### $ conda install astropy
>>> import astropy.time
>>> # format='jd' is a Julian Date: a count of days (not years),
>>> # so -2.6e6 here means 2.6 million days before the JD epoch.
>>> # (TimeJulianEpoch, format='jyear', takes Julian years instead.)
>>> astropy.time.Time(-2.6e6, format='jd', scale='utc')
<Time object: scale='utc' format='jd' value=-2600000.0>
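
The off-by-one between BCE years and astronomical year numbering can be made explicit in a few lines. This is a minimal sketch with hypothetical helper names; it assumes integer years and ignores the Julian/Gregorian calendar switch:

```python
def bce_to_astronomical(year_bce):
    """Convert a BCE year (1 BCE, 2 BCE, ...) to astronomical year numbering."""
    if year_bce < 1:
        raise ValueError('BCE years start at 1; there is no year zero')
    return 1 - year_bce


def astronomical_to_label(year):
    """Format an astronomical year as a BCE / CE label."""
    if year <= 0:
        return '{} BCE'.format(1 - year)
    return '{} CE'.format(year)


print(bce_to_astronomical(1))       # 1 BCE is astronomical year 0
print(astronomical_to_label(-1))    # astronomical year -1 is 2 BCE
```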

Time Zones¶

Wikipedia: https://en.wikipedia.org/wiki/Time_zone

https://en.wikipedia.org/wiki/Daylight_saving_time

https://en.wikipedia.org/wiki/List_of_UTC_time_offsets

https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

ISO8601

UTC¶

Wikipedia: https://en.wikipedia.org/wiki/Coordinated_Universal_Time

UTC (Coordinated Universal Time) is the primary terrestrial Earth-based clock time.

Earth Time Zones are specified as offsets from UTC.
UTC is determined by International Atomic Time (IAT; officially abbreviated TAI), with occasional leap seconds inserted to account for the difference between Earth’s rotational time and atomic time. The SI second is defined by the hyperfine transition frequency of cesium-133 atoms, as measured by atomic clocks (an SI Unit; see QUDT).
Many/most computer systems work with UTC, but are not exactly synchronized with International Atomic Time (IAT) (see also: RTC, NTP and time drift).
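
Because time zones are offsets from UTC, a UTC timestamp can be shifted to local time with a fixed offset. The sketch below assumes a constant -06:00 offset (US Central Standard Time) and does not handle daylight saving time; a tz database would be needed for that:

```python
import datetime

# Fixed UTC offset (assumed here: US Central Standard Time, -06:00).
UTC = datetime.timezone.utc
CST = datetime.timezone(datetime.timedelta(hours=-6), 'CST')

utc_time = datetime.datetime(2016, 1, 2, 4, 11, 59, tzinfo=UTC)
local_time = utc_time.astimezone(CST)

print(utc_time.isoformat())
print(local_time.isoformat())
```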

US Time Zones¶

Wikipedia: https://en.wikipedia.org/wiki/Time_in_the_United_States

https://en.wikipedia.org/wiki/Time_in_the_United_States#Standard_time_and_daylight_saving_time

https://en.wikipedia.org/wiki/History_of_time_in_the_United_States

Time Zone names, URIs, and ISO8601 UTC offsets:

Table of US Time Zones¶
Time Zone names, URNs, URIs	UTC Offset	UTC DST Offset
https://en.wikipedia.org/wiki/Coordinated_Universal_Time #tz: Coordinated Universal Time, UTC, Zulu	-0000 Z	+0000 Z
https://en.wikipedia.org/wiki/Atlantic_Time_Zone https://en.wikipedia.org/wiki/America/Halifax #tz: Atlantic, Antarctica (Palmer), AST, ADT America/Halifax	-0400 AST	-0300 ADT
https://en.wikipedia.org/wiki/America/St_Thomas #tz: America/St_Thomas, America/Virgin	-0400	-0400
https://en.wikipedia.org/wiki/Eastern_Time_Zone https://en.wikipedia.org/wiki/EST5EDT #tz: Eastern, EST, EDT America/New_York	-0500 EST	-0400 EDT
https://en.wikipedia.org/wiki/Central_Time_Zone https://en.wikipedia.org/wiki/CST6CDT #tz: Central, CST, CDT America/Chicago	-0600 CST	-0500 CDT
https://en.wikipedia.org/wiki/Mountain_Time_Zone https://en.wikipedia.org/wiki/MST7MDT #tz: Mountain, MST, MDT America/Denver	-0700 MST	-0600 MDT
https://en.wikipedia.org/wiki/Pacific_Time_Zone https://en.wikipedia.org/wiki/PST8PDT #tz: Pacific, PST, PDT America/Los_Angeles	-0800 PST	-0700 PDT
https://en.wikipedia.org/wiki/Alaska_Time_Zone AKST9AKDT #tz: Alaska, AKST, AKDT America/Juneau	-0900 AKST	-0800 AKDT
https://en.wikipedia.org/wiki/Hawaii-Aleutian_Time_Zone HAST10HADT #tz: Hawaii Aleutian, HAST, HADT Pacific/Honolulu	-1000 HAST	-0900 HADT
https://en.wikipedia.org/wiki/Samoa_Time_Zone #tz: Samoa Time Zone, SST Pacific/Samoa	-1100 SST	-1100 SST
https://en.wikipedia.org/wiki/Chamorro_Time_Zone #tz: Chamorro, Guam Pacific/Guam	+1000	+1000
https://en.wikipedia.org/wiki/Time_in_Antarctica Antarctica (Amundsen, McMurdo), South Pole Antarctica/South_Pole	+1200	+1300

US Daylight Saving Time¶

Wikipedia: https://en.wikipedia.org/wiki/Daylight_saving_time_in_the_United_States

Currently, daylight saving time starts on the second Sunday in March and ends on the first Sunday in November, with the time changes taking place at 2:00 a.m. local time.

With a mnemonic word play referring to seasons, clocks “spring forward and fall back” — that is, in spring (technically late winter) the clocks are moved forward from 2:00 a.m. to 3:00 a.m., and in fall they are moved back from 2:00 am to 1:00 am.

Daylight saving time starts and ends on the following dates (from https://en.wikipedia.org/wiki/Time_in_the_United_States#Daylight_saving_time):

Year	DST start date	DST end date
2015	2015-03-08 02:00	2015-11-01 02:00
2016	2016-03-13 02:00	2016-11-06 02:00
2017	2017-03-12 02:00	2017-11-05 02:00
2018	2018-03-11 02:00	2018-11-04 02:00
2019	2019-03-10 02:00	2019-11-03 02:00
2020	2020-03-08 02:00	2020-11-01 02:00
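
The dates in this table follow from the "second Sunday in March, first Sunday in November" rule. A minimal sketch with hypothetical function names, assuming that rule applies to the year in question:

```python
import datetime


def nth_sunday(year, month, n):
    """Return the date of the n-th Sunday of the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_until_sunday = (6 - first.weekday()) % 7
    return first + datetime.timedelta(days=days_until_sunday + 7 * (n - 1))


def us_dst_dates(year):
    """Return (start, end) dates of US daylight saving time for a year."""
    return nth_sunday(year, 3, 2), nth_sunday(year, 11, 1)


for year in range(2015, 2021):
    start, end = us_dst_dates(year)
    print(year, start.isoformat(), end.isoformat())
```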

ISO8601¶

Wikipedia: https://en.wikipedia.org/wiki/ISO_8601

Standard: http://www.iso.org/iso/iso8601

ISO8601 is an ISO standard for specifying Gregorian dates, times, datetime intervals, durations, and recurring datetimes.

The date command can print ISO8601-compatible datestrings:

$ date +'%FT%T%z'
2016-01-01T22:11:59-0600

$ date +'%F %T%z'
2016-01-01 22:11:59-0600

Roughly, an ISO8601 datetime is specified as: year, dash, month, dash, day, T (or, informally, a space character), hour, colon, minute, colon, second, and then either Z (for UTC) or a time zone offset (e.g. +00:00, -0600). In the basic format, the dashes and colons are omitted.
ISO8601 specifies a standard for time intervals: start date, forward-slash, end date.
ISO8601 specifies a standard for durations: P, then the number of years Y, months M, weeks W, and days D; then T, then the number of hours H, minutes M, and seconds S (so P1M is one month and PT1M is one minute).
A Z timezone specifies UTC (Coordinated Universal Time, or “Zulu” time).
Many/most W3C standards (such as XSD) specify ISO8601 time formats: http://www.w3.org/TR/NOTE-datetime

A few examples of ISO8601:

2014
2014-10
2014-10-23
20141023
2014-10-23T20:59:30+00:00   # UTC
2014-10-23T20:59:30Z        # UTC / Zulu
2014-10-23T20:59:30-06:00   # CST
2014-10-23T20:59:30-06      # CST
2014-10-23T20:59:30-05:00   # CDT
2014-10-23T20:59:30-05      # CDT
20
20:59
2059
20:59:30
205930
2014-10-23T20:59:30Z/2014-10-23T21:00:00Z
2014-10-23T20:59:30-05:00/2014-10-23T21:00:00-06
PT1H
PT1M
P1M
P1Y1M1W1DT1H1M1S
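
Some of these forms can be handled with the Python standard library alone. The sketch below assumes Python 3.7+ (`datetime.fromisoformat`) and only covers extended-format timestamps with a numeric offset; durations such as PT1H need a separate parser:

```python
import datetime

# Extended-format timestamp with a UTC offset (taken from the examples above).
local = datetime.datetime.fromisoformat('2014-10-23T20:59:30-06:00')

# Normalize to UTC and format with a 'Z' suffix.
utc = local.astimezone(datetime.timezone.utc)
as_zulu = utc.strftime('%Y-%m-%dT%H:%M:%SZ')

# An interval is a start and an end separated by a forward slash.
interval = '2014-10-23T20:59:30+00:00/2014-10-23T21:00:00+00:00'
start, end = (datetime.datetime.fromisoformat(part) for part in interval.split('/'))
elapsed = end - start

print(as_zulu)
print(elapsed.total_seconds())
```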

Note

AFAIU, ISO8601 does not define separate units for milliseconds, microseconds, nanoseconds, picoseconds, femtoseconds, or attoseconds; sub-second precision is written as a decimal fraction of the seconds (e.g. 20:59:30.125).

NTP¶

Wikipedia: https://en.wikipedia.org/wiki/Network_Time_Protocol

Homepage: http://www.pool.ntp.org/en/

NTP (Network Time Protocol) is a standard for synchronizing clock times.

Most Operating Systems and mobile devices support NTP.
NTP clients calculate time drift (or time skew) and network latency and then gradually adjust the local system time to the most recently retrieved server time.
Many OS distributions run their own NTP servers (in order to reduce load on the core NTP pool servers).
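
The offset and latency calculation can be sketched from the four timestamps of one request/response exchange. This is the standard NTP clock offset and round-trip delay arithmetic; the function name and the sample numbers are hypothetical, and real clients additionally filter many samples before adjusting the clock:

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """
    Estimate clock offset and round-trip delay from one NTP exchange.

    t0: client transmit time (client clock)
    t1: server receive time (server clock)
    t2: server transmit time (server clock)
    t3: client receive time (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical sample: the server clock is about 5 seconds ahead.
print(ntp_offset_and_delay(0.0, 5.5, 5.75, 1.0))
```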

Linked Data¶

Wikipedia: https://en.wikipedia.org/wiki/Linked_data

http://www.w3.org/DesignIssues/LinkedData.html
Linked Data Standards:
- W3C: https://www.w3.org/TR/#tr_Linked_Data

5 ★ Linked Data¶

http://www.w3.org/TR/ld-glossary/#x5-star-linked-open-data

☆

Publish data on the Web in any format (e.g., PDF, JPEG) accompanied by an explicit Open License (expression of rights).

☆☆

Publish structured data on the Web in a machine-readable format (e.g. XML).

☆☆☆

Publish structured data on the Web in a documented, non-proprietary data format (e.g. CSV, KML).

☆☆☆☆

Publish structured data on the Web as RDF (e.g. Turtle, RDFa, JSON-LD, SPARQL.)

☆☆☆☆☆

In your RDF, have the identifiers be links (URLs) to useful data sources.

—http://5stardata.info/

See: Semantic Web

Semantic Web¶

Wikipedia: https://en.wikipedia.org/wiki/Semantic_Web

WikipediaCategory: https://en.wikipedia.org/wiki/Category:Semantic_Web

https://en.wikipedia.org/wiki/Template:Semantic_Web

https://en.wikipedia.org/wiki/Semantics_(computer_science)

W3C Semantic Web Wiki:

Semantic Web Standards¶

https://en.wikipedia.org/wiki/Statement_(computer_science)

https://en.wikipedia.org/wiki/Resource_(computing)

https://en.wikipedia.org/wiki/Entity-attribute-value_model

https://en.wikipedia.org/wiki/Tuple

https://en.wikipedia.org/wiki/Reification_(computer_science)#Reification_on_Semantic_Web

https://en.wikipedia.org/w/index.php?title=Eigenclass_model&oldid=592778140#In_RDF_Schema

Representations / Serializations

RDF: N-Triples, RDF/XML, TriX, N3, Turtle, TriG, RDFa, JSON-LD

Vocabularies

RDFS: DCMI, SKOS, Schema.org

Query APIs

SPARQL, LDP

Ontologies

OWL: PROV, OA, QUDT

Reasoners

See:
- Description Logic
- OWL 2 Profiles
- Entailment

Web Standards¶

Wikipedia: https://en.wikipedia.org/wiki/Web_standards

Web Names¶

URL¶

URL

URI¶

URN¶

URN

IEC¶

Wikipedia: https://en.wikipedia.org/wiki/International_Electrotechnical_Commission

Homepage: http://www.iec.ch/

IEC (International Electrotechnical Commission) is a standards body.

List of IEC standards: https://en.wikipedia.org/wiki/List_of_IEC_standards

IETF¶

Wikipedia: https://en.wikipedia.org/wiki/Internet_Engineering_Task_Force

Homepage: https://www.ietf.org/

IETF (Internet Engineering Task Force) is a standards body.

List of IETF standards: https://tools.ietf.org/html/

ISO¶

Wikipedia: https://en.wikipedia.org/wiki/International_Organization_for_Standardization

Homepage: http://www.iso.org/

ISO (International Organization for Standardization) is a standards body.

List of ISO standards: http://www.iso.org/iso/home/standards.htm

OMG¶

Wikipedia: https://en.wikipedia.org/wiki/Object_Management_Group

Homepage: http://www.omg.org/

OMG (Object Management Group) is a standards body.

UML is an OMG standard.
CORBA is an OMG standard.
List of OMG standards: http://www.omg.org/spec/

https://en.wikipedia.org/wiki/Object_Management_Group#OMG_Standards

W3C¶

Wikipedia: https://en.wikipedia.org/wiki/World_Wide_Web_Consortium

Homepage: http://www.w3.org/

W3C (World Wide Web Consortium) is a standards body.

List of W3C standards: http://www.w3.org/TR/
- https://www.w3.org/TR/#tr_Linked_Data

HTTP¶

Wikipedia: https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol
Standard: https://tools.ietf.org/html/rfc2616
Standard: http://tools.ietf.org/html/rfc7230#page-5
Docs: https://www.mnot.net/blog/2014/06/07/rfc2616_is_dead
URI Scheme: http://
URI Scheme: https://

HTTP (HyperText Transfer Protocol) is an Open Source text-based request-response TCP/IP protocol for text and binary data interchange.

HTTPS (Secure HTTP) wraps HTTP in SSL/TLS to secure HTTP.

HTTP in RDF¶

Standard: http://www.w3.org/TR/HTTP-in-RDF10/
Namespace: http://www.w3.org/2011/http#
Namespace: http://www.w3.org/2011/http-headers
Namespace: http://www.w3.org/2011/http-methods
Namespace: http://www.w3.org/2011/http-statusCodes
xmlns: @prefix http: <http://www.w3.org/2011/http#> .
xmlns: @prefix http-headers: <http://www.w3.org/2011/http-headers> .
xmlns: @prefix http-methods: <http://www.w3.org/2011/http-methods> .
xmlns: @prefix http-statusCodes: <http://www.w3.org/2011/http-statusCodes> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/http

HTTP-in-RDF is a standard for representing HTTP as RDF.

HTTPS¶

Standard: https://tools.ietf.org/html/rfc2818 (2000)
Wikipedia: https://en.wikipedia.org/wiki/HTTPS
Wikipedia: https://en.wikipedia.org/wiki/Transport_Layer_Security
Wikipedia: https://en.wikipedia.org/wiki/Secure_Sockets_Layer

HTTPS (HTTP over TLS, formerly HTTP over SSL) is HTTP wrapped in TLS/SSL.

TLS (Transport Layer Security)
SSL (Secure Sockets Layer)

HTTP STS¶

Wikipedia: https://en.wikipedia.org/wiki/HTTP_Strict_Transport_Security

HTTP STS (HTTP Strict Transport Security, also abbreviated HSTS) is a standardized extension for notifying browsers that all requests to a site should be made over HTTPS for a specified time period (the max-age directive of the Strict-Transport-Security response header).
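
A server opts in by sending a `Strict-Transport-Security` response header with a `max-age` directive (in seconds). The following sketch only builds the header value; the helper name is hypothetical and not part of any framework:

```python
def hsts_header(max_age_seconds, include_subdomains=False):
    """Build a (name, value) pair for the Strict-Transport-Security header."""
    value = 'max-age={:d}'.format(max_age_seconds)
    if include_subdomains:
        value += '; includeSubDomains'
    return ('Strict-Transport-Security', value)


# One year, applied to subdomains as well.
print(hsts_header(365 * 24 * 60 * 60, include_subdomains=True))
```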

CSS¶

Wikipedia: https://en.wikipedia.org/wiki/Cascading_Style_Sheets

Docs: CSS

CSS (Cascading Style Sheets) defines the presentational aspects of HTML and of a number of mobile and desktop web frameworks.

CSS is designed to ensure separation of data and presentation. With JavaScript, the separation is then data, code, and presentation.

RTMP¶

Wikipedia: https://en.wikipedia.org/wiki/Real_Time_Messaging_Protocol

RTMP (Real Time Messaging Protocol) is a TCP/IP protocol for streaming audio, video, and data. It was originally developed for Flash, and its specification is now openly published.

https://en.wikipedia.org/wiki/Real_Time_Messaging_Protocol#Client_software
- Adobe Flash Player
- VLC
https://en.wikipedia.org/wiki/Real_Time_Messaging_Protocol#Server_software
- Adobe Flash Live Media Server
- Amazon AWS S3 HTTP Object Storage, CloudFront CDN
- Helix Universal Media Server
- Red5 (Open Source)
- FFmpeg (Open Source)
- nginx-rtmp-module (Open Source)
- FreeSWITCH (Open Source, VoIP, SIP, Video Chat)
WebRTC addresses most of the RTMP use cases, and is becoming as widely deployed as (or more widely deployed than) Flash Player, especially on mobile devices.

WebSocket¶

Wikipedia: https://en.wikipedia.org/wiki/WebSocket

URI Scheme: ws://

WebSocket is a full-duplex (two-way) TCP/IP protocol for audio, video, and data which can interoperate with HTTP Web Servers.

WebSockets are often more efficient than other methods for realtime HTTP like HTTP Streaming and long polling.
WebSockets work with many/most HTTP proxies.
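
WebSocket interoperates with HTTP servers because a connection starts as an HTTP Upgrade request. As a sketch of one step of that opening handshake (per RFC 6455), the server derives the `Sec-WebSocket-Accept` response header from the client's `Sec-WebSocket-Key`; the function name is hypothetical:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'


def websocket_accept(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode('ascii')).digest()
    return base64.b64encode(digest).decode('ascii')


# Sample key from RFC 6455.
print(websocket_accept('dGhlIHNhbXBsZSBub25jZQ=='))
```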

https://en.wikipedia.org/wiki/Comparison_of_WebSocket_implementations

Python: pypi:gevent-websocket, pypi:websockets (asyncio), pypi:autobahn (pypi:twisted, asyncio)

WebRTC¶

Wikipedia: https://en.wikipedia.org/wiki/WebRTC
Homepage: http://www.webrtc.org/
Standard: http://tools.ietf.org/wg/rtcweb/
Docs: https://webrtc.github.io/samples/

WebRTC is a web standard for decentralized or centralized streaming of audio, video, and data in browser, without having to download any plugins.

Note

WebRTC is supported by a growing number of browsers: http://iswebrtcreadyyet.com/

Notably, Internet Explorer and Safari still require a plugin to handle WebRTC.

HTTP/2¶

Wikipedia: https://en.wikipedia.org/wiki/HTTP/2
Homepage: https://http2.github.io/
Standard: https://http2.github.io/http2-spec/
Standard: https://http2.github.io/http2-spec/compression.html
Standard: https://tools.ietf.org/html/rfc7540
Docs: https://github.com/http2/http2-spec/wiki/Implementations

HTTP/2 (HTTP2) is the second major version of the HTTP standard, published as RFC 7540 in 2015.

HTTP/2 is largely derived from the SPDY protocol.

HTML¶

Wikipedia: https://en.wikipedia.org/wiki/HTML

HTML (HyperText Markup Language) is an Open Source standard for representing documents with tags, attributes, and hyperlinks.

Recent HTML standards include HTML4, XHTML, and HTML5.

HTML4¶

Standard: http://www.w3.org/TR/html4/

HTML4 is the fourth generation HTML standard.

XHTML¶

Wikipedia: https://en.wikipedia.org/wiki/XHTML

Standard: http://www.w3.org/TR/xhtml2/

XHTML is an XML-conforming HTML standard.

Compared to HTML4, XHTML requires closing tags, supports additional namespace declarations, and expects inline script and style content to be wrapped in CDATA sections, among a few other notable differences.

XHTML has not gained the widespread adoption of HTML4, and is being largely superseded by HTML5.

HTML5¶

Standard: http://www.w3.org/TR/html5/

HTML5 is the fifth generation HTML standard with many new (and removed) features.

As in its predecessors, HTML5 tag and attribute names are not case sensitive, but lowercase tags and attributes are recommended.
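
Case-insensitivity of tag and attribute names can be observed with the Python standard library HTML parser, which normalizes both to lowercase while leaving attribute values untouched. The `TagCollector` class is a hypothetical helper for this illustration:

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect (tag, attrs) pairs for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


collector = TagCollector()
collector.feed('<DIV CLASS="Note"><P>text</P></DIV>')
print(collector.tags)
```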

Differences Between HTML4 and HTML5

https://html-differences.whatwg.org/

HTML5 does not require closing tags for a number of elements (e.g. p and li), and it specifies how parsers must handle broken markup (many browsers had already implemented routines for auto-closing it).
Frames have been removed.
Presentational attributes have been removed (in favor of CSS).

HTML 5.1

HTML 5.1 is in the works:

http://www.w3.org/html/wg/drafts/html/master/

XML¶

Wikipedia: https://en.wikipedia.org/wiki/XML

Standard: http://www.w3.org/TR/xml/

XML (Extensible Markup Language) is a standard for representing data with tags and attributes.

Like HTML, XML is derived from SGML.

XSD¶

Wikipedia: https://en.wikipedia.org/wiki/XML_Schema_(W3C)
Standard: http://www.w3.org/TR/xmlschema11-2/
Namespace: http://www.w3.org/2001/XMLSchema#
xmlns: @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/xsd

XSD (XML Schema Datatypes) are standard datatypes for things like strings, integers, floats, and dates for XML and also RDF.

https://www.w3.org/TR/xmlschema11-2/#built-in-datatypes

JSON¶

Wikipedia: https://en.wikipedia.org/wiki/JSON
Standard: https://tools.ietf.org/html/rfc7159
Homepage: http://json.org/

JSON (JavaScript Object Notation) is a standard for representing data in a JavaScript-compatible way, with a restricted set of data types.

Conforming JSON does not contain JavaScript code, only data. It is still not safe to eval text that is supposed to be JSON, because untrusted input could contain code; use a JSON parser instead.

There are many parsers for JSON.
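
For example, the parser in the Python standard library maps the restricted JSON data types onto Python types and rejects anything that is not data. A minimal sketch (the sample strings are made up for illustration):

```python
import json

text = '{"name": "example", "values": [1, 2.5, true, null]}'
data = json.loads(text)
print(data)

# Input that contains code is rejected by the parser instead of being run.
try:
    json.loads('__import__("os").getcwd()')
    rejected = False
except json.JSONDecodeError:
    rejected = True
print(rejected)
```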

JSON-LD adds RDF Linked Data support to JSON with @context.

CSV¶

Wikipedia: https://en.wikipedia.org/wiki/Comma-separated_values
Standard: https://tools.ietf.org/html/rfc4180
Extension: .csv
MIME Type: text/csv

CSV (Comma Separated Values) is a flat file representation for columnar data with rows and columns.

Most spreadsheet tools can export (raw and computed) data from a sheet into a CSV file, for use with many other tools.
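
Reading such a file is a short exercise with the Python standard library `csv` module, which also handles quoted fields that contain the delimiter or escaped quotes. The sample data below is made up for illustration:

```python
import csv
import io

# Two data rows; the quoted fields contain a comma and escaped double quotes.
text = 'name,value\r\n"Smith, J.",1\r\n"say ""hi""",2\r\n'

# newline='' leaves line endings untouched so the csv module can handle them.
rows = list(csv.DictReader(io.StringIO(text, newline='')))

for row in rows:
    print(row['name'], row['value'])
```

All values are returned as strings; converting them to typed values is left to the caller.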

CSVW¶

Homepage: https://w3c.github.io/csvw/
Standard: http://www.w3.org/TR/tabular-data-model/
Standard: http://www.w3.org/TR/tabular-metadata/
Standard: http://www.w3.org/TR/csv2json/
Standard: http://www.w3.org/TR/csv2rdf/
Namespace: http://www.w3.org/ns/csvw#
xmlns: @prefix csvw: <http://www.w3.org/ns/csvw#> .
@context: http://www.w3.org/ns/csvw.jsonld

CSVW (CSV on the Web) is a set of relatively new standards for representing CSV rows and columns as RDF (and JSON / JSON-LD) along with metadata.

URIs for datatypes (XSD, ...)
URIs for columns (RDF)
Document Metadata
CSV -> JSON ( -> JSON-LD -> RDF )
CSV -> RDF

RDF¶

Wikipedia: https://en.wikipedia.org/wiki/Resource_Description_Framework
xmlns: @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/rdf

RDF (Resource Description Framework) is a standard data model for representing data as triples.
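
As a rough sketch of the triple model (not an RDF library), a graph can be pictured as a set of (subject, predicate, object) tuples. The `http://example.org/` namespace and the `objects` helper are hypothetical, and this simplification does not distinguish IRIs from literals or blank nodes:

```python
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
FOAF = 'http://xmlns.com/foaf/0.1/'
EX = 'http://example.org/'  # hypothetical namespace for illustration

# A graph is a set of (subject, predicate, object) triples.
graph = {
    (EX + 'alice', RDF_TYPE, FOAF + 'Person'),
    (EX + 'alice', FOAF + 'name', 'Alice'),
    (EX + 'alice', FOAF + 'knows', EX + 'bob'),
}


def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for (s, p, o) in graph if s == subject and p == predicate)


print(objects(graph, EX + 'alice', FOAF + 'name'))
```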

Primer

Concepts

Useful Resources

“Linked Data Patterns: A pattern catalogue for modelling, publishing, and consuming Linked Data” http://patterns.dataincubator.org/book/

RDF Interfaces¶

Standard: http://www.w3.org/TR/rdf-interfaces/

Docs: http://www.w3.org/TR/rdf-interfaces/#high-level-api

RDF Interfaces is an Open Source standard for RDF APIs (e.g. as implemented by RDF libraries and RDF Triplestores).

createBlankNode –> BlankNode
createNamedNode –> NamedNode
createLiteral –> Literal
createTriple –> Triple (RDFNode s, RDFNode p, RDFNode o)
createGraph –> []Triple
createAction –> TripleAction (TripleFilter, TripleCallback)
createProfile –> Profile
createTermMap –> TermMap
createPrefixMap –> PrefixMap

Implementations of RDF Interfaces:

Javascript and/or Node.js implementations of RDF Interfaces:

http://www.w3.org/community/rdfjs/wiki/Comparison_of_RDFJS_libraries
RDFLib (python) mappings to RDF Interfaces:
- BlankNode -> rdflib.term.BNode
- NamedNode -> rdflib.term.URIRef, rdflib.term.Variable ? TODO
- Literal -> rdflib.term.Literal
- Triple -> tuple()
- Graph -> rdflib.graph.Graph, rdflib.graph.ConjunctiveGraph, rdflib.graph.QuotedGraph, list()
- Action -> _____ TODO
- TripleFilter / TripleCallback -> rdflib.store.TripleAddedEvent, rdflib.store.TripleRemovedEvent
- https://rdflib.readthedocs.io/en/latest/apidocs/rdflib.html#rdflib.term.Node
- Profile -> ______ TODO
- TermMap -> ____ TODO
- PrefixMap -> rdflib.namespace.NamespaceManager https://rdflib.readthedocs.io/en/latest/apidocs/rdflib.html#rdflib.namespace.NamespaceManager
Note

rdflib is not order-preserving at this time, because internally Graphs are represented as dict and not yet collections.OrderedDict (for which there is now a C implementation in the Python 3.5 standard library). Output may therefore not be in the same sequence as input (or as a rdflib.store.Store, even), even when no changes are made to the graph.
- It would be preferable to maintain the input source order (though, especially for large distributed queries which merge triples into one context, sorted / source order is not a good assumption to make).
- rdf:List are ordered.
  - rdf:List with Turtle / N3: :examplePredicate ( "uno"@es "one"@en ) ;
    http://www.w3.org/TR/rdf-schema/#ch_list
    
    rdf:first, rdf:rest, rdf:nil: “RDFS does not require that there be only one first element of a list-like structure, or even that a list-like structure have a first element.”
  - rdf:List with JSON-LD @context:
    - http://www.w3.org/TR/json-ld/#lists-and-sets
    - http://www.w3.org/TR/json-ld/#sets-and-lists
    - {"@context": {"attr": {"@container": "@list"}}}
    - {"attr": {"@list": ["one", "uno"]}}
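
Ordering in an rdf:List comes from the rdf:first / rdf:rest chain rather than from the order of triples in the graph. The following sketch expands a Python list into that chain; the function name and the `_:bN` blank node labels are hypothetical:

```python
RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'


def rdf_list_triples(items):
    """Return (head, triples) for an rdf:List holding `items` in order."""
    if not items:
        return RDF + 'nil', []
    # One blank node per list cell (labels are illustrative only).
    nodes = ['_:b{}'.format(i) for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + 'nil'
        triples.append((nodes[i], RDF + 'first', item))
        triples.append((nodes[i], RDF + 'rest', rest))
    return nodes[0], triples


head, triples = rdf_list_triples(['one', 'uno'])
for triple in triples:
    print(triple)
```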

N-Triples¶

Wikipedia: https://en.wikipedia.org/wiki/N-Triples
Standard: http://www.w3.org/TR/n-triples/
Extension: .nt
MIME Type: application/n-triples

N-Triples is a standard for serializing RDF triples to text.
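
Each N-Triples line holds one triple: subject, predicate, object, and a terminating dot. The sketch below writes such lines for IRI subjects and predicates with plain string literals or IRIs as objects. The helper names are hypothetical, and blank nodes, language tags, datatypes, and IRI escaping are deliberately left out:

```python
def nt_literal(value):
    """Quote a plain string literal, escaping backslash, quote, LF and CR."""
    escaped = (value.replace('\\', '\\\\')
                    .replace('"', '\\"')
                    .replace('\n', '\\n')
                    .replace('\r', '\\r'))
    return '"{}"'.format(escaped)


def nt_line(subject_iri, predicate_iri, obj, obj_is_iri=False):
    """Serialize one triple as a single N-Triples line."""
    obj_term = '<{}>'.format(obj) if obj_is_iri else nt_literal(obj)
    return '<{}> <{}> {} .'.format(subject_iri, predicate_iri, obj_term)


print(nt_line('http://example.org/a',
              'http://www.w3.org/2000/01/rdf-schema#label',
              'say "hi"\n'))
```

Escaping line breaks in literals keeps each triple on its own line, which is what makes the format easy to process line by line.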

RDF/XML¶

Wikipedia: https://en.wikipedia.org/wiki/RDF/XML
Standard: http://www.w3.org/TR/rdf-syntax-grammar/
Extension: .rdf
MIME Type: application/rdf+xml

RDF/XML is a standard for serializing RDF as XML.

TriX¶

Wikipedia: https://en.wikipedia.org/wiki/TriX_(syntax)

http://www.w3.org/2004/03/trix/rdfg-1/

TriX (Triples in XML) is an XML serialization for RDF which, unlike RDF/XML, supports named graphs.

N3¶

Wikipedia: https://en.wikipedia.org/wiki/Notation3
Standard: http://www.w3.org/TeamSubmission/n3/
Extension: .n3
MIME Type: text/n3

N3 (Notation3) is a standard which extends the Turtle RDF serialization standard with a few extra features.

=> implies (useful for specifying production rules)

Turtle¶

Wikipedia: https://en.wikipedia.org/wiki/Turtle_(syntax)
Standard: http://www.w3.org/TR/turtle/
Extension: .ttl
MIME type: text/turtle

Turtle is a standard for serializing RDF triples into human-readable text.

TriG¶

Wikipedia: https://en.wikipedia.org/wiki/TriG_(syntax)
Standard: http://www.w3.org/TR/trig/
Extension: .trig
MIME Type: application/trig

TriG (...) extends the Turtle RDF standard to allow multiple named graphs to be expressed in one file (as triples with a named graph IRI (“quads”)).

Triples without a specified named graph are, by default, part of the “Default Graph”.

RDFa¶

Wikipedia: https://en.wikipedia.org/wiki/RDFa
Homepage: http://www.w3.org/2001/sw/wiki/RDFa
Standard: http://www.w3.org/TR/rdfa-core/
Standard: http://www.w3.org/TR/rdfa-lite/
Standard: http://www.w3.org/TR/html-rdfa/
Standard: http://www.w3.org/TR/rdfa-syntax/
Standard: https://www.w3.org/2011/rdfa-context/rdfa-1.1
Docs: http://www.w3.org/TR/rdfa-primer/

RDFa (RDF in attributes) is a standard for storing structured data (RDF triples) in HTML (XHTML, HTML5) attributes.

Schema.org structured data can be included in an HTML page as RDFa.

RDFa 1.1 Core Context¶

Standard: https://www.w3.org/2011/rdfa-context/rdfa-1.1
Standard:  http://www.w3.org/2013/json-ld-context/rdfa11
Docs: https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/parsers/pyRdfa/initialcontext.py

The RDFa 1.1 Core Context defines a number of commonly used vocabulary namespaces and URIs (prefix mappings).

An example RDFa HTML5 fragment with vocabularies drawn from the RDFa 1.1 Core Context:

<div prefix="schema: http://schema.org/">
  <div typeof="schema:Thing">
    <span property="schema:name">RDFa 1.1 JSON-LD Core Context</span>
    <a property="schema:url" href="http://www.w3.org/2013/json-ld-context/rdfa11">http://www.w3.org/2013/json-ld-context/rdfa11</a>
  </div>
</div>

An example JSON-LD document with the RDFa 1.1 Core Context:

{"@context": "http://www.w3.org/2013/json-ld-context/rdfa11",
 "@graph": [
   {"@type": "schema:Thing",
    "schema:name": "RDFa 1.1 JSON-LD Core Context",
    "schema:url": "http://www.w3.org/2013/json-ld-context/rdfa11"}
 ]}

Note

Schema.org is included in the RDFa 1.1 Core Context.

Schema.org does, in many places, reimplement other vocabularies, e.g. for consistency with schema.org/DataType types like schema.org/Number.

There is also Schema.org RDF, which, for example maps schema:name to rdfs:label; and OWL.

JSON-LD¶

Wikipedia: https://en.wikipedia.org/wiki/JSON-LD
Homepage: http://json-ld.org/
Standard: http://www.w3.org/TR/json-ld/
Docs: http://manu.sporny.org/2014/json-ld-origins-2/

JSON-LD (JSON Linked Data) is a standard for expressing RDF Linked Data as JSON.

JSON-LD specifies a @context for regular JSON documents which maps JSON attributes to URIs with datatypes and, optionally, languages.
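
The idea of mapping JSON attributes to URIs can be illustrated with a deliberately simplified sketch. This is not the JSON-LD expansion algorithm (it ignores datatypes, languages, nesting, prefixes, and remote contexts); the function name and sample document are hypothetical, and a real JSON-LD processor should be used in practice:

```python
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "url": {"@id": "http://schema.org/url", "@type": "@id"},
    },
    "name": "JSON-LD",
    "url": "http://json-ld.org/",
}


def expand_keys(document):
    """Replace top-level keys with the IRIs given in an inline @context."""
    context = document.get('@context', {})
    expanded = {}
    for key, value in document.items():
        if key == '@context':
            continue
        term = context.get(key, key)
        # A term definition is either an IRI string or a dict with '@id'.
        iri = term['@id'] if isinstance(term, dict) else term
        expanded[iri] = value
    return expanded


print(expand_keys(doc))
```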

http://json-ld.org/playground/

RDFS¶

Wikipedia: https://en.wikipedia.org/wiki/RDF_Schema
Standard: http://www.w3.org/TR/rdf-schema/
Namespace: http://www.w3.org/2000/01/rdf-schema#
xmlns: @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/rdfs

RDFS (RDF Schema) is an RDF standard for classes and properties.

A few notable RDFS classes:

rdfs:Resource (everything in RDF)
rdfs:Literal (strings, integers)
rdfs:Class

A few notable / frequently used properties:

rdfs:label
rdfs:comment
rdfs:seeAlso
rdfs:domain
rdfs:range
rdfs:subPropertyOf

OWL builds upon many RDFS concepts.

DCMI¶

Wikipedia: https://en.wikipedia.org/wiki/Dublin_Core
Wikipedia: https://en.wikipedia.org/wiki/Dublin_Core#DCMI_Metadata_Terms
Namespace: http://purl.org/dc/terms/
xmlns: @prefix dcterms: <http://purl.org/dc/terms/> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/dcterms
Namespace: http://purl.org/dc/dcmitype/
xmlns: @prefix dctypes: <http://purl.org/dc/dcmitype/> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/dctype

DCTYPES (Dublin Core Types) and DCTERMS (Dublin Core Terms) are standards for common types, classes, and properties that have been mapped to XML and RDF.

EARL¶

Standard: https://www.w3.org/TR/EARL10/
Namespace: http://www.w3.org/ns/earl#
xmlns: @prefix earl: <http://www.w3.org/ns/earl#> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/earl

W3C EARL (Evaluation and Reporting Language) is an RDFS vocabulary for automated, semi-automated, and manual test results.

The JSON-LD Implementation test results are expressed with EARL:

http://json-ld.org/test-suite/

http://json-ld.org/test-suite/reports/

RDF Data Cubes¶

Standard: http://www.w3.org/TR/vocab-data-cube/
Namespace: http://purl.org/linked-data/cube#
xmlns: @prefix qb: <http://purl.org/linked-data/cube#> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/qb

RDF Data Cubes vocabulary is an RDF standard vocabulary for expressing linked multi-dimensional statistical data and aggregations.

Data Cubes have dimensions, attributes, and measures
Pivot tables and crosstabulations can be expressed with RDF Data Cubes vocabulary

SKOS¶

Wikipedia: https://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System
Standard: http://www.w3.org/TR/skos-reference/
Standard: http://www.w3.org/TR/skos-reference/skos.html
Namespace: http://www.w3.org/2004/02/skos/core#
xmlns: @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/skos

SKOS (Simple Knowledge Organization System) is an RDF standard vocabulary for linking concepts and vocabulary terms.

XKOS¶

Homepage: http://www.ddialliance.org/Specification/RDF/XKOS
Standard: http://rdf-vocabulary.ddialliance.org/xkos.html
Source: https://github.com/linked-statistics/xkos
Namespace: http://rdf-vocabulary.ddialliance.org/xkos#
xmlns: @prefix xkos: <http://rdf-vocabulary.ddialliance.org/xkos#> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/xkos

XKOS (Extended Knowledge Organization System) is an RDF standard which extends SKOS for linking concepts and statistical measures.

FOAF¶

Wikipedia: https://en.wikipedia.org/wiki/FOAF_(ontology)
Homepage: http://www.foaf-project.org/
Standard: http://xmlns.com/foaf/spec/
Namespace: http://xmlns.com/foaf/0.1/
xmlns: @prefix foaf: <http://xmlns.com/foaf/0.1/> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/foaf

FOAF (Friend of a Friend) is an RDF standard vocabulary for expressing social networks and contact information.

SHACL¶

Standard: https://www.w3.org/TR/shacl/
Namespace: http://www.w3.org/ns/shacl#
xmlns: @prefix sh: <http://www.w3.org/ns/shacl#> .
LOVLink:

W3C SHACL (Shapes Constraint Language) is a language for describing RDF and RDFS graph shape constraints.

SHACL relaxes specific RDFS restrictions: https://www.w3.org/TR/shacl/#shacl-rdfs
Required RDFS / OWL Entailment can be specified in SHACL with the sh:entailment property and e.g. SPARQL 1.1 entailment IRIs. https://www.w3.org/TR/shacl/#entailment
https://github.com/TopQuadrant/shacl

SIOC¶

Wikipedia: https://en.wikipedia.org/wiki/Semantically-Interlinked_Online_Communities
Homepage: http://www.sioc-project.org/
Namespace: http://rdfs.org/sioc/ns#
xmlns: @prefix sioc: <http://rdfs.org/sioc/ns#> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/sioc

SIOC (Semantically Interlinked Online Communities) is an RDF standard for online social networks and resources like blog, forum, and mailing list posts.

OA¶

Homepage: http://www.openannotation.org/
Standard: http://www.openannotation.org/spec/core/
Namespace: http://www.w3.org/ns/oa#
xmlns: @prefix oa: <http://www.w3.org/ns/oa#> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/oa

OA (Open Annotation) is an RDF standard for commenting on anything with a URI.

Features:

Web Annotation: https://en.wikipedia.org/wiki/Web_annotation
Comment on any resource with a (stable) URI
Comment on text fragments
Comment on SVG items

Implementations:

https://github.com/hypothesis/h (Python, Pyramid)
https://github.com/openannotation/annotator (http://annotatorjs.org/)

Schema.org¶

Wikipedia: https://en.wikipedia.org/wiki/Schema.org
Homepage: https://schema.org
Download: https://schema.org/version/latest/
Source: https://github.com/schemaorg/schemaorg
Source: https://github.com/schemaorg/schemaorg/tree/sdo-phobos/data/releases/2.2
Docs: http://dataliberate.com/2016/02/evolving-schema-org-in-practice-pt1-the-bits-and-pieces/
Issues: https://github.com/schemaorg/schemaorg/issues
IssueLabels: https://github.com/schemaorg/schemaorg/labels

Schema.org is a vocabulary for expressing structured data on the web.

Schema.org can be expressed as microdata, RDF, RDFa, and JSON-LD.
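
As a small illustration of the JSON-LD form, the sketch below describes a person with Schema.org terms and wraps the result in the script element commonly used to embed JSON-LD in HTML. The person, the example.org URL, and the `to_script_tag` helper are hypothetical.

```python
import json

# A Schema.org description of a (made-up) person, expressed as JSON-LD.
person = {
    "@context": "http://schema.org/",
    "@type": "Person",
    "name": "Jane Doe",
    "url": "http://example.org/jane",
}


def to_script_tag(doc):
    """Serialize doc as JSON-LD inside an HTML script element."""
    body = json.dumps(doc, indent=2)
    # "\/" is a valid JSON escape for "/", so this keeps the JSON intact while
    # preventing a value containing "</script>" from closing the element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(to_script_tag(person))
```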


“Schema.org: Evolution of Structured Data on the Web” (2015) https://queue.acm.org/detail.cfm?id=2857276
“Evolving Schema.org in Practice Pt1: The Bits and Pieces” (2016) http://dataliberate.com/2016/02/evolving-schema-org-in-practice-pt1-the-bits-and-pieces/
RDFa
- https://github.com/schemaorg/schemaorg/blob/sdo-callisto/data/schema.rdfa
- https://raw.githubusercontent.com/schemaorg/schemaorg/sdo-callisto/data/schema.rdfa
JSON-LD
- https://github.com/schemaorg/schemaorg/blob/sdo-callisto/data/releases/3.2/all-layers.jsonld
- https://github.com/schemaorg/schemaorg/raw/sdo-callisto/data/releases/3.2/all-layers.jsonld

Note

The https://schema.org/ site is served over HTTPS, but Schema.org terms are identified by http://schema.org/ URIs.

Schema.org RDF¶

xmlns: @prefix schema: <http://schema.org/> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/schema
Standard: https://schema.org/docs/schema_org_rdfa.html [RDFa]

RDFa
- https://github.com/schemaorg/schemaorg/blob/sdo-callisto/data/schema.rdfa
- https://raw.githubusercontent.com/schemaorg/schemaorg/sdo-callisto/data/schema.rdfa

Schema.org TopBraid RDF¶

Homepage: http://topbraid.org/schema/
Docs: http://topbraid.org/schema/
xmlns: @prefix schema: <http://schema.org/> .
xmlns: @prefix schemax: <http://topbraid.org/schemax/> .

TopBraid maintains more complete OWL RDF transformations of Schema.org.

http://topbraid.org/schema/schema.rdf RDF/XML
http://topbraid.org/schema/schema.ttl Turtle
http://topbraid.org/schema/schema-single-range.ttl Turtle with only one type per range

SPARQL¶

Wikipedia: https://en.wikipedia.org/wiki/SPARQL
Standard: http://www.w3.org/TR/sparql11-overview/
Standard: http://www.w3.org/TR/sparql11-query/
Standard: http://www.w3.org/TR/sparql11-update/
Standard: http://www.w3.org/TR/sparql11-entailment/
Standard: http://www.w3.org/TR/sparql11-federated-query/

SPARQL is a text-based query and update language for RDF triples (and quads).

Challenges:

SPARQL query requests and responses are over HTTP; however, it’s best – and often required – to build SPARQL queries with a server application, on behalf of clients.
Default LIMIT clauses and consistent paging windows (LIMIT and OFFSET) could make SPARQL query results easier to cache.
See: LDP for more of a resource-based RESTful API that can be implemented on top of the graph pattern queries supported by SPARQL.
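
The first two points can be sketched as a server-side query builder: user input is escaped before it reaches the query text, and every query gets a bounded LIMIT and an OFFSET. This is a minimal sketch, not a complete defence against injection; the helper names, the 1000-row cap, and the example.org endpoint are assumptions.

```python
from urllib.parse import urlencode


def escape_literal(value):
    """Escape a Python string for use inside a double-quoted SPARQL literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_label_query(label, limit=100, offset=0):
    """Build a paged SELECT query; limit is clamped to 1..1000 and offset to >= 0."""
    limit = max(1, min(int(limit), 1000))
    offset = max(0, int(offset))
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        'SELECT ?s WHERE { ?s rdfs:label "%s" }\n'
        "ORDER BY ?s\n"
        "LIMIT %d OFFSET %d" % (escape_literal(label), limit, offset)
    )


def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of a SPARQL Protocol GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = build_label_query('Time "T"', limit=50, offset=100)
url = build_request_url("http://example.org/sparql", query)
print(query)
print(url)
```

The ORDER BY clause keeps page boundaries stable between requests, which is what makes fixed paging windows cacheable.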

LDP¶

Standard: http://www.w3.org/TR/ldp/
xmlns: @prefix ldp: <http://www.w3.org/ns/ldp#> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/ldp

LDP (Linked Data Platform) is a standard for building HTTP REST APIs for RDF Linked Data.

http://www.w3.org/TR/ldp/#terms

Features:

HTTP REST API for Linked Data Platform Containers (LDPC) containing Linked Data Platform Resources (LDPR)
Server-side Paging
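
As a sketch of how a client might ask a container to create a new resource, the code below builds (but does not send) an HTTP POST carrying Turtle. The container URL, the slug, and the `build_create_request` helper are hypothetical; the header choices follow my reading of the LDP specification and should be checked against the server in use.

```python
from urllib.request import Request

LDP = "http://www.w3.org/ns/ldp#"


def build_create_request(container_url, turtle, slug):
    """Build a POST that asks an LDP container to create a new RDF resource."""
    return Request(
        container_url,
        data=turtle.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/turtle",
            # Declares the interaction model requested for the new resource.
            "Link": '<%sResource>; rel="type"' % LDP,
            # A naming hint; the server decides the final URL.
            "Slug": slug,
        },
    )


turtle = (
    "@prefix sioc: <http://rdfs.org/sioc/ns#> .\n"
    '<> a sioc:Post ; sioc:content "Hello" .\n'
)
request = build_create_request("http://example.org/container/", turtle, "post-1")
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; on success an LDP server is expected to return the new resource's URL in the Location header.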

OWL¶

Wikipedia: https://en.wikipedia.org/wiki/Web_Ontology_Language
Standard: http://www.w3.org/TR/owl2-overview/
Standard: http://www.w3.org/TR/owl2-primer/
Standard: http://www.w3.org/TR/owl2-quick-reference/
Standard: http://www.w3.org/TR/owl2-profiles/
xmlns: @prefix owl: <http://www.w3.org/2002/07/owl#> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/owl

OWL (Web Ontology Language) layers semantics, reasoning, inference, and entailment capabilities onto RDF (and general logical set theory).

A few notable OWL classes:

owl:Class a rdfs:Class ; rdfs:subClassOf rdfs:Class (RDFS)
owl:Thing a owl:Class – universal class
owl:Nothing a owl:Class – empty class
owl:Restriction a rdfs:Class ; rdfs:subClassOf owl:Class

A few OWL Property types:

owl:DatatypeProperty
owl:ObjectProperty
owl:ReflexiveProperty
owl:IrreflexiveProperty
owl:SymmetricProperty
owl:TransitiveProperty
owl:FunctionalProperty
owl:InverseFunctionalProperty
owl:OntologyProperty
owl:AnnotationProperty
owl:AsymmetricProperty
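
To show what two of these property types entail, the following toy forward-chainer derives new triples for properties declared symmetric or transitive. It is a sketch over plain tuples, not an OWL reasoner; the `ex:` names and the `infer` helper are hypothetical.

```python
def infer(triples, transitive=(), symmetric=()):
    """Repeatedly apply symmetric and transitive rules until nothing new is derived."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in inferred:
            if p in symmetric:
                # owl:SymmetricProperty: (s p o) entails (o p s)
                new.add((o, p, s))
            if p in transitive:
                # owl:TransitiveProperty: (s p o) and (o p o2) entail (s p o2)
                for (s2, p2, o2) in inferred:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


facts = {
    ("ex:wheel", "ex:partOf", "ex:car"),
    ("ex:car", "ex:partOf", "ex:fleet"),
    ("ex:car", "ex:adjacentTo", "ex:truck"),
}
closure = infer(facts, transitive={"ex:partOf"}, symmetric={"ex:adjacentTo"})
for triple in sorted(closure - facts):
    print(triple)
```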

https://en.wikipedia.org/wiki/Cardinality

owl:minCardinality
owl:cardinality
owl:maxCardinality


owl:intersectionOf
owl:unionOf
owl:complementOf
owl:oneOf
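
These class constructors correspond to familiar set operations. The sketch below reads classes as finite Python sets of individuals; this is only an intuition aid, since OWL reasons under the open-world assumption and the sets here are closed. All `ex:` names are made up.

```python
# Toy, closed-world reading of OWL class expressions as sets of individuals.
THING = {"ex:ann", "ex:bob", "ex:cat", "ex:dan"}  # stands in for owl:Thing
NOTHING = set()                                   # stands in for owl:Nothing

student = {"ex:ann", "ex:bob"}
employee = {"ex:bob", "ex:cat"}

working_student = student & employee      # owl:intersectionOf
student_or_employee = student | employee  # owl:unionOf
non_student = THING - student             # owl:complementOf, relative to the toy domain
founders = {"ex:ann", "ex:dan"}           # owl:oneOf: a class given by listing its members

print(sorted(working_student))
print(sorted(non_student))
```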


owl:allValuesFrom
owl:someValuesFrom


https://www.w3.org/2002/07/owl#

https://www.w3.org/TR/owl2-quick-reference/

PROV¶

Homepage: http://www.w3.org/2011/prov/wiki/Main_Page
Standard: http://www.w3.org/ns/prov.owl
Standard: http://www.w3.org/TR/prov-overview/
Standard: http://www.w3.org/TR/prov-primer/
Standard: http://www.w3.org/TR/prov-o/
Namespace: http://www.w3.org/ns/prov#
xmlns: @prefix prov: <http://www.w3.org/ns/prov#> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/prov

The PROV (Provenance) ontology is an OWL RDF standard for expressing data provenance: who, what, when, and, to a certain extent, how.

https://en.wikipedia.org/wiki/Provenance#Data_provenance
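
A minimal sketch of the who/what/when pattern: the function below emits Turtle that links a derived file (entity) to the process that produced it (activity) and to the responsible party (agent), using PROV-O terms. The `provenance_turtle` helper and all example.org IRIs are hypothetical.

```python
from datetime import datetime, timezone


def iri(value):
    """Wrap an IRI in angle brackets, rejecting characters Turtle forbids inside them."""
    if any(ch in value for ch in '<>"{}|^`\\ '):
        raise ValueError("not a usable IRI: %r" % value)
    return "<%s>" % value


def provenance_turtle(entity, activity, agent, source, ended_at):
    """Describe which activity generated entity, from which source, by whom, and when."""
    stamp = ended_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        "%s a prov:Entity ;" % iri(entity),
        "    prov:wasGeneratedBy %s ;" % iri(activity),
        "    prov:wasAttributedTo %s ." % iri(agent),
        "",
        "%s a prov:Activity ;" % iri(activity),
        "    prov:used %s ;" % iri(source),
        "    prov:wasAssociatedWith %s ;" % iri(agent),
        '    prov:endedAtTime "%s"^^xsd:dateTime .' % stamp,
        "",
        "%s a prov:Agent ." % iri(agent),
    ]
    return "\n".join(lines) + "\n"


ttl = provenance_turtle(
    "http://example.org/data/clean.csv",
    "http://example.org/runs/42",
    "http://example.org/people/jane",
    "http://example.org/data/raw.csv",
    datetime(2016, 2, 1, 12, 0, tzinfo=timezone.utc),
)
print(ttl)
```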

DBpedia¶

Homepage: http://wiki.dbpedia.org/Ontology
Namespace: http://dbpedia.org/ontology/
xmlns: @prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
LOVLink: http://dbpedia.org/ontology/

DBpedia is an OWL RDF vocabulary for expressing structured data from Wikipedia sidebar infoboxes.

DBpedia is currently the most central (most linked to and from) RDF vocabulary. (see: LODCloud)

Example:

DBpedia is generated by batch extraction on a regular basis.

QUDT¶

Homepage: http://www.linkedmodel.org/doc/qudt/1.1/
Standard: http://qudt.org/
Docs: http://www.linkedmodel.org/catalog/qudt/1.1/
Namespace: http://qudt.org/schema/qudt#
Namespace: http://qudt.org/1.1/schema/qudt#
xmlns: @prefix qudt: <http://qudt.org/schema/qudt#> .
xmlns: @prefix qudt-1.1:  <http://qudt.org/1.1/schema/qudt#> .
LOVLink: http://lov.okfn.org/dataset/lov/vocabs/qudt

QUDT (Quantities, Units, Dimensions, and Types) is an RDF standard vocabulary for representing physical units.

QUDT is composed of a number of sub-vocabularies.
QUDT maintains conversion factors for Metric and Imperial units.

Examples:

qudt:SpaceAndTimeUnit

qudt:SpaceAndTimeUnit
   rdf:type owl:Class ;
   rdfs:label "Space And Time Unit"^^xsd:string ;
   rdfs:subClassOf qudt:PhysicalUnit ;
   rdfs:subClassOf
           [ rdf:type owl:Restriction ;
             owl:hasValue "UST"^^xsd:string ;
             owl:onProperty qudt:typePrefix
           ] .

QUDT Namespaces:

@prefix qudt:           <http://qudt.org/schema/qudt#> .
@prefix qudt-1.1:       <http://qudt.org/1.1/schema/qudt#> .
@prefix qudt-dimension: <http://qudt.org/vocab/dimension#> .
@prefix qudt-quantity:  <http://qudt.org/vocab/quantity#> .
@prefix qudt-unit-1.1:  <http://qudt.org/1.1/vocab/unit#> .
@prefix unit:           <http://qudt.org/vocab/unit#> .

This diagram shows how the vocabularies are linked and derived: http://www.linkedmodel.org/catalog/qudt/1.1/

QUDT Quantities¶

Schema

Standard: http://qudt.org/1.1/schema/quantity
Namespace: http://qudt.org/1.1/schema/quantity#
xmlns: @prefix quantity: <http://data.nasa.gov/qudt/owl/quantity#> .
Turtle: http://qudt.org/1.1/schema/OSG_quantity-(v1.1).ttl

Vocabulary

xmlns: @prefix qudt-quantity: <http://qudt.org/1.1/vocab/quantity#> .
Namespace: http://qudt.org/1.1/vocab/quantity#
Turtle: http://qudt.org/1.1/vocab/OVG_quantities-qudt-(v1.1).ttl

QUDT Quantities is an RDF schema and vocabulary for describing physical quantities.

Examples from http://qudt.org/1.1/vocab/OVG_quantities-qudt-(v1.1).ttl :

qudt-quantity:Time

qudt-quantity:Time
    rdf:type qudt:SpaceAndTimeQuantityKind ;
    rdfs:label "Time"^^xsd:string ;
    qudt:description "Time is a basic component of the measuring system used to sequence events, to compare the durations of events and the intervals between them, and to quantify the motions of objects."^^xsd:string ;
    qudt:symbol "T"^^xsd:string ;
    skos:exactMatch <http://dbpedia.org/resource/Time> .
# ...
unit:SecondTime
    qudt:quantityKind qudt-quantity:Time .

qudt-quantity:AreaTimeTemperature

qudt-quantity:AreaTimeTemperature
    rdf:type qudt:ThermodynamicsQuantityKind ;
    rdfs:label "Area Time Temperature"^^xsd:string .
# ...
unit:SquareFootSecondDegreeFahrenheit
    qudt:quantityKind qudt-quantity:AreaTimeTemperature .

QUDT Units¶

Standard: http://qudt.org/1.1/vocab/unit
Namespace: http://qudt.org/1.1/vocab/unit#
xmlns: @prefix unit: <http://qudt.org/1.1/vocab/unit#> .
xmlns: @prefix qudt-unit-1.1:  <http://qudt.org/1.1/vocab/unit#> .
Turtle: http://qudt.org/1.1/vocab/OVG_units-qudt-(v1.1).ttl

The QUDT Units Ontology is an RDF vocabulary defining many units of measure.

Examples:

unit:SecondTime

unit:SecondTime
      rdf:type qudt:SIBaseUnit , qudt:TimeUnit ;
      rdfs:label "Second"^^xsd:string ;
      qudt:abbreviation "s"^^xsd:string ;
      qudt:code "1615"^^xsd:string ;
      qudt:conversionMultiplier
              "1"^^xsd:double ;
      qudt:conversionOffset
              "0.0"^^xsd:double ;
      qudt:symbol "s"^^xsd:string ;
      skos:exactMatch <http://dbpedia.org/resource/Second> .
# ...

http://www.qudt.org/qudt/owl/1.0.0/unit/Instances.html#SecondTime
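
The conversionMultiplier and conversionOffset values are what make unit conversion possible. The sketch below assumes the convention SI value = value × multiplier + offset and only converts between units of the same quantity kind. Only the unit:SecondTime values come from the listing above; the multipliers for unit:MinuteTime and unit:Hour, the unit:Meter entry, and the qudt-quantity:Length kind are filled in for illustration rather than copied from the QUDT files.

```python
# Unit metadata in the shape of the QUDT properties shown above.
UNITS = {
    "unit:SecondTime": {"kind": "qudt-quantity:Time", "multiplier": 1.0, "offset": 0.0},
    "unit:MinuteTime": {"kind": "qudt-quantity:Time", "multiplier": 60.0, "offset": 0.0},
    "unit:Hour": {"kind": "qudt-quantity:Time", "multiplier": 3600.0, "offset": 0.0},
    "unit:Meter": {"kind": "qudt-quantity:Length", "multiplier": 1.0, "offset": 0.0},
}


def convert(value, from_unit, to_unit):
    """Convert value between two units of the same quantity kind via the SI base unit."""
    src, dst = UNITS[from_unit], UNITS[to_unit]
    if src["kind"] != dst["kind"]:
        raise ValueError("%s and %s measure different quantity kinds" % (from_unit, to_unit))
    # Assumed convention: SI value = value * multiplier + offset.
    si_value = value * src["multiplier"] + src["offset"]
    return (si_value - dst["offset"]) / dst["multiplier"]


print(convert(90, "unit:MinuteTime", "unit:Hour"))
```

All offsets here are zero, so the result does not depend on the assumed offset convention; temperature units would need that convention confirmed first.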

unit:HorsepowerElectric

http://qudt.org/1.1/vocab/OVG_units-qudt-(v1.1).ttl

unit:HorsepowerElectric
    rdf:type qudt:NotUsedWithSIUnit , qudt:PowerUnit ;
    rdfs:label "Horsepower Electric"^^xsd:string ;
    qudt:abbreviation "hp/V"^^xsd:string ;
    qudt:code "0815"^^xsd:string ;
    qudt:symbol "hp/V"^^xsd:string .

unit:SystemOfUnits_SI

http://qudt.org/1.1/vocab/OVG_units-qudt-(v1.1).ttl

unit:SystemOfUnits_SI
      rdf:type qudt:SystemOfUnits ;
      rdfs:label "International System of Units"^^xsd:string ;
      qudt:abbreviation "SI"^^xsd:string ;
      qudt:systemAllowedUnit
              unit:ArcMinute , unit:Day , unit:MinuteTime , unit:DegreeAngle , unit:ArcSecond , unit:ElectronVolt , unit:RevolutionPerHour , unit:Femtometer , unit:DegreePerSecond , unit:DegreeCelsius , unit:Liter , unit:MicroFarad , unit:AmperePerDegree , unit:RevolutionPerMinute , unit:MicroHenry , unit:Kilometer , unit:Revolution , unit:Hour , unit:PicoFarad , unit:Gram , unit:DegreePerSecondSquared , unit:MetricTon , unit:CubicCentimeter , unit:SquareCentimeter , unit:CubicMeterPerHour , unit:KiloPascal , unit:DegreePerHour , unit:UnifiedAtomicMassUnit , unit:MilliHenry , unit:KilogramPerHour , unit:KiloPascalAbsolute , unit:NanoFarad , unit:RadianPerMinute , unit:RevolutionPerSecond ;
      qudt:systemBaseUnit unit:Kilogram , unit:Unitless , unit:Kelvin , unit:Meter , unit:SecondTime , unit:Mole , unit:Candela , unit:Ampere ;
      qudt:systemCoherentDerivedUnit
              unit:PerCubicMeter , unit:WattPerSquareMeter , unit:Volt , unit:WattPerMeterKelvin , unit:CoulombPerCubicMeter , unit:Becquerel , unit:WattPerSquareMeterSteradian , unit:KelvinPerSecond , unit:Gray , unit:RadianPerSecond , unit:VoltPerMeter , unit:HenryPerMeter , unit:WattPerSteradian , unit:JouleMeterPerMole , unit:CoulombMeter , unit:PerTeslaMeter , unit:Pascal , unit:LumenPerWatt , unit:KilogramMeterPerSecond , unit:SquareMeterKelvin , unit:MoleKelvin , unit:MeterKelvinPerWatt , unit:Steradian , unit:AmperePerMeter , unit:SquareMeterKelvinPerWatt , unit:JouleSecond , unit:MeterPerFarad , unit:KilogramPerSecond , unit:HertzPerTesla , unit:KilogramMeterSquared , unit:WattPerSquareMeterQuarticKelvin , unit:PerMeterKelvin , unit:JoulePerCubicMeterKelvin , unit:JoulePerSquareTesla , unit:JoulePerCubicMeter , unit:MeterPerKelvin , unit:AmperePerSquareMeter , unit:CubicCoulombMeterPerSquareJoule , unit:CoulombPerMeter , unit:Katal , unit:CubicMeter , unit:LumenSecond , unit:Coulomb , unit:MolePerKilogram , unit:CubicMeterPerKilogramSecondSquared , unit:PerMeter , unit:AmperePerRadian , unit:CoulombPerKilogram , unit:QuarticCoulombMeterPerCubicEnergy , unit:Tesla , unit:JoulePerKilogram , unit:MeterKelvin , unit:MeterPerSecond , unit:NewtonMeter , unit:CandelaPerSquareMeter , unit:Siemens , unit:CoulombSquareMeter , unit:KilogramPerCubicMeter , unit:KilogramSecondSquared , unit:Watt , unit:AmperePerJoule , unit:VoltPerSecond , unit:JoulePerKilogramKelvinPerCubicMeter , unit:PascalPerSecond , unit:CubicMeterPerMole , unit:KilogramPerMeter , unit:PascalSecond , unit:Joule , unit:HertzPerVolt , unit:KilogramPerSquareMeter , unit:PerTeslaSecond , unit:MolePerCubicMeter , unit:PerSecond , unit:JoulePerKelvin , unit:RadianPerSecondSquared , unit:Newton , unit:CubicMeterPerKelvin , unit:GrayPerSecond , unit:SquareMeterPerSecond , unit:CubicMeterPerKilogram , unit:KilogramPerMole , unit:SquareMeterPerKelvin , unit:SquareMeterSteradian , unit:TeslaSecond , unit:Ohm 
, unit:KelvinPerWatt , unit:JoulePerKilogramKelvinPerPascal , unit:WattSquareMeter , unit:MeterKilogram , unit:WattSquareMeterPerSteradian , unit:Hertz , unit:VoltPerSquareMeter , unit:CubicMeterPerSecond , unit:JoulePerMoleKelvin , unit:TeslaMeter , unit:JoulePerMole , unit:Lux , unit:FaradPerMeter , unit:PerMole , unit:JouleSecondPerMole , unit:AmpereTurnPerMeter , unit:VoltMeter , unit:SecondTimeSquared , unit:AmpereTurn , unit:JoulePerKilogramKelvin , unit:CoulombPerSquareMeter , unit:NewtonPerKilogram , unit:JoulePerSquareMeter , unit:Weber , unit:Henry , unit:MeterPerSecondSquared , unit:KilogramKelvin , unit:Sievert , unit:NewtonPerMeter , unit:WattPerSquareMeterKelvin , unit:SquareCoulombMeterPerJoule , unit:Lumen , unit:Farad , unit:HertzPerKelvin , unit:SquareMeter , unit:JoulePerTesla , unit:Radian , unit:KelvinPerTesla , unit:NewtonPerCoulomb , unit:CoulombPerMole ;
      qudt:systemPrefixUnit
              unit:Hecto , unit:Nano , unit:Tera , unit:Atto , unit:Kilo , unit:Yocto , unit:Yotta , unit:Deci , unit:Zepto , unit:Pico , unit:Femto , unit:Milli , unit:Micro , unit:Zetta , unit:Mega , unit:Centi , unit:Giga , unit:Peta , unit:Deca , unit:Exa ;
      skos:exactMatch <http://dbpedia.org/resource/International_System_of_Units> .

Wikidata¶

Wikipedia: https://en.wikipedia.org/wiki/Wikidata

Homepage: https://www.wikidata.org/

Wikidata is an Open Source collaboratively edited knowledgebase.

DBpedia scrapes data from Wikipedia Infoboxes periodically. Wikidata is a database with forms, datatypes, and alphanumerical identifiers (which do not change or redirect).
The Wikidata Query Service (SPARQL over Wikidata's RDF export) is powered by Blazegraph.
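
Because Wikidata identifiers follow a fixed pattern, they are easy to validate before being placed in a query. The sketch below builds a query URL for the public Wikidata Query Service; it assumes that the endpoint predefines the `wd:` and `wdt:` prefixes and accepts a `format=json` parameter. The helper names are hypothetical, and no request is sent.

```python
import re
from urllib.parse import urlencode

ITEM_ID = re.compile(r"Q[1-9][0-9]*")


def instances_query(class_id, limit=10):
    """Build a query for items that are an instance of (P31) the given item."""
    if not ITEM_ID.fullmatch(class_id):
        raise ValueError("not a Wikidata item identifier: %r" % class_id)
    return "SELECT ?item WHERE { ?item wdt:P31 wd:%s } LIMIT %d" % (class_id, int(limit))


def query_url(query):
    """Encode the query for the Wikidata Query Service endpoint."""
    return "https://query.wikidata.org/sparql?" + urlencode(
        {"query": query, "format": "json"}
    )


query = instances_query("Q5", limit=3)  # Q5 is the item for "human"
print(query_url(query))
```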

Semantic Web Tools¶

Homepage: http://www.w3.org/2001/sw/wiki/Tools

Semantic Web Tools are designed to work with RDF formats.

CKAN¶

Wikipedia: https://en.wikipedia.org/wiki/CKAN
Homepage: http://ckan.org/
Source: git https://github.com/ckan/ckan
Source: git https://github.com/ckan/ckan-docker
DockerHub: https://registry.hub.docker.com/u/ckan/ckan/
Docs: http://docs.ckan.org/en/latest/
Docs: http://docs.ckan.org/en/latest/maintaining/installing/index.html
Docs: http://docs.ckan.org/en/latest/maintaining/data-viewer.html
Docs: http://docs.ckan.org/en/latest/maintaining/paster.html
Docs: http://docs.ckan.org/en/latest/maintaining/linked-data-and-rdf.html
Docs: http://docs.ckan.org/en/latest/api/

CKAN (Comprehensive Knowledge Archive Network) is an Open Source data repository web application and API written in Python with support for RDF.

https://datahub.io is powered by CKAN. LODCloud draws from datahub.io datasets.
Many national data.gov sites are powered by CKAN (e.g. https://catalog.data.gov/).
Many public and private data repositories are powered by CKAN.
CKAN is not currently built on an RDF triplestore.
Dockerfiles for CKAN are available (see the ckan-docker source above).
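
As a sketch of the API mentioned above, the code below builds a dataset search URL for CKAN's Action API and extracts dataset names from a response. The `package_search` action and the `success`/`result` envelope follow the CKAN API documentation as I understand it; the helper names and the sample response are made up, and no request is sent.

```python
import json
from urllib.parse import urlencode


def package_search_url(site_url, query, rows=10):
    """Build the URL for a dataset search against a CKAN site's Action API."""
    params = urlencode({"q": query, "rows": rows})
    return site_url.rstrip("/") + "/api/3/action/package_search?" + params


def dataset_names(response_text):
    """Return the dataset names from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("CKAN API call failed")
    return [dataset["name"] for dataset in payload["result"]["results"]]


# Illustrative response body, not real data.
sample = json.dumps(
    {"success": True, "result": {"count": 2, "results": [{"name": "a"}, {"name": "b"}]}}
)
print(package_search_url("https://catalog.data.gov/", "rdf", rows=5))
print(dataset_names(sample))
```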

Protégé¶

Wikipedia: https://en.wikipedia.org/wiki/Protégé_(software)
Homepage: http://protege.stanford.edu/
Homepage: http://webprotege.stanford.edu/

Protégé is a knowledge management software application with support for RDF, OWL, and a few different reasoners.

Web Protégé is a web-based version of Protégé with many similar features.

Protégé is a Free and Open Source software tool.

RDFJS¶

Homepage: http://www.w3.org/community/rdfjs/

Src: https://github

RDFJS (RDF JavaScript) is an umbrella term for tools for working with RDF in the JavaScript programming language.

See:

RDF Interfaces

RDFHDT¶

Homepage: http://www.rdfhdt.org/

RDFHDT (RDF Header Dictionary Triples) is an optimized binary format for storing and working with very large numbers of triples in highly compressed form.

HDT-IT is a software application for working with RDFHDT datasets.
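
The "Dictionary" and "Triples" parts of the name hint at the core idea: each distinct term is stored once and given an integer ID, and triples are stored as compact ID tuples. The sketch below illustrates only that idea. It does not reproduce the HDT binary layout, and it uses a single ID space for all terms, which is a simplification. All names are hypothetical.

```python
def dictionary_encode(triples):
    """Map each distinct term to an integer ID and rewrite triples as sorted ID tuples."""
    terms = sorted({term for triple in triples for term in triple})
    term_to_id = {term: i for i, term in enumerate(terms, start=1)}
    encoded = sorted(tuple(term_to_id[t] for t in triple) for triple in triples)
    return terms, encoded


def decode(terms, encoded):
    """Recover the original triples from the dictionary and the ID tuples."""
    return [tuple(terms[i - 1] for i in ids) for ids in encoded]


triples = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:a", "ex:knows", "ex:c"),
]
terms, encoded = dictionary_encode(triples)
print(terms)
print(encoded)
```

Long IRIs that recur across millions of triples are stored once in the dictionary, which is where much of the space saving comes from.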

RDFLib¶

Wikipedia: https://en.wikipedia.org/wiki/RDFLib
Homepage: https://github.com/RDFLib
Source: https://github.com/RDFLib/rdflib
Docs: https://rdflib.readthedocs.io/en/latest/

RDFLib is a library (and a collection of companion libraries) for working with RDF in the Python programming language.

Semantic Web Schema Resources¶

prefix.cc¶

Homepage: http://prefix.cc

Docs:

Lookup RDF vocabularies, classes, and properties
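
The kind of lookup prefix.cc offers can be sketched locally: expanding a prefixed name to a full IRI and shrinking an IRI back to a prefixed name. The prefix table below is copied from the xmlns lines in this document; the `expand` and `shrink` helpers are hypothetical and do not call prefix.cc.

```python
PREFIXES = {
    "sh": "http://www.w3.org/ns/shacl#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "oa": "http://www.w3.org/ns/oa#",
    "schema": "http://schema.org/",
    "ldp": "http://www.w3.org/ns/ldp#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "prov": "http://www.w3.org/ns/prov#",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'owl:Thing' to a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError("unknown prefix in %r" % curie)
    return prefixes[prefix] + local


def shrink(iri, prefixes=PREFIXES):
    """Shorten an IRI to a prefixed name using the longest matching namespace."""
    matches = [(ns, p) for p, ns in prefixes.items() if iri.startswith(ns)]
    if not matches:
        return iri
    namespace, prefix = max(matches, key=lambda m: len(m[0]))
    return prefix + ":" + iri[len(namespace):]


print(expand("owl:Thing"))
print(shrink("http://schema.org/Person"))
```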

LOV¶

Homepage: http://lov.okfn.org/
Source: git https://github.com/pyvandenbussche/lov
SPARQL: http://lov.okfn.org/dataset/lov/sparql
Docs: http://lov.okfn.org/dataset/lov/api

LOV (“Linked Open Vocabularies”) is a web application for cataloging and viewing metadata of and links between vocabularies (RDF, RDFS, OWL).

All of the vocabularies stored in LOV as a bubble chart:

http://lov.okfn.org/dataset/lov/
LOV has a “suggest a vocabulary” feature
Many of the vocabularies stored in LOV can also be searched or looked up from prefix.cc.

URIs for Units¶

https://lists.w3.org/Archives/Public/public-vocabs/2014Jan/0157.html
- https://lists.w3.org/Archives/Public/public-vocabs/2015May/
- https://lists.w3.org/Archives/Public/public-vocabs/2015May/thread.html

LODCloud¶

Homepage: http://lod-cloud.net
Source: git https://github.com/lod-cloud/datahub2void
Datasets: http://datahub.io/group/lodcloud
Download: http://lod-cloud.net/data/void.ttl

The LOD (“Linking Open Data”) cloud diagram visualizes the nodes and edges of the Linked Open Data Cloud.