LinkedReproducibility
#LinkedReproducibility
- Data Science
- Knowledge Engineering > Linked Data
- Units > RDF and Units [QUDT,]
- Information Systems
- ELI5: Our data is probably already aware of a cure.
- Cure: Win/Win solution
Note
This page (linkedreproducibility.rst) is licensed with CC0 1.0 (Public Domain).
Please do implement these ideas and specifications.
StudyGraph: Document Nodes and Link Edges
We should use annotations with typed, reified edges to link studies whose analyses are comparable or incomparable (e.g., Open Annotation (OA) expressed in RDF/OWL, which can carry more structured data than threaded comments).
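To make "typed, reified edge" concrete, here is a minimal sketch that uses plain RDF reification (`rdf:Statement`) rather than Open Annotation, so the edge gets its own identity and can carry who asserted it and when. The `repro:` namespace URI, the study, edge, and person URIs, and the date are all hypothetical placeholders.

```python
# Sketch only: one typed, reified edge between two studies, as N-Triples.
# Hypothetical: the repro: namespace, the study/edge/person URIs, the date.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
XSD = "http://www.w3.org/2001/XMLSchema#"
REPRO = "https://example.org/repro#"  # hypothetical namespace URI


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def reified_edge(edge_id, subject, predicate, obj, creator, created):
    """Describe one edge as an rdf:Statement plus provenance triples."""
    node = iri(edge_id)
    return [
        f"{node} {iri(RDF + 'type')} {iri(RDF + 'Statement')} .",
        f"{node} {iri(RDF + 'subject')} {iri(subject)} .",
        f"{node} {iri(RDF + 'predicate')} {iri(predicate)} .",
        f"{node} {iri(RDF + 'object')} {iri(obj)} .",
        # Provenance describes the edge (the claim), not either study.
        f"{node} {iri(DCTERMS + 'creator')} {iri(creator)} .",
        f'{node} {iri(DCTERMS + "created")} "{created}"^^{iri(XSD + "date")} .',
    ]


lines = reified_edge(
    edge_id="https://example.org/edges/1",
    subject="https://example.org/studies/B",
    predicate=REPRO + "intendedToReproduce",
    obj="https://example.org/studies/A",
    creator="https://example.org/people/reviewer-1",
    created="2016-06-19",
)
print("\n".join(lines))
```

Note that RDF reification describes the triple without asserting it; if the plain edge `studies/B repro:intendedToReproduce studies/A` should also be asserted, publish that triple directly as well.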
- PDFs are the de-facto standard for scientific publishing
- Many journals also (or instead) request HTML
- PLoS, for example
- Then, a “study” is a document
- title – schema.org/name
- authors – lead author first (usually)
- schema.org/TODO – author, creator, contributor
- abstract
- tags/labels/keywords are edges to a tag/label/keyword node
- hierarchical
- MeSH
- PyPI Trove Classifiers
- folksonomy
- tags
- tags often require deduplication, part-of-speech normalization, de-pluralization, etc.
- Concept URIs
- wikipedia, dbpedia, wikidata, etc
- links to Linked Data
- What we lack are structured edges/relations between the actual studies
- Develop best-practice guidelines and/or an RDF schema and vocabulary (“repro:”) for linking studies, their supporting data, and their collection methods with URIs.
- developing vocabularies:
- Semantic Web Tools
- Git, GitHub Pages
- [ ] Schema.org extension vocabularies
- linked reproducibility edges:
similarTo
concursWith
discordantWith
intendedToReproduce
reproduces
- linked reproducibility classes and properties:
- [x] schema.org/MedicalStudy, MedicalObservationalStudy, MedicalTrial
- [ ] https://github.com/twamarc/ScheMed
- http://schema.org/MedicalTrialDesign
- http://schema.org/DoubleBlindedTrial
- http://schema.org/InternationalTrial
- http://schema.org/MultiCenterTrial
- http://schema.org/OpenTrial
- http://schema.org/PlaceboControlledTrial
- http://schema.org/RandomizedTrial
- http://schema.org/SingleBlindedTrial
- http://schema.org/SingleCenterTrial
- http://schema.org/TripleBlindedTrial
- See: https://westurner.github.io/opengov/us/#personal-health-agenda
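Once studies are linked with the edge types listed above, simple questions become graph queries. A minimal sketch with plain Python tuples standing in for triples; the study IDs are hypothetical, and treating these five edge names as the complete `repro:` vocabulary is an assumption. It rejects unknown edge types and lists studies that someone intended to reproduce but for which no `reproduces` edge exists yet.

```python
# Sketch: typed study-to-study edges as (subject, edge, object) tuples.
# Edge names come from the list above; study IDs are hypothetical.
REPRO_EDGES = {
    "similarTo",
    "concursWith",
    "discordantWith",
    "intendedToReproduce",
    "reproduces",
}


def check_edges(edges):
    """Raise ValueError on any edge type outside the repro: vocabulary."""
    for subj, edge, obj in edges:
        if edge not in REPRO_EDGES:
            raise ValueError(f"unknown edge type {edge!r} in {subj} -> {obj}")
    return edges


def unconfirmed_reproductions(edges):
    """Studies targeted by intendedToReproduce but never by reproduces."""
    intended = {obj for _, edge, obj in edges if edge == "intendedToReproduce"}
    reproduced = {obj for _, edge, obj in edges if edge == "reproduces"}
    return sorted(intended - reproduced)


edges = check_edges([
    ("study:B", "intendedToReproduce", "study:A"),
    ("study:C", "reproduces", "study:A"),
    ("study:D", "intendedToReproduce", "study:E"),
    ("study:D", "discordantWith", "study:E"),
])
print(unconfirmed_reproductions(edges))  # ['study:E']
```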
TODO: - pandas 3402 -
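For the tag cleanup noted in the StudyGraph list (deduplication, de-pluralization), a crude sketch: the suffix rules are a deliberately naive stand-in for a real lemmatizer or part-of-speech tagger, and mapping tags onto Concept URIs (MeSH, Wikidata, etc.) would be the sturdier fix. The example tags are hypothetical.

```python
import re


def normalize_tag(tag: str) -> str:
    """Crude tag normalization: case, separators, and English plurals."""
    t = re.sub(r"[\s_-]+", " ", tag.strip().lower())  # unify separators
    # Crude suffix rules standing in for real lemmatization (assumption):
    if t.endswith("ies") and len(t) > 4:
        t = t[:-3] + "y"  # "studies" -> "study"
    elif len(t) > 3 and t.endswith("s") and not t.endswith(("ss", "is", "us")):
        t = t[:-1]  # "trials" -> "trial"
    return t


def dedupe_tags(tags):
    """Map each normalized form to the first spelling seen for it."""
    canonical = {}
    for tag in tags:
        canonical.setdefault(normalize_tag(tag), tag)
    return canonical


tags = [
    "Linked Data", "linked-data",
    "Clinical Trials", "clinical trial",
    "Reproducibility Studies", "meta-analysis",
]
print(list(dedupe_tags(tags)))
# ['linked data', 'clinical trial', 'reproducibility study', 'meta analysis']
```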
StructuredPremises: Premises as structured data
And then URIs for controls / study design
- see schema.org/MedicalTrialDesign
- [ ] these could/should be extended to all of science
logical premises (sequence of propositions)
i/o sequences
- nbformat (IPython / Jupyter notebook format)
- insufficient because we need stable premise permalinks (across versioned publishing URIs)
#premise-1
#premise-abc398f
conclusions (derivations)
- this is a computation graph
- it should have links (edges) to the datasets
- https://schema.org/Dataset
- “ENH: Linked Datasets (RDF)” https://github.com/pydata/pandas/issues/3402
- figures should have links (edges) to the datasets
- permalinks to premises
- #TenSimpleRules for Reproducible Computational Research | http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003285 | https://wrdrd.github.io/docs/consulting/data-science#tensimplerules-for-reproducible-computational-research
further questions
- “downstream” studies / implementations
- retraction management
- decisions / policy predicated on said conclusions
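One way to get premise permalinks that survive re-publishing under new versioned URIs is to derive the anchor from the premise's content rather than its position (`#premise-1`). A minimal sketch, assuming a 7-hex-digit SHA-1 prefix (chosen only to match the shape of `#premise-abc398f`; no scheme is specified here) and whitespace normalization before hashing. The premise texts, the `derivedFromPremises` property name, and the dataset URL are hypothetical. Conclusions then link to premises and datasets by those anchors, which is the computation graph described above.

```python
import hashlib


def premise_anchor(text: str, length: int = 7) -> str:
    """Content-addressed anchor: the same normalized text gives the same anchor."""
    normalized = " ".join(text.split())  # ignore line-wrapping/whitespace changes
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
    return f"#premise-{digest[:length]}"


# Hypothetical premise texts and dataset URL, for illustration only.
premises = [
    "Participants were assigned to groups at random.",
    "Outcome assessors did not know group assignments.",
]
anchors = [premise_anchor(p) for p in premises]

# A conclusion node with edges to the premises and the dataset it derives from.
conclusion = {
    "id": "#conclusion-1",
    "derivedFromPremises": anchors,             # hypothetical property name
    "dataset": "https://example.org/data.csv",  # hypothetical dataset URL
}

# Re-wrapping a premise across lines does not change its anchor.
assert premise_anchor("Participants were assigned\n  to groups at random.") == anchors[0]
print(conclusion)
```

The trade-off: rewording a premise yields a new anchor. That is arguably correct (it is a different premise), but links to the old wording then need a redirect.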
LinkedMetaAnalyses
You evaluated 10, I evaluated (the same / a different) 10 studies
PRISMA
evaluation of controls
- “the URI says they did a Triple Blind Study, but it doesn’t sound like they had groups named just e.g. X, Y, and Z”
- disqualified / questionable / etc
- schema.org/MedicalTrial -> schema.org/ScientificTrial
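The "you evaluated 10, I evaluated 10" comparison falls out directly once each meta-analysis records its studies and per-study evaluations by URI. A minimal sketch, assuming each meta-analysis is a mapping from study URI to an evaluation label; the study URIs are hypothetical, and the label "included" is an assumption alongside "disqualified" and "questionable" from the list above. It reports the overlap, the Jaccard similarity of the two study sets, and the shared studies whose evaluations disagree.

```python
def compare_meta_analyses(mine: dict, theirs: dict) -> dict:
    """Compare two {study_uri: evaluation} mappings."""
    a, b = set(mine), set(theirs)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_mine": sorted(a - b),
        "only_theirs": sorted(b - a),
        # Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both are empty.
        "jaccard": len(shared) / len(union) if union else 0.0,
        "disagreements": sorted(uri for uri in shared if mine[uri] != theirs[uri]),
    }


# Hypothetical study URIs and evaluations.
mine = {"study:1": "included", "study:2": "questionable", "study:3": "included"}
theirs = {"study:2": "disqualified", "study:3": "included", "study:4": "included"}

result = compare_meta_analyses(mine, theirs)
print(result["jaccard"])        # 0.5
print(result["disagreements"])  # ['study:2']
```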
C = Class (RDFS)
P = Property (RDFS)
- [ ] C: MetaAnalysis
- [ ] C: CriteriaBase type
- [ ] C: Criterion
- [ ] C: ScientificStudy
- [x] C: MedicalStudy
- [ ] C: MedicalObservationalStudy <- ScientificObservationalStudy
- [ ] C: MedicalTrial <- ScientificTrial
- [x] C: Dataset
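One reading of `MedicalTrial <- ScientificTrial` (and `schema.org/MedicalTrial -> schema.org/ScientificTrial` above) is a proposed `rdfs:subClassOf` generalization, so that a query for scientific studies also finds medical ones. A minimal sketch under that assumption: schema.org does declare MedicalTrial and MedicalObservationalStudy as subclasses of MedicalStudy, but the edges to ScientificTrial, ScientificObservationalStudy, and ScientificStudy are hypothetical here. The closure is the plain transitive subclass inference an RDFS reasoner would perform.

```python
# Sketch: rdfs:subClassOf edges as (subclass, superclass) pairs.
SUBCLASS_OF = [
    ("MedicalTrial", "MedicalStudy"),                               # schema.org
    ("MedicalObservationalStudy", "MedicalStudy"),                  # schema.org
    ("MedicalTrial", "ScientificTrial"),                            # proposed
    ("MedicalObservationalStudy", "ScientificObservationalStudy"),  # proposed
    ("ScientificTrial", "ScientificStudy"),                         # assumed
    ("ScientificObservationalStudy", "ScientificStudy"),            # assumed
]


def superclasses(cls, edges=SUBCLASS_OF):
    """Transitive closure of rdfs:subClassOf starting from cls."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for sub, sup in edges:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def instances_of(cls, typed):
    """Items typed as cls directly or via an inferred superclass."""
    return sorted(
        item for item, t in typed.items()
        if t == cls or cls in superclasses(t)
    )


# Hypothetical study URIs with their most specific asserted type.
typed = {
    "study:1": "MedicalTrial",
    "study:2": "MedicalObservationalStudy",
    "study:3": "ScientificTrial",
}
print(instances_of("ScientificStudy", typed))  # ['study:1', 'study:2', 'study:3']
print(instances_of("MedicalStudy", typed))     # ['study:1', 'study:2']
```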
When do we show?
- Deadline
- Only if you also produce your own meta-analyses
- Only if we’re doing Open Access (as required by stipulations of federal funding)
RDF Example
Linked Data + Reproducibility => Linked Reproducibility
Reproducibility ---\___ Linked Reproducibility
Linked Data     ---/
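@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix schema: <http://schema.org/> .
# assumption: the default ":" prefix resolves against this document's URL
@prefix : <#> .
:LinkedData rdf:type skos:Concept ;
rdfs:label "Linked Data"@en ;
schema:name "Linked Data"@en ;
owl:sameAs <https://en.wikipedia.org/wiki/Linked_data> ;
owl:sameAs <http://dbpedia.org/resource/Linked_data> ;
owl:sameAs <http://ja.dbpedia.org/resource/Linked_data> ;
owl:sameAs <http://es.dbpedia.org/resource/Datos_enlazados> ;
owl:sameAs <http://fr.dbpedia.org/resource/Web_des_donn%C3%A9es> ;
owl:sameAs <http://nl.dbpedia.org/resource/Linked_data> ;
owl:sameAs <http://ko.dbpedia.org/resource/링크드_데이터> ;
owl:sameAs <http://www.wikidata.org/entity/Q515701> ;
.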
:Reproducibility a skos:Concept ;
rdfs:label "Reproducibility"@en ;
schema:name "Reproducibility"@en ;
owl:sameAs <https://en.wikipedia.org/wiki/Reproducibility> ;
owl:sameAs <http://dbpedia.org/resource/Reproducibility> ;
.
:LinkedReproducibility a skos:Concept ;
rdfs:label "Linked Reproducibility"@en ;
schema:name "Linked Reproducibility"@en ;
skos:related :LinkedData, :Reproducibility ;
.
CSV, CSVW, and metadata rows
- CSV – Comma Separated Values
- CSVW – CSV on the Web (RDF, JSON, JSON-LD)
- RDF – Resource Description Framework
A classic data table with 1 metadata header row (column label):
| column label | sample | date | width | height |
|---|---|---|---|---|
| | 1 | 2016-06-19T06:28:49-0500 | 20.0 | 30.0 |
| | 2 | 2016-06-19T06:29:22-0500 | 40.0 | 50.0 |
| | 3 | 2016-06-19T06:29:48-0500 | 60.0 | 70.0 |
A data table with 7 metadata header rows (column label, property URI path, DataType, unit, accuracy, precision, significant figures):
| column label | sample | date | width | height |
|---|---|---|---|---|
| property URI path | schema.org/name | schema.org/dateCreated | [schema.org/width, schema.org/value] | [schema.org/height, schema.org/value] |
| schema.org/DataType | schema.org/Integer | schema.org/DateTime | schema.org/Float | schema.org/Float |
| Unit | | | unit:Meter | unit:Meter |
| accuracy | | | | |
| precision | | | | |
| significant figures | | | *.1 | *.1 |
| | 1 | 2016-06-19T06:28:49-0500 | 20.0 | 30.0 |
| | 2 | 2016-06-19T06:29:22-0500 | 40.0 | 50.0 |
| | 3 | 2016-06-19T06:29:48-0500 | 60.0 | 70.0 |
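A minimal sketch of consuming such a table with Python's standard `csv` module, assuming it is saved as plain CSV with the metadata-row label in the first column and that column left empty on data rows. Only three of the metadata rows are used, the timestamp column is typed as schema.org/DateTime because its values include a time of day, and the DataType-to-parser table is hypothetical. This is not a CSVW processor: CSV on the Web keeps such column descriptions in a separate JSON metadata document rather than in extra rows.

```python
import csv
import io
from datetime import datetime

# Hypothetical mapping from the DataType metadata row to Python parsers.
PARSERS = {
    "schema.org/Integer": int,
    "schema.org/Float": float,
    "schema.org/DateTime": lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%S%z"),
}

# A subset of the metadata rows above; data rows have an empty first cell.
RAW = """\
column label,sample,date,width,height
schema.org/DataType,schema.org/Integer,schema.org/DateTime,schema.org/Float,schema.org/Float
Unit,,,unit:Meter,unit:Meter
,1,2016-06-19T06:28:49-0500,20.0,30.0
,2,2016-06-19T06:29:22-0500,40.0,50.0
,3,2016-06-19T06:29:48-0500,60.0,70.0
"""


def read_with_metadata(text):
    """Split labeled metadata rows from data rows and type each cell."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    meta = {row[0]: row[1:] for row in rows if row[0]}  # labeled = metadata
    columns = meta["column label"]
    dtypes = meta["schema.org/DataType"]
    units = meta.get("Unit", [""] * len(columns))
    records = []
    for row in rows:
        if row[0]:  # metadata row, already collected above
            continue
        record = {}
        for name, dtype, unit, cell in zip(columns, dtypes, units, row[1:]):
            value = PARSERS[dtype](cell)
            record[name] = (value, unit) if unit else value  # keep unit with value
        records.append(record)
    return records


records = read_with_metadata(RAW)
print(records[0]["width"])  # (20.0, 'unit:Meter')
```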
References
- TODO: @westurner
- hackernews