glossarist-ruby

A Ruby gem implementing the Glossarist concept model. Every entity in the concept model is available as a class, and every attribute as a method of that class. The gem reads and writes Glossarist V2 and V3 datasets, packages them as portable GCR archives, and exports to TBX, SKOS, and Turtle with SHACL validation.

Current version: v2.9.1 — synced to concept-model v3.1.0.

Install

Add this line to your application's Gemfile:

ruby
gem 'glossarist'

And then execute:

bash
bundle install

Or install it yourself as:

bash
gem install glossarist

Usage

Reading a Glossarist model V2/V3 from files

A Glossarist dataset is a collection of concepts and their localizations in YAML, optionally accompanied by a register.yaml (V3 dataset metadata), bibliography.yaml, and dataset-level figures/tables/formulas.

A dataset can be stored in one of two layouts:

  1. Each concept is stored in its own YAML file, and each of its localized concepts in a separate YAML file. Concept files live in concept/ and localizations in localized_concept/.
  2. Each concept and its localizations are stored together in a single YAML file, placed directly in the specified path.

To load a dataset:

ruby
collection = Glossarist::ManagedConceptCollection.new
collection.load_from_files("path/to/glossarist-dataset")
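
The two layouts described above can be told apart by looking at the directory structure. The following is a minimal sketch using only the Ruby standard library. `dataset_layout` is a hypothetical helper, not part of the gem, and the gem's own detection logic may differ. Note that it would also report a directory containing only register.yaml as grouped.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (NOT part of glossarist): guess which storage layout a
# dataset directory uses, based only on the directory names described above.
def dataset_layout(path)
  if Dir.exist?(File.join(path, "concept")) &&
     Dir.exist?(File.join(path, "localized_concept"))
    :separate_files   # layout 1: concept/ and localized_concept/
  elsif Dir.glob(File.join(path, "*.{yaml,yml}")).any?
    :grouped_files    # layout 2: YAML files directly in the path
  else
    :unknown
  end
end

# Usage: build a throwaway directory in layout 1 and inspect it.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "concept"))
  FileUtils.mkdir_p(File.join(dir, "localized_concept"))
  puts dataset_layout(dir)
end
```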

Writing a Glossarist model to files

ruby
collection = Glossarist::ManagedConceptCollection.new
collection.load_from_files("path/to/glossarist-dataset")
# ... Update the collection ...
collection.save_to_files("path/to/glossarist-dataset")

To write with concepts and their localizations grouped into single files:

ruby
collection.save_grouped_concepts_to_files("path/to/glossarist-dataset")

GCR packages

A GCR (Glossarist Concept Resource) package is a portable ZIP archive of a dataset — concepts, register, bibliography, images, and any pre-compiled machine formats (TBX, JSON-LD, Turtle, JSONL).

ruby
collection = Glossarist::ManagedConceptCollection.new
collection.load_from_files("path/to/dataset")

# Write a GCR package
collection.save_to_gcr("my-dataset.gcr")

# Read a GCR package
collection2 = Glossarist::ManagedConceptCollection.new
collection2.load_from_gcr("my-dataset.gcr")

ManagedConcept

| Field | Description |
|---|---|
| id | String identifier for the concept |
| uuid | UUID for the concept |
| related | Array of RelatedConcept |
| status | Lifecycle status (valid, draft, submitted, superseded, retired) |
| dates | Array of ConceptDate |
| localized_concepts | Hash of localizations (language code → UUID) |
| domains | Subject area references (rendered as <domain> in TBX) |
| tags | Free-form organizational tags for grouping/filtering (not rendered as terminological domains) |
| sources | Concept-level bibliographical sources |
| localizations | Hash of localizations (language code → LocalizedConcept) |

ruby
concept = Glossarist::ManagedConcept.new({
  "data" => {
    "id" => "123",
    "localized_concepts" => { "ara" => "<uuid>", "eng" => "<uuid>" },
    "localizations" => [...],
    "domains" => [
      Glossarist::ConceptReference.new(concept_id: "103", ref_type: "domain"),
    ],
    "tags" => ["time-scales"],
  },
})

LocalizedConcept

| Field | Description |
|---|---|
| id | Optional identifier for cross-references |
| uuid | UUID for the concept |
| designations | Array of Designations |
| domain | URI reference to the subject area (URN, URL, or relative) |
| related | Array of per-language RelatedConcept |
| subject | Subject of the term |
| definition | Array of DetailedDefinition |
| non_verb_rep | Array of non-verbal representation references |
| notes | Zero or more notes |
| examples | Zero or more concept-level examples |
| annotations | Editorial annotations (distinct from notes) |
| language_code | ISO-639 3-letter language code |
| script | ISO 15924 4-letter script code |
| system | ISO 24229 conversion system code |
| entry_status | notValid, valid, superseded, or retired |
| classification | preferred, admitted, or deprecated |
| references | Typed references |
| dates | Per-language governance events |
| release | Release version |
| review_type | editorial or substantive |
| lineage_similarity | Lineage similarity score |

What's new in 2.8–2.9

Recent releases sync glossarist-ruby to concept-model v3.1.0:

| Version | Highlights |
|---|---|
| 2.9.1 | Concept-model vendor pin bumped to v3.1.0 tag; prefixes.ttl documented |
| 2.9.0 | WS-B: per-concept SKOS export, deterministic UUIDs, SHACL gate, shared Reference protocol |
| 2.8.18 | V3 ConceptDate accepts any date string per v3 schema; wired through date_accepted and to_yaml callbacks |
| 2.8.17 | Scoped examples integrated with aggregations and RDF export |
| 2.8.16 | Non-verbal entity refactor: images.yaml removed, NonVerbRep reshaped to match v3.1 ontology |
| 2.8.15 | V3 dataset syntax: collection files wrapped under a single key |
| 2.8.13 | Section cascading membership: transitive ancestor traversal |
| 2.8.12 | ConceptReference id alias, 52 relationship types |
| 2.8.11 | Register and Section models with hierarchical section support |

Dataset model (v3)

V3 datasets are self-describing — a register.yaml at the root captures identity, URNs, owner, languages, sections, and ordering. The gem reads and writes this structure:

ruby
register = Glossarist::Register.from_yaml(File.read("datasets/vim/register.yaml"))
register.sections            # => [#<Section id="1" ...>, ...]
register.sections.first.children  # transitive section tree
register.ordering            # => "systematic"

Sections support transitive cascading membership — a concept placed in section 1.2.3 is automatically a member of 1.2 and 1.
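
The cascading rule can be illustrated with a few lines of plain Ruby. This is only a sketch: `section_memberships` is a hypothetical helper, and it assumes the dotted section id itself encodes the hierarchy. The gem resolves ancestors by traversing the section tree in the register, which may not match this string-based shortcut in every dataset.

```ruby
# Hypothetical helper (NOT part of glossarist): list a dotted section id
# together with all of its ancestors, from most to least specific.
# Assumption: the dotted id itself encodes the section hierarchy.
def section_memberships(section_id)
  parts = section_id.split(".")
  parts.length.downto(1).map { |n| parts.first(n).join(".") }
end

p section_memberships("1.2.3") # => ["1.2.3", "1.2", "1"]
```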

Non-verbal entities (v3.1)

The gem models dataset-level non-verbal entities as first-class resources:

ruby
figure = Glossarist::Figure.new(
  id: "quantity-classification",
  caption: { eng: "Classification of quantities" },
  alt: { eng: "Diagram showing the hierarchy of quantity types" },
  images: [{ src: "quantity-classification.svg", format: "svg" }],
)

See Non-verbal entities for the model details.

Reference protocol (v2.9.0)

A shared Reference protocol underpins ConceptReference, RelatedConcept, DesignationRelationship, and other reference-shaped entities — eliminating duplicated logic and ensuring consistent UUID fallback behavior across all reference types.
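
The general idea of a shared protocol can be sketched with a Ruby mixin. Everything below is hypothetical: the module, the struct names, and the `target_key` method are illustrative only and do not mirror the gem's actual classes or its real fallback rules. The sketch assumes "UUID fallback" means using the UUID when no concept id is set.

```ruby
# Hypothetical sketch (NOT glossarist's implementation): one module holds the
# lookup rule, so every reference-shaped type resolves targets the same way.
module ReferenceLike
  # Prefer the human-assigned concept id; fall back to the UUID when absent.
  def target_key
    id = concept_id.to_s
    id.empty? ? uuid : id
  end
end

# Two illustrative reference types sharing the same behavior.
ConceptRef = Struct.new(:concept_id, :uuid, keyword_init: true) do
  include ReferenceLike
end

RelatedRef = Struct.new(:type, :concept_id, :uuid, keyword_init: true) do
  include ReferenceLike
end

puts ConceptRef.new(concept_id: "103", uuid: "u-1").target_key            # 103
puts RelatedRef.new(type: "see", concept_id: nil, uuid: "u-2").target_key # u-2
```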

Per-concept SKOS export + SHACL gate (WS-B)

v2.9.0 introduces per-concept SKOS export with deterministic UUID v5 identifiers (based on dataset URN + concept id) and an optional SHACL validation gate that runs before serialization. This makes round-tripping through SKOS lossless and verifiable.
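
To show why UUID v5 identifiers are deterministic, here is a minimal standard-library sketch of the RFC 4122 name-based (SHA-1) algorithm. The helper names, the choice of the RFC's URL namespace, and the way the dataset URN and concept id are joined into a name are all assumptions made for illustration. The gem's actual namespace and name construction are not documented here and may differ.

```ruby
require "digest/sha1"

# RFC 4122 predefined namespace for URLs (assumed here for illustration).
NAMESPACE_URL = "6ba7b811-9dad-11d1-80b4-00c04fd430c8"

# Name-based UUID (version 5): SHA-1 over namespace bytes + name bytes.
def uuid_v5(namespace_uuid, name)
  ns_bytes = [namespace_uuid.delete("-")].pack("H*")
  bytes = Digest::SHA1.digest(ns_bytes + name.b).bytes.first(16)
  bytes[6] = (bytes[6] & 0x0f) | 0x50 # set version to 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80 # set RFC 4122 variant
  hex = bytes.pack("C*").unpack1("H*")
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join("-")
end

# Hypothetical name construction: dataset URN and concept id joined by "/".
def concept_uuid(dataset_urn, concept_id)
  uuid_v5(NAMESPACE_URL, "#{dataset_urn}/#{concept_id}")
end

a = concept_uuid("urn:example:dataset", "123")
b = concept_uuid("urn:example:dataset", "123")
puts a == b # true: the same inputs always produce the same UUID
```

Because the identifier is a pure function of its inputs, re-exporting an unchanged dataset yields the same identifiers each time.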

An open source project of Ribose