The Case for Synthetic Literature Standards — Tools

Building the standards — tools and frameworks for synthetic literature evaluation

Available Tools

While the synthetic-literature standards community is still forming, several tooling efforts are already underway. Here are the most relevant:

Provenance Trackers

Content Credentials (C2PA) — The industry-standard framework for content provenance. Originally built for photography, now extended to text. Can attach metadata about model, parameters, and generation context.
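
As an illustration of what that attached metadata might carry, the sketch below models a generation assertion as a plain Python structure. The labels and field names are our own placeholders, not the normative C2PA manifest serialization.

    # Illustrative only: labels and field names are hypothetical,
    # not the normative C2PA manifest serialization.
    generation_assertion = {
        "label": "ai.generation",              # hypothetical assertion label
        "data": {
            "model": "example-model-1.0",      # placeholder model identifier
            "parameters": {"temperature": 0.8, "top_p": 0.95},
            "context": "Chapter draft generated from an outline prompt",
        },
    }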

SourceNet — A provenance verification protocol for digital content. Supports chain-of-custody tracking across platforms and formats.
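
Chain-of-custody tracking of this kind is typically built on hash-linked records, where each entry commits to its predecessor. The sketch below is a minimal Python illustration of that idea, not SourceNet's actual record format.

    import hashlib
    import json

    def custody_record(content: str, actor: str, prev_hash: str) -> dict:
        """Append-only custody entry; each record commits to the one before it."""
        body = {
            "content_sha256": hashlib.sha256(content.encode()).hexdigest(),
            "actor": actor,
            "prev": prev_hash,
        }
        body["record_hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        return body

    genesis = custody_record("Draft text ...", "generator", prev_hash="")
    handoff = custody_record("Draft text ...", "publisher", genesis["record_hash"])

Tampering with any earlier record changes its hash and breaks every link after it, which is what makes the custody chain verifiable.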

Metadata Schemas

Schema.org CreativeWork — Widely supported structured data format. Extensions for AI-generated content are being proposed under the generatedBy property.

JSON-LD for Synthetic Text — A working group draft proposing a lightweight schema for encoding model provenance, style tags, and disclosure levels in machine-readable form.
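
To make both proposals concrete, here is what a combined record might look like, written as a Python dict with JSON-LD keys. The generatedBy shape and the "sl:"-prefixed fields follow the draft proposals above; none of this is adopted vocabulary.

    # Hypothetical JSON-LD record: "generatedBy" and the "sl:" fields
    # follow the draft proposals, not adopted Schema.org vocabulary.
    creative_work = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": "Example Synthetic Novella",
        "generatedBy": {                       # proposed extension property
            "@type": "SoftwareApplication",
            "name": "example-model",
            "version": "1.0",
        },
        "sl:disclosureLevel": "full",          # draft synthetic-literature terms
        "sl:styleTags": ["literary fiction", "first person"],
    }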

Quality Assessment

Perplexity and model-based metrics — Standard language-model evaluation measures adapted for literary quality assessment. Tools like GPTScore and LLM-Eval provide automated scoring of coherence, fluency, and originality.
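
As a baseline, perplexity itself is straightforward to compute with an off-the-shelf causal language model. The sketch below uses Hugging Face transformers with GPT-2 purely as a stand-in scorer; any causal model would do.

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    def perplexity(text: str) -> float:
        """Perplexity of `text` under GPT-2; lower suggests more fluent text."""
        tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
        model = GPT2LMHeadModel.from_pretrained("gpt2")
        model.eval()
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            # Passing the inputs as labels yields the mean cross-entropy loss.
            loss = model(**inputs, labels=inputs["input_ids"]).loss
        return torch.exp(loss).item()

    print(perplexity("It was a dark and stormy night."))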

Human-in-the-loop evaluators — Platforms like Argilla and Label Studio support building curated human-evaluation datasets, which are crucial for establishing the ground truth against which automated metrics can be validated.
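
Once a curated human-rated set exists, validating an automated metric can be as simple as checking its rank correlation against the human scores. A minimal sketch using SciPy, with invented scores:

    from scipy.stats import spearmanr

    # Invented example data: per-passage scores from human raters
    # and from some automated metric, in the same passage order.
    human_scores = [4.5, 2.0, 3.5, 4.0, 1.5]
    metric_scores = [0.91, 0.40, 0.66, 0.85, 0.35]

    rho, p_value = spearmanr(human_scores, metric_scores)
    print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")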

Frameworks in Development

Several communities are actively working on framework-level solutions:

W3C Verifiable Credentials — Working on extending the VC spec for AI-generated creative works. Expected to support model provenance and parameter disclosure as claim types.
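
Under the current VC data model, a credential carrying such claims might look like the sketch below. The outer structure follows the published spec; the aiProvenance claim shape is a hypothetical placeholder for claim types still under discussion.

    # Sketch of a W3C Verifiable Credential. The outer fields follow the
    # VC data model; "aiProvenance" is a hypothetical claim type.
    credential = {
        "@context": ["https://www.w3.org/2018/credentials/v1"],
        "type": ["VerifiableCredential"],
        "issuer": "did:example:publisher",
        "issuanceDate": "2024-01-01T00:00:00Z",
        "credentialSubject": {
            "id": "urn:example:work-42",
            "aiProvenance": {                  # hypothetical claim type
                "model": "example-model-1.0",
                "parameters": {"temperature": 0.7},
            },
        },
    }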

Open Source LLM Evaluation — Initiatives like HELM (Holistic Evaluation of Language Models) and BIG-bench provide benchmarks that could be adapted for literary evaluation.

Proposed Quality Scoring System

We propose a lightweight scoring system modeled after existing language model benchmarks but tailored to literary quality. The system would evaluate synthetic text across the following dimensions:

Dimension            Weight  Scoring Criteria
Language Quality     25%     Grammar, vocabulary richness, syntactic variety
Narrative Coherence  20%     Logical flow, plot consistency, argument structure
Originality          15%     Novelty of ideas, avoidance of clichés, creative expression
Emotional Resonance  15%     Ability to evoke affective responses, empathy, engagement
Stylistic Awareness  15%     Genre appropriateness, voice consistency, rhetorical devices
Structural Rigor     10%     Paragraph organization, chapter balance, pacing

This scoring system is still under development. We welcome contributions from the community for validation and refinement.
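
For concreteness, the composite reduces to a weighted sum over the six dimensions. A minimal sketch, with each dimension scored on a 0-100 scale and the example inputs invented:

    # Weights from the table above; each dimension is scored 0-100.
    WEIGHTS = {
        "language_quality": 0.25,
        "narrative_coherence": 0.20,
        "originality": 0.15,
        "emotional_resonance": 0.15,
        "stylistic_awareness": 0.15,
        "structural_rigor": 0.10,
    }

    def composite_score(scores: dict) -> float:
        """Weighted sum of per-dimension scores (0-100 scale)."""
        return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

    # Invented example inputs:
    print(composite_score({
        "language_quality": 80, "narrative_coherence": 70, "originality": 60,
        "emotional_resonance": 75, "stylistic_awareness": 85, "structural_rigor": 65,
    }))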

Request access to experimental tooling

We are developing a prototype synthetic-literature metadata validator and a provenance verification dashboard. Access available on request for researchers and publishers.
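
As a hint of what the metadata validator checks, a stripped-down sketch might verify that required provenance fields are present. The required-field list here is an assumption for illustration, not the prototype's actual rule set.

    # Assumed rule set for illustration; the prototype's checks differ.
    REQUIRED_FIELDS = {"model", "parameters", "disclosure_level"}

    def validate_metadata(record: dict) -> list:
        """Return a list of problems; an empty list means the record passes."""
        missing = REQUIRED_FIELDS - set(record)
        return ["missing field: " + name for name in sorted(missing)]

    print(validate_metadata({"model": "example-model-1.0"}))
    # -> ['missing field: disclosure_level', 'missing field: parameters']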