Last updated 2015-05-20.

XARK is an extensible, open platform for archiving and sharing genealogical or historical data. Its goal is to create a standard to supplant GEDCOM without some of the nomenclature issues and limitations of GEDCOM X, and without the extreme complexity of alternatives like STEMMA.

As an amateur genealogist and a professional software developer, I see a need for a replacement for the existing standards that supports a more rigorous research process. I joined the discussions at FHISO, but its free-wheeling discussion groups don't really offer a proper forum for putting together a cogent collection of ideas. So, this project is a distillation of my ideas, a way to keep track of where I hope groups like FHISO and FamilySearch will go with their own efforts.

XARK is XML-based. Serializations in JSON, GEDCOM, or other formats may be possible, but for the sake of this document, all concepts will be portrayed in XML, or in some cases in RDF/XML, the XML serialization of RDF, which is more restrictive in form. It is important to note that the goal of the XML is to elegantly represent the data model, not to maximize performance. It is assumed that any software implementing this standard will actually use relational tables, XML indexing, or other mechanisms to retrieve and query data more efficiently.

XARK is absolutely not interested in backward compatibility with extant standards like GEDCOM. If the concepts align enough for conversion back to an earlier or alternative format, great, but XARK's concepts will not be limited to those that fit well with legacy formats.

XARK is made up of three separate but connected modules:

  1. Citations. Documenting the specific location where data may be found.
  2. Records. Structured transcriptions of data from a source, with little in the way of interpretation.
  3. Research. Linking records together to form an interpreted picture of a person, event, place, etc.
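
As a rough illustration of how the three modules might reference one another, here is a minimal sketch in Python. Everything in the sample XML is an assumption made up for this example: the element names (`xark`, `citation`, `record`, `research`, `evidence`), the attributes, and the data are hypothetical and are not XARK's actual vocabulary. It only shows the layering: research points at records, and records point at citations.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: none of these element or attribute names are defined by XARK.
# The names and titles are invented sample data.
SAMPLE = """
<xark>
  <citation id="c1">
    <title>Example parish register</title>
  </citation>
  <record id="r1" citation="c1">
    <name>John Smith</name>
  </record>
  <record id="r2" citation="c1">
    <name>Jno. Smith</name>
  </record>
  <research id="p1" type="person">
    <evidence record="r1"/>
    <evidence record="r2"/>
  </research>
</xark>
"""


def names_for(root, research_id):
    """Return (transcribed name, citation title) for each record linked to a research entry."""
    # Index citations and records by id so links can be followed cheaply.
    citations = {c.get("id"): c for c in root.findall("citation")}
    records = {r.get("id"): r for r in root.findall("record")}
    research = next(x for x in root.findall("research") if x.get("id") == research_id)

    result = []
    for evidence in research.findall("evidence"):
        record = records[evidence.get("record")]        # research -> record
        citation = citations[record.get("citation")]    # record -> citation
        result.append((record.findtext("name"), citation.findtext("title")))
    return result


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for name, title in names_for(root, "p1"):
        print(f"{name} <- {title}")
```

The point of the sketch is that the two records keep their original transcriptions ("John Smith" and "Jno. Smith") untouched, while the research entry is the only place that asserts they describe the same person.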


A citation is a collection of data about a source. A source, in turn, is a place where a given set of data may be found. A citation's elements may include any of the following:

What's with the name?

I came up with the name when Sam Ruby et al. were seeking names for the standard to replace RSS. When I suggested the name, I reserved the domain just in case. The name "Atom" was ultimately chosen for that project, but I kept the domain and have been looking for a good use for it. This new project seemed to be an excellent fit.

Why does this web page suck?

Because I'm typing it freehand at 2:30am in a text editor. Don't have time for a fancy design.


Richard Tallent
Twitter and just about everywhere else: richardtallent