Development

Status and Contribution

This is a hobby project in an early phase. I am not planning to invest vast effort here, but I am curious to get your feedback.
If you would like to contribute, here is what you can do:

  • You use this software and like it?
    Please let me know (and send your cool templates).
  • Missing some words, irregular word forms, or tags?
    Edit the word lists and send a pull request.
    NOTE: This is not about collecting as many words as possible, so do not simply dump the WordNet database here! Instead, we should aim for frequently used words with high-quality tagging. Get in touch if you are in doubt.
    This little script may help merge word lists or tags into the existing database.
  • Have an idea for improvement?
    Let me know, but be prepared to invest some of your own time as well.
  • Found a bug?
    Keep it, or send a pull request ;-)

Run Tests

If you plan to debug or contribute, install in development mode so that you run directly from the source:

$ python setup.py develop
$ python setup.py test

How to Contribute

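Work in a virtual environment. I recommend using pipenv to make this easy. Create and activate the virtual environment, install the development requirements, then run the tests, install in development mode, and build the documentation: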

$ cd /path/fabulist
$ pipenv shell
$ pip install -r requirements-dev.txt
$ python setup.py test
$ python setup.py develop
$ python setup.py sphinx

Make a release:

$ python setup.py test
$ python setup.py bdist_wheel
$ twine upload dist/*

Data Model and File Format

Word List Entries

Word lists are represented per word type as objects (derived from the common _WordList base class).
A word list knows its CSV format and provides methods to load, save, and access data.
The main attributes are:

key_list
A list of all known words in their base form (aka 'lemma').
data
A dictionary of additional data per lemma, stored as *word entry* dictionary.
tag_map
A dictionary of lemma-sets per tag.
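To illustrate how the three attributes relate, here is a minimal sketch that keeps `tag_map` in sync with `data` while entries are added. The class `SimpleWordList` and its methods `add_entry` and `lemmas_with_tag` are hypothetical; this is not fabulist's `_WordList` API, only a model of the attributes described above.

```python
from collections import defaultdict


class SimpleWordList:
    """Hypothetical stand-in illustrating key_list, data, and tag_map.

    Not fabulist's _WordList; it only mirrors the three attributes.
    """

    def __init__(self):
        self.key_list = []               # all lemmas, in insertion order
        self.data = {}                   # lemma -> word entry dict
        self.tag_map = defaultdict(set)  # tag name -> set of lemmas

    def add_entry(self, entry):
        lemma = entry["lemma"]
        old = self.data.get(lemma)
        if old is None:
            self.key_list.append(lemma)
        else:
            # Re-adding a lemma: drop its stale tag registrations first
            for tag in old.get("tags") or ():
                self.tag_map[tag].discard(lemma)
        self.data[lemma] = entry
        # "tags" is a set of tag names or None
        for tag in entry.get("tags") or ():
            self.tag_map[tag].add(lemma)

    def lemmas_with_tag(self, tag):
        return sorted(self.tag_map.get(tag, ()))


words = SimpleWordList()
words.add_entry({"lemma": "alpaca", "plural": "alpacas", "tags": {"animal"}})
words.add_entry({"lemma": "blood", "plural": False, "tags": None})
words.add_entry({"lemma": "cat", "plural": "cats", "tags": {"animal", "pet"}})
print(words.lemmas_with_tag("animal"))  # ['alpaca', 'cat']
```

Storing lemma sets per tag makes "all words tagged X" a single dictionary lookup instead of a scan over every entry.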

Word entries contain information about one single word. For example a word entry for a noun may look like this:

{"lemma": "alpaca",
 "plural": "alpacas",
 "tags": {"animal"},  # A set of tag names or None
 }

Note: Nouns without plural form store "plural": False.
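One reason to store an explicit `False` is that a generator can then exclude such nouns when a template asks for a plural. The helper `pick_noun` below is a hypothetical sketch of that filtering, not part of fabulist's API.

```python
import random

NOUNS = [
    {"lemma": "alpaca", "plural": "alpacas", "tags": {"animal"}},
    {"lemma": "blood", "plural": False, "tags": None},  # no plural form
    {"lemma": "cat", "plural": "cats", "tags": {"animal", "pet"}},
]


def pick_noun(entries, plural=False, rng=random):
    """Hypothetical helper: return a random noun in singular or plural form."""
    if plural:
        # Nouns that store "plural": False cannot fill a plural slot
        entries = [e for e in entries if e["plural"] is not False]
    entry = rng.choice(entries)
    return entry["plural"] if plural else entry["lemma"]


print(pick_noun(NOUNS, plural=True))  # alpacas or cats, never blood
```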

A word entry for a verb may look like this:

{"lemma": "strive",
 "past": "strove",
 "pp": "striven",    # past participle
 "s": "strives",     # -s form
 "ing": "striving",  # -ing form
 "tags": None,       # A set of tag names or None
 }
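Regular verb forms can usually be derived from the lemma, so only irregular forms such as "strove" / "striven" need to be stored. The sketch below fills omitted forms with deliberately simplified English spelling rules; the functions (`s_form`, `ing_form`, `past_form`, `complete_verb_entry`) are hypothetical and are not fabulist's implementation. Consonant doubling (e.g. "stop" -> "stopped") is not handled.

```python
VOWELS = "aeiou"


def _consonant_y(word):
    # True for words like "try" (consonant + y), False for "play"
    return word.endswith("y") and len(word) > 1 and word[-2] not in VOWELS


def s_form(lemma):
    if lemma.endswith(("s", "x", "z", "ch", "sh")):
        return lemma + "es"            # push -> pushes
    if _consonant_y(lemma):
        return lemma[:-1] + "ies"      # try -> tries
    return lemma + "s"                 # strive -> strives


def ing_form(lemma):
    if lemma.endswith("ie"):
        return lemma[:-2] + "ying"     # lie -> lying
    if lemma.endswith("e") and not lemma.endswith(("ee", "ye", "oe")):
        return lemma[:-1] + "ing"      # strive -> striving
    return lemma + "ing"               # walk -> walking


def past_form(lemma):
    if lemma.endswith("e"):
        return lemma + "d"             # hope -> hoped
    if _consonant_y(lemma):
        return lemma[:-1] + "ied"      # try -> tried
    return lemma + "ed"                # walk -> walked


def complete_verb_entry(entry):
    """Fill omitted verb forms using the simplified rules above."""
    lemma = entry["lemma"]
    past = entry.get("past") or past_form(lemma)
    return {
        "lemma": lemma,
        "past": past,
        "pp": entry.get("pp") or past,  # regular verbs: pp equals past
        "s": entry.get("s") or s_form(lemma),
        "ing": entry.get("ing") or ing_form(lemma),
        "tags": entry.get("tags"),
    }


print(complete_verb_entry({"lemma": "strive", "past": "strove", "pp": "striven"}))
```

With this sketch, the "strive" entry only needs its irregular past and pp forms; "strives" and "striving" follow from the rules.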

A word entry for an adjective may look like this:

{"lemma": "bad",
 "comp": "worse",       # comparative
 "super": "worst",      # superlative
 "antonym": "good",     # antonym or None
 "tags": {"negative"},  # A set of tag names or None
 }

Note: Incomparable adjectives / adverbs (e.g. ‘pregnant’) store "comp": False.
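A consumer of adjective entries has to distinguish three cases: explicit forms (as for "bad"), forms omitted because they are regular, and `"comp": False`. The sketch below handles all three; `regular_comparison` and `comparison_forms` are hypothetical names, and the -er/-est rule is a simplification that ignores consonant doubling ("big") and "more"/"most" forms of longer adjectives.

```python
VOWELS = "aeiou"


def regular_comparison(lemma):
    """Simplified -er/-est rule for short adjectives (assumption)."""
    if lemma.endswith("e"):
        return lemma + "r", lemma + "st"          # nice -> nicer, nicest
    if lemma.endswith("y") and len(lemma) > 1 and lemma[-2] not in VOWELS:
        stem = lemma[:-1]
        return stem + "ier", stem + "iest"        # happy -> happier, happiest
    return lemma + "er", lemma + "est"            # tall -> taller, tallest


def comparison_forms(entry):
    """Return (comparative, superlative), or None if incomparable."""
    if entry.get("comp") is False:
        return None
    default_comp, default_super = regular_comparison(entry["lemma"])
    return (entry.get("comp") or default_comp,
            entry.get("super") or default_super)


print(comparison_forms({"lemma": "bad", "comp": "worse", "super": "worst"}))
print(comparison_forms({"lemma": "pregnant", "comp": False}))  # None
```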

Word List Files

Word lists are provided as plain text files in CSV format:

  • File name is <word-type>_list.txt.
  • Use UTF-8 encoding.
  • Empty lines and lines starting with ‘#’ are ignored.
  • Attributes are comma separated.
  • Multi-value attributes are separated by ‘|’.
  • Attributes should be omitted if they can be generated using standard rules (e.g. the plural of ‘cat’ is ‘cats’).
  • An attribute value of ‘-’ suppresses this value (e.g. ‘blood’ has no plural form).

Example from noun_list.txt:

# Noun list
# lemma | plural | tags
blood,-,
cat,,animal|pet
...
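The rules above map directly onto a small reader. This is a minimal sketch for the three-column noun format shown, assuming a trivial regular plural rule; `parse_noun_list` and `default_plural` are hypothetical names, not fabulist's loader.

```python
import csv
import io

SAMPLE = """\
# Noun list
# lemma | plural | tags
blood,-,
cat,,animal|pet
"""


def default_plural(lemma):
    # Simplified regular rule (assumption), e.g. cat -> cats, box -> boxes
    if lemma.endswith(("s", "x", "z", "ch", "sh")):
        return lemma + "es"
    return lemma + "s"


def parse_noun_list(lines):
    """Parse noun_list.txt-style lines into {lemma: word entry}."""
    entries = {}
    # Skip empty lines and '#' comments before handing lines to the CSV reader
    content = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    for row in csv.reader(content):
        row += [""] * (3 - len(row))          # tolerate omitted trailing fields
        lemma, plural, tags = (cell.strip() for cell in row[:3])
        if plural == "-":
            plural = False                    # '-' suppresses the form
        elif not plural:
            plural = default_plural(lemma)    # omitted -> standard rule
        entries[lemma] = {
            "lemma": lemma,
            "plural": plural,
            "tags": set(tags.split("|")) if tags else None,  # '|' = multi-value
        }
    return entries


entries = parse_noun_list(io.StringIO(SAMPLE))
print(entries["blood"])  # {'lemma': 'blood', 'plural': False, 'tags': None}
```

For a real file, open it with `open("noun_list.txt", encoding="utf-8", newline="")` and pass the file object instead of the `StringIO` sample.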

Lorem Ipsum Files

Blind text sources are stored as plain text files.

  • File name is lorem_<dialect>.txt.
  • Use UTF-8 encoding.
  • One sentence per line.
  • Paragraphs are separated by a line consisting of three hyphens (---).

Note: Sentences and paragraphs are considered by API methods depending on the entropy argument.

Example from lorem_ipsum.txt:

# Lorem ipsum
# Opera sine nomine scripta

Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat.
Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
---
Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse
...
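A reader for this format only needs to split on the `---` separator lines. The sketch below returns a list of paragraphs, each a list of sentences. `parse_lorem` is a hypothetical name, and skipping `#` lines is an assumption based on the comment header in the example above; the sample sentences are shortened from that example.

```python
LOREM_SAMPLE = """\
# Lorem ipsum
# Opera sine nomine scripta

Lorem ipsum dolor sit amet.
Ut enim ad minim veniam.
---
Duis autem vel eum iriure dolor.
"""


def parse_lorem(text):
    """Return a list of paragraphs, each a list of sentences."""
    paragraphs = [[]]
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # assumption: skip blanks and comments
        if line == "---":
            paragraphs.append([])         # separator starts a new paragraph
        else:
            paragraphs[-1].append(line)   # one sentence per line
    return [p for p in paragraphs if p]   # drop empty paragraphs


print(len(parse_lorem(LOREM_SAMPLE)))  # 2
```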

Name Lists

The name generator is implemented by the NameList class, which is a virtual implementation that internally uses a FirstnameList and a LastnameList class. The name pools are stored in firstname_list.txt and lastname_list.txt, respectively. First names also use the tags f and m to denote female and/or male gender.
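The combination can be pictured as follows: filter the first-name pool by gender tag, then pair a random first name with a random last name. This is a hypothetical sketch with in-memory pools; `SimpleNameList` and `get_name` are not fabulist's NameList API, and a first name tagged with both f and m (here "Kim") qualifies for either gender.

```python
import random

# Hypothetical in-memory pools; fabulist reads its pools from
# firstname_list.txt and lastname_list.txt
FIRSTNAMES = {"Alice": {"f"}, "Bob": {"m"}, "Kim": {"f", "m"}}
LASTNAMES = ["Miller", "Smith"]


class SimpleNameList:
    """Hypothetical sketch combining a first-name and a last-name pool."""

    def __init__(self, firstnames, lastnames, rng=None):
        self.firstnames = firstnames          # name -> set of gender tags
        self.lastnames = list(lastnames)
        self.rng = rng if rng is not None else random.Random()

    def get_name(self, gender=None):
        # gender: "f", "m", or None for any first name
        pool = sorted(
            name for name, tags in self.firstnames.items()
            if gender is None or gender in tags
        )
        if not pool:
            raise ValueError(f"no first names tagged {gender!r}")
        return f"{self.rng.choice(pool)} {self.rng.choice(self.lastnames)}"


names = SimpleNameList(FIRSTNAMES, LASTNAMES, random.Random(42))
print(names.get_name("m"))  # Bob or Kim, followed by Miller or Smith
```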