Natural Language Generation is a broad domain with applications in
chat-bots, story generation, and data descriptions. There is a wide
spectrum of different technologies addressing parts or the whole of the
NLG process. This list aims to represent this diversity of NLG
applications and techniques by providing links to various projects, tools,
research papers, and learning materials.
Alex Context NLG Dataset
- A dataset for NLG in dialogue systems in the public transport
information domain.
Box-score data
- This dataset consists of (human-written) NBA basketball game summaries
aligned with their corresponding box- and line-scores.
E2E - This
shared task focuses on recent end-to-end (E2E), data-driven NLG methods,
which jointly learn sentence planning and surface realisation from
non-aligned data.
Neural-Wikipedian
- The repository contains the code along with the required corpora that
were used in order to build a system that “learns” how to generate
English biographies for Semantic Web triples.
WeatherGov
- Computer-generated weather forecasts from weather.gov (US public
forecast), along with corresponding weather data.
WebNLG - The enriched
version of WebNLG, a resource for evaluating common NLG tasks,
including Discourse Ordering, Lexicalization and Referring Expression
Generation.
The Schema-Guided Dialogue Dataset
- The Schema-Guided Dialogue (SGD) dataset consists of over 20k
annotated multi-domain, task-oriented conversations between a human and
a virtual assistant.
The Wikipedia company corpus
- Company descriptions collected from Wikipedia. The dataset contains
semantic representations as well as short and long descriptions for 51K
companies in English.
YelpNLG - YelpNLG
provides resources for natural language generation of restaurant
reviews.
Dialog
Chatito - Generate
datasets for AI chatbots, NLP tasks, named entity recognition or text
classification models using a simple DSL!
NNDial - NNDial is an
open source toolkit for building end-to-end trainable task-oriented
dialogue models.
Plato
- This is the Plato Research Dialogue System, a flexible platform for
developing conversational AI agents.
RNNLG - RNNLG is an open
source benchmark toolkit for Natural Language Generation (NLG) in spoken
dialogue system application domains.
TGen - Statistical NLG
for spoken dialogue systems.
Accelerated Text
- Automatically generate multiple natural language descriptions of your
data varying in wording and structure.
RosaeNLG - An open-source library for
Node.js or client-side (browser) execution, based on the Pug template
engine, to generate texts in English, French, German and Italian.
Twine - An open-source tool for
telling interactive, nonlinear stories.
Realizers
GenI - Surface realiser
(part of a Natural Language Generation system) using Tree Adjoining
Grammar.
JSrealB - A
JavaScript bilingual text realizer for web development.
SimpleNLG - Java
API for Natural Language Generation.
StringTemplate - Java
template engine (with ports for C#, Objective-C, JavaScript, Scala) for
generating source code, web pages, emails, or any other formatted text
output.