Nav apraksta

Getty Ritter 0266a67a69 small typo in readme 9 gadi atpakaļ
lektor-rss 802bef8ada First pass at Atom feed fetching 9 gadi atpakaļ
util 744056aa15 Added simple bash scripts 9 gadi atpakaļ
README.md 0266a67a69 small typo in readme 9 gadi atpakaļ

README.md

Lektor: A Standard for Feed Readers

At A Glance

A given user has their own lektor-dir. A lektor-dir contains both "feeds" and "entries". Two kinds of programs operate on lektor-dirs in two different capcities: a fetcher produces entries for one or more feeds, and a viewer manages entries once produced and shows them to some user. A given lektor-dir can have multiple fetchers and multiple viewers operating on it.

The rationale for these decisions is this:

  • Separating fetchers from viewers means that a user can easily mix-and-match different front-ends and back-ends.
  • Allowing multiple fetchers allows different entry sources to be handled independently, ideally allowing those programs to be simpler.
  • Allowing multiple viewers means that a user can track multiple feeds but view the information from those feeds in ways which are more or less appropriate.
  • Keeping this information split apart in the file system, rather than in a database or text file, both improves the ability to operate concurrently on different parts of a lektor-dir and lifts the burden of parsing information from the implementer. The file system is generally used here as a kind of hierarchical key-value store.
  • The overall design is lifted straight from the maildir format, which is a time-tested and well-understood format for email. This modifies it slightly and adds a richer structure for RSS-like applications.

lektor-feed

A given feed consists of at least a human-readable name and a URI id which unambiguously identifies the feed. Information about feeds is stored in the src directory inside a lektor-dir. Information about a given feed is stored inside src/$hash, where $hash is the SHA-1 hash of of the feed's id.

Obligatory elements for a feed include:

  • id: The URI which identifies the feed. In the case of RSS/Atom/ActivityStream feeds, this will generally be the URL at which the feed is hosted. For other things—for example, for services which may not have a web equivalent—it might instead be a tag URI or some other opaque identifier.
  • name: The human-readable name of the feed. This is produced by the fetcher and should not be changed by a viewer, even if a user wants to alias the name to something else.

Optional elements for a feed include:

  • description: A human-readable description describing the feed.
  • language: The language the feed is written in.
  • image: An image that can be optionally displayed with the channel.
  • copyright: The copyright notice for the feed.
  • author: Authorship information for the feed.

Feed example

A minimal feed might look like

# $HASH is sha1sum('http://example.com/rss.xml')
HASH=80af8e84e5ef7ae6b68acb8d1987e58e3e5731dd
cd $HASH

echo 'http://example.com/rss.xml'  >id
echo 'Example Feed'                >name

A feed with more entries might look like

# $HASH is sha1sum('http://example.com/rss.xml')
HASH=80af8e84e5ef7ae6b68acb8d1987e58e3e5731dd
cd $HASH

echo 'http://example.com/rss.xml'         >id
echo 'Example Feed'                       >name
echo 'An example feed.'                   >description
echo 'en-us'                              >language
echo 'http://example.com/image.png'       >image
echo 'Copyright 2015, Getty Ritter'       >copyright
echo 'Getty Ritter <gdritter@gmail.com>'  >author

lektor-entry

In contrast to maildir, entries in a lektor-dir are not files but directories adhering to a particular structure.

Obligatory elements for an entry include:

  • title: The title of the entry.
  • id: The URI which identifies the entry. This will often be a URL at which the resource corresponding to the entry is available, but may also be an opaque identifier.
  • content: Some kind of content. If no type element is present, then the content is assumed to be plain text; otherwise, the type element will dictate the format of the content.
  • feed: A directory that contains all the information about the source feed. This will generally be a soft link to the relevant feed directory, but programs should not assume that it is.

Optional elements for an entry include:

  • author: Names and email addressess of the authors of the entry.
  • pubdate: When the entry was published.
  • type: The MIME type of the content. If type is not present, the assumed content type is text/plain.

Entry example

A minimal entry might look like

# $FEED is sha1sum('http://example.com/rss.xml')
FEED=80af8e84e5ef7ae6b68acb8d1987e58e3e5731dd
echo 'Example Entry'               >title
echo 'http://example.com/example'  >id
echo 'A sample entry.'             >content
ln -s $LEKTORDIR/src/$FEED          feed

A full entry might look like

# $FEED is sha1sum('http://example.com/rss.xml')
FEED=80af8e84e5ef7ae6b68acb8d1987e58e3e5731dd
echo 'Example Entry'                         >title
echo 'http://example.com/example'            >id
echo 'A sample entry.'                       >content
echo 'Getty Ritter <gettyritter@gmail.com>'  >author
echo '2015-06-23T13:06:22Z'                  >pubdate
echo 'text/html'                             >type
ln -s $LEKTORDIR/src/$FEED                    feed

lektor-dir

A lektor-dir is a directory with at least four subdirectories: tmp, new, cur, and src. A fetcher is responsible for examining a feed and adding new entries the lektor-dir according to the following process:

  • The fetcher chdir()s to the lektor-dir directory.
  • The fetcher stat()s the name tmp/$feed/$time.$uniq.$host, where $feed is the hash of the feed's id value, $time is the number of seconds since the beginning of 1970 GMT, $uniq is a combination of unique elements possibly including the process pid or various sequence numbers, and $host is its host name.
  • If stat() returned anything other than ENOENT, the program sleeps for two seconds, updates $time, and tries the stat() again, a limited number of times.
  • The fetcher creates the directory tmp/$feed/$time.$uniq.$host.
  • The fetcher writes the entry contents (according to the lektor-entry format) to the directory.
  • The fetcher moves the file to new/$feed/$time.$uniq.$host. At that instant, the entry has been successfully created.

A viewer is responsible for displaying new feed entries to a user through some mechanism. A viewer looks through the new directory for new entries. If there is a new entry, new/$feed/$unique, the viewer may:

  • Display the contents of new/$feed/$unique.
  • Delete new/$feed/$unique.
  • Rename new/$feed/$unique to cur/$feed/$unique;$info.

A lektor-dir can contain other information not specified here, but that information should attempt to adhere to these guidelines:

  • If the extra information pertains to a particular feed, it should appear in the directory src/$feed/etc
  • If the extra information pertains to a fetcher, it should appear in the directory etc/fetch.
  • If the extra information pertains to a viewer, it should appear in the directory etc/view.

Possibilities for lektor

Lektor lends itself well to web syndication (e.g. RSS, Atom, ActivityStreams, &c) but could be used for any kind of stream of information. For example, a fetcher might serve as a mediated logging service for other information such as regular load information on a running web service, pushing updates into a shared lektor-dir on a regular basis. It would also be trivial to write custom fetchers for services that no longer expose RSS or other syndication formats, such as Twitter.

Here is a trivial fetcher that provides a feed of timestamps every hour:

#!/bin/bash -e

cd $LEKTORDIR

# the feed information
ID='tag:example.com:timekeeper'
HASH=$(printf $ID | sha1sum | awk '{ print $1; }' )

# other metadata
HOST=$(hostname)
MAX=10

# create the feed
mkdir -p src/$HASH
echo $ID         >src/$HASH/id
echo Timekeeper  >src/$HASH/name

mkdir -p "tmp/$HASH"
mkdir -p "new/$HASH"

# create entries every hour
while true; do
    TIME=$(date '+%s')
    ENTRY="$HASH/$TIME.P$$.$HOST"

    # if the file exists, wait two seconds and try again
    RETRY=0
    while [ -e $ENTRY ]
    do
        # if we've waited more than $MAX times, then
        # give up
        if [ $RETRY -gt $MAX ]; then
            exit 1
        fi
        sleep 2
        RETRY=$(expr $RETRY + 1)
    done

    # create the entry
    mkdir -p tmp/$ENTRY

    # create entry values
    echo 'Current Time'                      >tmp/$ENTRY/title
    echo $TIME                               >tmp/$ENTRY/content
    echo "tag:example.com:timekeeper#$TIME"  >tmp/$ENTRY/id
    ln -s $LEKTORDIR/src/$HASH                tmp/$ENTRY/feed

    # move the entry to the new location
    mv tmp/$ENTRY new/$ENTRY

    # wait for half an hour and do it again
    sleep 3600
done

Additionally, multiple viewers can act on the same lektor-dir. A given viewer need not show every piece of information: for example, a viewer may sniff the type attribute of entries and only display entries of a given type, or selectively choose which feeds to display, or even select entries at random to display. It also has full control over how to display those entries.

Here is a trivial viewer that shows a small digest of each entry in new and then moves those entries to cur:

#/bin/bash -e

cd $LEKTORDIR

for FEED in $(ls new)
do
	mkdir -p cur/$FEED

	# print feed header
	echo "In feed $(cat src/$FEED/name):"
	echo

	for ENTRY in $(ls new/$FEED)
	do
		# print entry
		echo "$(cat new/$FEED/$ENTRY/title)"
		cat new/$FEED/$ENTRY/content | head -n 4
		echo

		# move entry to `cur`
		mv new/$FEED/$ENTRY cur/$FEED/$ENTRY
	done
done