Tuesday, April 29, 2008

Authoring Cross-Platform Wiki Markup

I'll never think of line delimiters the same way again. Classic Mac OS uses \r, Unix uses \n, and Windows uses \r\n. Who cares? Well, when it comes to writing platform-independent Wiki markup, it matters.
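As a quick illustration, here is a small Java sketch that reports which of the three conventions a string uses. It is not taken from Textile-J; the class `DelimiterStyle` and its `Style` enum are hypothetical names. Note that a `\r\n` pair is counted as one Windows delimiter rather than as a lone `\r` followed by a lone `\n`, which is exactly the ambiguity at the heart of this post.

```java
// Hypothetical helper (not part of Textile-J): classifies the line delimiters in a string.
final class DelimiterStyle {
    enum Style { NONE, UNIX_LF, CLASSIC_MAC_CR, WINDOWS_CRLF, MIXED }

    static Style detect(String text) {
        int crlf = 0, cr = 0, lf = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                crlf++;
                i++; // consume the \n that completes the \r\n pair
            } else if (c == '\r') {
                cr++;
            } else if (c == '\n') {
                lf++;
            }
        }
        // Count how many different conventions appear in the text.
        int kinds = (crlf > 0 ? 1 : 0) + (cr > 0 ? 1 : 0) + (lf > 0 ? 1 : 0);
        if (kinds == 0) return Style.NONE;
        if (kinds > 1) return Style.MIXED;
        if (crlf > 0) return Style.WINDOWS_CRLF;
        return cr > 0 ? Style.CLASSIC_MAC_CR : Style.UNIX_LF;
    }

    public static void main(String[] args) {
        System.out.println(detect("some text\nmore text"));   // UNIX_LF
        System.out.println(detect("some text\r\nmore text")); // WINDOWS_CRLF
        System.out.println(detect("some text\rmore text"));   // CLASSIC_MAC_CR
        System.out.println(detect("a\rb\nc"));                // MIXED
    }
}
```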


Take, for example, a document that was written on a Unix platform. Every line ends with a single \n character. In Wiki markup, paragraphs are separated by an empty line, which appears in the document as two consecutive newlines: \n\n. When a Wiki markup parser converts this document to HTML, it looks for the empty line (the second \n) and uses it to close the previous paragraph and start a new one.
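A minimal Java sketch of that paragraph rule, assuming the input uses Unix (\n) delimiters only. `UnixParagraphs` and its methods are hypothetical illustrations, not Textile-J's parser; a real parser would also escape HTML and handle inline markup.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: split Unix-delimited Wiki text into paragraphs on empty lines.
final class UnixParagraphs {
    static List<String> paragraphs(String text) {
        // An empty line shows up as two (or more) consecutive \n characters.
        return Arrays.asList(text.split("\n\n+"));
    }

    // Wrap each paragraph in <p>...</p>; no escaping or inline markup handling.
    static String toHtml(String text) {
        StringBuilder html = new StringBuilder();
        for (String p : paragraphs(text)) {
            html.append("<p>").append(p).append("</p>\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.print(toHtml("first paragraph\n\nsecond paragraph"));
        // <p>first paragraph</p>
        // <p>second paragraph</p>
    }
}
```

A single \n, as in "some text\nmore text", does not match the pattern, so both lines stay in the same paragraph.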


Now what happens if the document is opened and edited on a Mac, in an editor that uses \r as the newline? Suppose the Mac user places the cursor at the end of a line, just before the existing \n, and presses Enter to add an empty line (a paragraph break). For example:


Prior to editing:


some text\nmore text


After editing:

some text\r\nmore text


Before the edit, the document contains a single \n character; afterwards it contains \r\n. In the user's editor on the Mac, \r\n appears as two line breaks, that is, an empty line and therefore a paragraph break. However, \r\n happens to be the Windows line delimiter, so a Wiki markup parser that accommodates Windows files treats it as a single line delimiter. The empty line the user sees does not exist as far as the parser is concerned, and both lines end up in the same paragraph. In many kinds of documents this would not matter, but in Wiki markup, where an empty line is what separates paragraphs, it changes the output.
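The effect can be reproduced in a few lines of Java. The sketch below assumes two hypothetical views of the same text: one that, like the Mac editor in this scenario, treats \r and \n as separate line breaks, and one that, like a Windows-accommodating parser, treats \r\n as a single delimiter. Neither method is taken from Textile-J.

```java
import java.util.Arrays;

// Hypothetical illustration (not Textile-J code): two ways of reading the same text.
final class DelimiterAmbiguity {
    // Windows-accommodating parser: \r\n is ONE delimiter; a lone \r or \n is one each.
    // \r\n comes first in the alternation so it is not split into \r and \n.
    static String[] linesAsParsed(String text) {
        return text.split("\r\n|\r|\n", -1);
    }

    // The Mac editor in this scenario: every \r and every \n is its own line break.
    static String[] linesAsSeenByEditor(String text) {
        return text.split("[\r\n]", -1);
    }

    // True if some line between the first and the last is empty, i.e. a paragraph break.
    static boolean hasInteriorBlankLine(String[] lines) {
        for (int i = 1; i < lines.length - 1; i++) {
            if (lines[i].isEmpty()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String edited = "some text\r\nmore text";
        String[] seen = linesAsSeenByEditor(edited);
        String[] parsed = linesAsParsed(edited);
        System.out.println(Arrays.toString(seen));        // [some text, , more text]
        System.out.println(Arrays.toString(parsed));      // [some text, more text]
        System.out.println(hasInteriorBlankLine(seen));   // true: the user sees a paragraph break
        System.out.println(hasInteriorBlankLine(parsed)); // false: the parser sees one paragraph
    }
}
```

The order of the alternatives in the regular expression matters: because \r\n is tried before \r, the Windows pair is consumed as one delimiter, and the empty line the user typed disappears.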


Editors that support editing Wiki markup on multiple platforms must be coded carefully to avoid this issue. For Textile-J this means that the end-of-line markers are converted to the platform default when the editor first opens a file.
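A sketch of that normalization step in Java, under stated assumptions: the file is UTF-8, and the editor works on a String. `LineDelimiterNormalizer`, `normalize`, and `openForEditing` are hypothetical names, not Textile-J's actual code. `System.lineSeparator()` requires Java 7 or later; on older VMs, `System.getProperty("line.separator")` returns the same value.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;

// Hypothetical sketch: convert all end-of-line markers to one delimiter when a file is opened.
final class LineDelimiterNormalizer {
    // Replace every \r\n, lone \r, and lone \n with the target delimiter.
    // \r\n is listed first so a Windows pair becomes one delimiter, not two.
    static String normalize(String text, String delimiter) {
        return text.replaceAll("\r\n|\r|\n", Matcher.quoteReplacement(delimiter));
    }

    // Assumed "open" step: read the file as UTF-8 and normalize to the platform default.
    static String openForEditing(Path file) throws IOException {
        String raw = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return normalize(raw, System.lineSeparator());
    }

    public static void main(String[] args) {
        String mixed = "para one\r\rpara two\r\nline\nend";
        System.out.println(normalize(mixed, "\n").equals("para one\n\npara two\nline\nend")); // true
    }
}
```

The idea is that once every delimiter in the buffer matches the one the editor inserts (assuming the editor inserts the same platform default), a newly typed delimiter cannot pair up with a neighbouring one into an unintended \r\n.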


Who knew that line delimiters could be so important?

2 comments:

Bart Schuller said...

The Mac is Unix nowadays (for 7-10 years), including the line delimiter \n. I hope Java didn't get this as wrong as it does the platform encoding, which is set to MacRoman even though the Mac uses UTF-8.

Not that this takes anything away from your post, the problems you describe are real.

David Green said...

@Bart Schuller thanks for the comment.

The Java VM sets the line.separator system property to '\n'. The issue is an SWT bug: bug 213046.