Ever have a troublesome file containing unprintable characters or other character encoding problems? Ever wonder if your line delimiter (EOL) is correct for your platform? Over the years I've seen many incarnations of such problems, hidden deep within XML and other text files. Because such characters are unprintable and sometimes take up no visible space, these problems can be hard to find and diagnose -- until now. I've created a tiny utility for Eclipse that provides a way to see the invisible: Text Inspection is a new project specifically designed to aid in looking at text in Eclipse.
Text Inspection does this by providing a view that displays the current workbench text selection, exposing unprintable characters as a Java String literal using Unicode escape sequences where appropriate. All you have to do to see the text of interest is open your file in a text editor (any text editor will do) and select the area of interest.
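
To make the idea concrete, here is a minimal sketch of that kind of escaping. It is not the plug-in's actual code: the names `JavaLiteralEscaper` and `toJavaLiteral` are made up for illustration, and the rule for which characters get a Unicode escape (anything outside printable ASCII) is an assumption on my part.

```java
/** Illustrative only: hypothetical helper, not taken from Text Inspection. */
final class JavaLiteralEscaper {
    private JavaLiteralEscaper() {}

    /** Returns the text rendered as a Java String literal, surrounding quotes included. */
    static String toJavaLiteral(String text) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                // Common control characters get their short escape form.
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                // Quote and backslash must be escaped to keep the literal valid.
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                default:
                    if (c < 0x20 || c > 0x7e) {
                        // Assumption: everything outside printable ASCII becomes
                        // a four-digit hex Unicode escape.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }
}
```

With this sketch, a selection consisting of `a`, a zero-width space (U+200B), `b`, and a Windows line ending comes out as `"a\u200bb\r\n"`, which makes both the invisible character and the CR LF delimiter obvious at a glance.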

The project also augments the Java source editor with a paste command alternative for pasting text as a Java string literal, escaping characters as needed. This can come in really handy when writing unit tests. My unit tests often compare output to an expected value, which is often copy/pasted from the console.
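
Here is roughly what that looks like in a test. This is a sketch with made-up names (`ReportFormatTest`, `formatReport`), and it uses a plain boolean check rather than any particular test framework; the `expected` literal is the kind of thing you would get by pasting console output that contains tabs, quotes, and newlines.

```java
/** Illustrative only: hypothetical code under test and a check against pasted output. */
final class ReportFormatTest {
    private ReportFormatTest() {}

    /** Hypothetical code under test: one tab-separated name/value pair per line. */
    static String formatReport(String[][] rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            sb.append(row[0]).append('\t').append(row[1]).append('\n');
        }
        return sb.toString();
    }

    /** Compares the actual output with an escaped literal of the pasted console text. */
    static boolean matchesExpected() {
        // Tabs, quotes, and newlines from the console arrive already escaped.
        String expected = "name\tAda\nquote\t\"hi\"\n";
        String actual = formatReport(new String[][] {{"name", "Ada"}, {"quote", "\"hi\""}});
        return expected.equals(actual);
    }
}
```

Escaping that literal by hand is tedious and easy to get wrong, which is exactly the step the paste command takes off your hands.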

I invite you to try it out and provide feedback.
2 comments:
It's always depressed me that, despite being built by a large set of international committers working on different operating systems, we continue to let the default encoding for files be whatever the current location/operating system thinks the default file encoding should be.
Pretty much all Java runtimes, and most operating systems since the introduction of Java itself, have been able to natively open and edit UTF-8 files. Yet we still persist in dealing with legacy encodings like MacRoman, CodePage 1251 and other non-portable sets, which makes sharing data with others that much harder. (It was not helped by Sun's decision to encode .properties files in a legacy format with Java escaped string literals ...)
Anyway, whilst these types of utilities are good, it's really a symptom of an underlying problem that has never been fixed. Unfortunately, we are no closer to making that happen than we were in the 2.x days.
@AlBlue I agree completely. I assume that it's for better compatibility with external tools, or for historical reasons.
In our projects at MAKE Technologies we install tooling that creates error markers for any project that does not have the encoding set to UTF-8 (either via project-specific settings or via the workspace defaults). This prevents anyone on the team from having socially incorrect settings.