@tchrist I appreciate the feedback. Of course, it would be more efficient to stream the file. However, the original question specifically asked about scanning a small file. Your comment about the regex is incorrect; I actually tested my code before I posted it. Sorry if my range is incorrect (that might be a valid point), but your comment is unnecessarily aggressive and rude. I was simply providing a working Groovy-based example, since the question mentioned Groovy.
– Sep 1 '11 at 3:36.
To show non-ASCII characters in the Studio editor, do this: add the text file to a project (you can drag and drop it). In the project's Properties dialog (select 'Properties...' in the project's context menu), select 'Resource', then change the 'Text file encoding' option in the right pane to 'UTF-8'. Then open the file.
As for detecting non-ASCII in an 8-bit file, it's really quite simple. ASCII is by definition a 7-bit encoding: it only covers the values 0 to 127. Any byte with the 8th bit set (a value above 127) is by definition non-ASCII. For 16- or 32-bit Unicode (UTF-16 or UTF-32), just read two or four bytes at a time; if the value of that code unit is greater than 127, it's not ASCII.
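The byte check described above can be sketched in Python as follows. This is a minimal illustration, not the poster's code: it assumes the data is read as raw bytes and, for the wide case, that the UTF-16/UTF-32 data has a known byte order and no BOM (a BOM such as `FF FE` is itself a unit above 127 and would be reported at index 0). The function names are hypothetical.

```python
def first_non_ascii_byte(data: bytes) -> int:
    """Return the offset of the first byte above 127 (8th bit set), or -1 if all ASCII."""
    for offset, byte in enumerate(data):
        if byte > 0x7F:  # ASCII only covers 0..127
            return offset
    return -1


def first_non_ascii_unit(data: bytes, width: int, byteorder: str = "little") -> int:
    """Read fixed-width code units (2 bytes for UTF-16, 4 for UTF-32) and return the
    index of the first unit whose value exceeds 127, or -1 if none does."""
    if len(data) % width:
        raise ValueError("data length is not a multiple of the code-unit width")
    for start in range(0, len(data), width):
        value = int.from_bytes(data[start:start + width], byteorder)
        if value > 0x7F:  # also catches UTF-16 surrogates, which are >= 0xD800
            return start // width
    return -1


print(first_non_ascii_byte(b"plain text"))                     # -1
print(first_non_ascii_byte("naïve".encode("utf-8")))           # 2
print(first_non_ascii_unit("naïve".encode("utf-16-le"), 2))    # 2
```

For a large file you would feed the same check chunk by chunk rather than reading everything at once.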
What's the best way to enter special Unicode characters into a Notepad++ document? Do I have to rely on the operating system (Windows)? I'm looking for a see-and-click solution. I can bring up the ASCII Insertion Panel with Edit → Character Panel, and that works great, but it only has the 256 raw characters.
In UTF-8 mode I'd like a similar feature for the full encoded set, e.g. the Greek alphabet, math symbols, etc. I don't want to have to use the. Similar question on but no answer there either.
Set up a User Defined Command:
1. Select Run → Run...
2. Enter charmap.
3. Click Save.
4. Enter a name to identify it, e.g. &charmap (the ampersand makes C the accelerator key, so Alt+R then C activates it, unless another command uses the same accelerator key).
5. Optionally specify a keyboard shortcut to trigger it, e.g. Alt and Num +.
6. Click OK.
Now whenever you want to enter a character, use either the Run menu or the shortcut/accelerator key to open the Windows Character Map. Pick one or more characters, or search for them by Unicode name in 'Search for:', copy them to the clipboard, close Character Map, and paste into Notepad++. Note that the document has to be in a Unicode encoding for the characters to display in Notepad++; to set this, go to the Format menu and select 'Encode in UTF-8' or similar.
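Character Map's 'Search for:' box works on official Unicode character names. If Python 3 happens to be available, the same by-name lookup can be scripted with the standard `unicodedata` module; this is a swapped-in illustration of the idea, not part of the Character Map workflow, and the names in the list are just examples.

```python
import unicodedata

# Example names only; any official Unicode character name works here.
names = ["GREEK SMALL LETTER ALPHA", "N-ARY SUMMATION", "SUBSCRIPT TWO"]

# unicodedata.lookup() maps a name to its character (raises KeyError if unknown).
chars = "".join(unicodedata.lookup(name) for name in names)

# Reverse direction: show each character's code point and official name.
for ch in chars:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+03B1 GREEK SMALL LETTER ALPHA
# U+2211 N-ARY SUMMATION
# U+2082 SUBSCRIPT TWO
```

The resulting `chars` string can then be pasted into a UTF-8 document just like characters copied from Character Map.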
Now you can paste Greek letters, subscripts, etc. into Notepad++.
If you can type a character using the numeric keypad, you already have the code point for that character.
Simply enter the hex value of the character(s) you want into Notepad++, select it, then open the menu Plugins → Converter → HEX -> ASCII. You can also get live results from the Conversion Panel.
Another way is to use the HexEditor plugin, which was included in prior versions of Notepad++ but was. You can still install the plugin (or reactivate it if it was disabled) from the Plugin Manager, with the caveat that it may sometimes be unstable. With it, you just select Plugins → Hex-Editor → View in HEX (or click the H button near the right of the menu bar), then type the UTF-8 bytes into the hex edit window. For example, to get the string 🔙🔚🔛🔜, which is f0 9f 94 99 f0 9f 94 9a f0 9f 94 9b f0 9f 94 9c in UTF-8, just type those hex values into the dump column and switch back to normal text mode; you'll see those characters appear. Note that it might be easier to work in UTF-16 or UTF-32, since working out the UTF-8 encoding of a character by hand is tricky. Afterwards, just convert the file back to UTF-8 when saving.
It's also possible to use the Base64 converter for this purpose.
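Working out UTF-8 by hand is tricky mainly because of the bit layout. The sketch below encodes a code point manually following the standard UTF-8 scheme, reproduces the hex dump from the example above, and then decodes that hex back to text, which is what the HEX -> ASCII step does for you. It is an illustration in Python 3.8+ (needed for `bytes.hex` with a separator); `utf8_encode` is a hypothetical helper name, and in practice `str.encode("utf-8")` does the same job.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand."""
    if code_point < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if 0xD800 <= code_point <= 0xDFFF:
            raise ValueError("surrogates cannot be encoded in UTF-8")
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:   # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point out of range")


# The four characters from the example: U+1F519 through U+1F51C.
encoded = b"".join(utf8_encode(cp) for cp in range(0x1F519, 0x1F51D))
print(encoded.hex(" "))  # f0 9f 94 99 f0 9f 94 9a f0 9f 94 9b f0 9f 94 9c

# Going back: hex digits -> bytes -> text, as the HEX -> ASCII converter does.
text = bytes.fromhex("f09f9499f09f949af09f949bf09f949c").decode("utf-8")
assert text == encoded.decode("utf-8")
```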
Just select the base64-encoded string and choose Plugins → MIME Tools → Base64 Decode.
If you just want to enter a few special characters frequently, it's better to use a macro. First get the base64 encoding of the string by pasting it into Notepad++ and using the Base64 Encode feature. Then select Macro → Start Recording, type the base64 string you got, select it, and decode it as above. Now stop recording and save the macro with a descriptive name and, optionally, a shortcut. Later, when you want to insert that string, just replay the macro. It's also possible to use the HEX -> ASCII feature instead of base64.
Edit: The Conversion Panel works with Unicode only if you paste a Unicode character directly into the ASCII field.
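The macro trick relies on a simple round trip: the text is stored as UTF-8 bytes written in base64, and decoding restores the original characters. A minimal Python sketch of both halves, assuming UTF-8 as the encoding (as in the answer); the sample string is arbitrary.

```python
import base64

# Step 1 of the macro recipe: get the base64 form of the string to insert.
snippet = "\u03b1\u03b2\u03b3"  # Greek alpha, beta, gamma (arbitrary example)
encoded = base64.b64encode(snippet.encode("utf-8")).decode("ascii")
print(encoded)  # zrHOss6z

# What Base64 Decode does when the macro replays: base64 -> UTF-8 bytes -> text.
decoded = base64.b64decode(encoded).decode("utf-8")
assert decoded == snippet
```

Because the recorded macro only types plain ASCII (the base64 string), it does not depend on how the keyboard or input method handles the special characters.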
Responding to the first and third sections: although the question specifically asks for a see-and-click solution, which is far more convenient than having to know the code point, these are very interesting alternatives for inserting a Unicode character. The first seems more accessible and would be improved by step-by-step instructions. To insert an em dash: (1) find the UTF-8 encoding somehow (link?), (2) type E28094, (3) select Plugins → Converter → HEX -> ASCII. Non sequitur: this is also a great way to go the other way, to determine the UTF-8 encoding of a character I can cut and paste.
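For step (1), one way to find the UTF-8 encoding without a lookup table, assuming Python 3 is at hand, is to let the standard library handle both directions: hex to character (what HEX -> ASCII does) and character to hex (the "other way" mentioned in the comment). `utf8_hex` is a hypothetical helper name.

```python
import unicodedata


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of a single character as uppercase hex digits."""
    return ch.encode("utf-8").hex().upper()


# Hex -> character, then character -> name and hex again.
em_dash = bytes.fromhex("E28094").decode("utf-8")
print(unicodedata.name(em_dash), utf8_hex(em_dash))  # EM DASH E28094
```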