ClueTrust KB



My shape file doesn't load with the right character set, what can I do?

The problem

Cartographica has extensive support for Unicode and for importing and exporting files in other character encodings. However, the one problem it can't solve is figuring out which character set the data is encoded in.

Some file formats include markers that indicate the character set. Unfortunately, these are often not used, or not used correctly. For example, it is common to find ESRI Shapefiles that are marked as Local OEM (LDID 57, for those who are into the dbf file format) but whose data has been encoded in just about any character set.
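To see which marker a particular file actually carries, you can read the language driver ID yourself. The sketch below assumes the usual dBase header layout, in which the byte at offset 29 holds the LDID; the helper names are hypothetical. It only reports the byte that is stored in the file. It cannot tell you whether the data really uses that encoding, which is exactly the problem described above.

```python
import sys

# Assumption: a standard dBase (.dbf) header, where the byte at offset 29
# (0x1D) holds the language driver ID (LDID).
LDID_OFFSET = 29
HEADER_SIZE = 32


def ldid_from_header(header: bytes) -> int:
    """Return the LDID byte from the first 32 bytes of a .dbf file."""
    if len(header) < HEADER_SIZE:
        raise ValueError("too short to be a dBase header")
    return header[LDID_OFFSET]


def read_ldid(dbf_path: str) -> int:
    """Open a .dbf file and return its LDID byte."""
    with open(dbf_path, "rb") as f:
        return ldid_from_header(f.read(HEADER_SIZE))


if __name__ == "__main__":
    # Usage: python read_ldid.py shapes.dbf [more.dbf ...]
    for name in sys.argv[1:]:
        ldid = read_ldid(name)
        # Print both hex and decimal so the value can be compared with
        # whichever notation a reference table uses.
        print(f"{name}: LDID {ldid:#04x} (decimal {ldid})")
```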

Some formats are nicely self-describing, such as KML, but most of them are legacy formats and don't do a good job of indicating how they are encoded.

A solution

For ESRI Shapefiles, there is a reasonable work-around to force the character encoding, if you are willing to look up some data and create a new file using a text editor such as TextEdit (found in the Applications folder on your Mac). If you use TextEdit, save the file as plain text (Format > Make Plain Text) rather than rich text. When reading a Shapefile, Cartographica looks for a file in the same directory with the same name as the file you are opening, but with the extension "cpg". So, if you are opening "shapes.shp", it will look for "shapes.cpg". The cpg file contains the identifier of the encoding to use when loading the file, so you can change the encoding manually by editing it. Only certain character encodings are allowed; the ones Cartographica supports are listed below.


The file itself contains only the encoding specifier: no spaces before it, no return after it, nothing else. Put the characters into the "cpg" file exactly as they are listed below and save it. The next time Cartographica loads your file, it will come in with the right character set. Note that I said the next time it loads. If you have already started modifying the layer's styles, there is no need to delete the layer and re-load it. Save your Cartographica document, add the "cpg" file to the directory that holds your shapefile, and then open the Cartographica Map Set again; it should load with the right encoding.

Here is the table of the supported character encodings:

Character Encoding       Place in file   Notes
Unicode UTF-8            UTF-8
ISO Latin 1 (8859-1)     ISO 88591       †
ISO Latin 2 (8859-2)     ISO 88592       †
ISO Latin 3 (8859-3)     ISO 88593       †
ISO Latin 4 (8859-4)     ISO 88594       †
ISO Cyrillic (8859-5)    ISO 88595       †
ISO Arabic (8859-6)      ISO 88596       †
ISO Greek (8859-7)       ISO 88597       †
ISO Hebrew (8859-8)      ISO 88598       †
ISO Turkish (8859-9)     ISO 88599       †
ISO Nordic (8859-10)     ISO 885910      †
ISO Latin 7 (8859-13)    ISO 885913      †
ISO Latin 9 (8859-15)    ISO 885915      †
ISO 2022JP Japanese      ISO EUC         †
Chinese Big 5            ANSI Big5       †
Japanese SJIS            ANSI SJIS       †
ANSI numeric             ANSI #          †, #
Windows Codepage (OEM)   OEM #           †, #

† The space between ISO, ANSI, or OEM and the number or name that follows it is required in the cpg file.

# Replace the # character with the numeric character set (code page) identifier.
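Before saving a cpg file, you can check its text against the table. This validator is built only from the entries listed in this article; whether Cartographica itself is equally strict (for example about letter case) is an assumption, and `is_supported_cpg` is a hypothetical name.

```python
import re

# Fixed specifiers, copied from the "Place in file" column of the table.
FIXED = {"UTF-8", "ISO EUC", "ANSI Big5", "ANSI SJIS"} | {
    f"ISO 8859{n}" for n in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 15)
}

# "ANSI #" and "OEM #": the prefix, exactly one space, then a numeric identifier.
NUMERIC = re.compile(r"(ANSI|OEM) [0-9]+")


def is_supported_cpg(text: str) -> bool:
    """Return True if `text` matches one of the table's specifiers exactly."""
    return text in FIXED or NUMERIC.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("ISO 88591", "ISO88591", "ANSI 1252", "UTF-8\n"):
        print(repr(candidate), is_supported_cpg(candidate))
```

`fullmatch` is used instead of `match` with a `$` anchor because `$` also accepts a trailing newline, which is exactly the mistake the cpg file must not contain.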

Internal Cartographica Use

Internally, Cartographica always uses UTF-8. If you import data and then modify it (or otherwise end up changing the file from a reference to an included file), the data will be stored as UTF-8 inside the Cartographica Map Set document.
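Once the source encoding is known, converting text to UTF-8 amounts to decoding with that encoding and re-encoding as UTF-8. The sketch below shows that general idea in Python; it is not Cartographica's implementation, and `to_utf8` is a hypothetical helper.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    print(to_utf8(b"Caf\xe9", "latin-1"))
```

Picking the wrong source encoding usually does not raise an error: `to_utf8(b"Caf\xe9", "cp437")` succeeds and produces the UTF-8 bytes for "CafΘ". Once stored that way, the wrong character is simply part of the data, which is why it pays to get the cpg file right before the layer is imported.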



Article Details

Last Updated
20th of November, 2009

Version
1.1

Software
Cartographica
