Thursday, 27 October 2011

Unicode tool

Encoding always causes problems... so here is an overview:

What is Unicode?

Unicode is a standard that lets computers represent and display text and symbols from all of the world's writing systems. It is coordinated by the Unicode Consortium. The Unicode website defines it as follows: "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language." That number is called a code point, and a Unicode encoding determines how code points are stored as bytes. There are several encodings. The most popular is UTF-8, and others include UTF-16 and UTF-7. UTF-8 is a variable-length encoding that uses one to four bytes per character, and the basic Latin (ASCII) characters are encoded as the same single bytes they have in ASCII.
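To make the "variable-length" and "identical to ASCII" points concrete, here is a small Python 3 sketch that prints each character's code point and its UTF-8 bytes. The sample characters are arbitrary choices for illustration.

```python
# Print the Unicode code point and the UTF-8 bytes of a few sample characters.
samples = ["A", "é", "€", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"{ch!r}: U+{ord(ch):04X} -> {encoded!r} ({len(encoded)} byte(s))")
```

"A" keeps its single ASCII byte, while "é", "€" and the emoji take two, three and four bytes respectively.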

Converting from Latin-1 (ISO-8859-1) to UTF-8 in your code

PHP

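Use utf8_decode($data) to convert from UTF-8 to ISO-8859-1.

And use utf8_encode($data) to convert from ISO-8859-1 to UTF-8. Both functions are deprecated as of PHP 8.2; mb_convert_encoding($data, 'UTF-8', 'ISO-8859-1') does the same conversion.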
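PHP's utf8_encode() treats every input byte as an ISO-8859-1 character and re-encodes it as UTF-8. The Python sketch below mirrors that behaviour with two hypothetical helpers named after the PHP functions (they are not part of any Python library, and the match is approximate for invalid input). It also shows the classic mistake of encoding text that is already UTF-8, which produces "double-encoded" mojibake such as "Ã©".

```python
def utf8_encode(data: bytes) -> bytes:
    """Hypothetical mirror of PHP's utf8_encode(): ISO-8859-1 bytes -> UTF-8 bytes."""
    return data.decode("iso-8859-1").encode("utf-8")


def utf8_decode(data: bytes) -> bytes:
    """Hypothetical mirror of PHP's utf8_decode(): UTF-8 bytes -> ISO-8859-1 bytes.

    Characters with no ISO-8859-1 equivalent (and, approximately, invalid
    UTF-8 sequences) become '?', as in PHP.
    """
    return data.decode("utf-8", errors="replace").encode("iso-8859-1", errors="replace")


latin1 = "Andrée".encode("iso-8859-1")   # b'Andr\xe9e'
once = utf8_encode(latin1)               # b'Andr\xc3\xa9e', correct UTF-8
twice = utf8_encode(once)                # encoded a second time by mistake
print(twice.decode("utf-8"))             # AndrÃ©e
```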

Some native PHP functions, such as strtolower(), strtoupper() and ucfirst(), work byte by byte and do not always handle UTF-8 strings correctly. Possible solutions: convert the string to Latin-1 first, use the multibyte-aware mb_strtolower() and mb_strtoupper(), or add the following line to your code so that only ASCII letters are changed:
setlocale(LC_CTYPE, 'C');
Make sure not to save your PHP files with a UTF-8 BOM (byte order mark). PHP sends the BOM bytes to the browser as ordinary output, so they can show up as stray characters (often ï»¿) on your pages.
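Why can byte-oriented case functions break UTF-8? Under a single-byte locale such as ISO-8859-1, the lead byte of a multi-byte UTF-8 sequence looks like an uppercase letter and gets changed, corrupting the sequence. Under the 'C' locale only ASCII letters change, so the UTF-8 bytes survive, although accented letters are then not converted either. The Python sketch below imitates both behaviours with hypothetical helper functions (Python's own str.lower() works on decoded text and is Unicode-aware), and adds a check for the UTF-8 BOM.

```python
import codecs


def latin1_locale_lower(data: bytes) -> bytes:
    """Hypothetical: lowercase byte by byte, as if every byte were an ISO-8859-1 character."""
    return data.decode("iso-8859-1").lower().encode("iso-8859-1")


def c_locale_lower(data: bytes) -> bytes:
    """Hypothetical: lowercase only the ASCII letters A-Z, like the 'C' locale."""
    return data.lower()  # bytes.lower() only changes ASCII letters


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


word = "ÉCOLE".encode("utf-8")           # b'\xc3\x89COLE'
print(latin1_locale_lower(word))         # b'\xe3\x89cole': no longer valid UTF-8
print(c_locale_lower(word))              # b'\xc3\x89cole': 'É' intact but not lowercased
print(word.decode("utf-8").lower())      # école: decode first, then lowercase
```

Decoding with the "utf-8-sig" codec strips a leading BOM if one is present.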

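Perl

use Encode qw( from_to is_utf8 );
from_to($data, "iso-8859-1", "utf8");  # converts $data in place
Note that is_utf8($data) only tests Perl's internal UTF-8 flag on the string (with a true second argument it also checks that the flagged string is well-formed); it does not tell you whether a plain byte string is valid UTF-8. To validate bytes, attempt a strict decode instead (see the Encode documentation for decode() and its CHECK argument).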
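If what you actually want is to know whether a byte string is well-formed UTF-8, one simple approach is to try a strict decode and catch the error. A minimal Python sketch, with a hypothetical helper name:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("Andrée".encode("utf-8")))        # True
print(is_valid_utf8("Andrée".encode("iso-8859-1")))   # False: lone 0xE9 byte
```

Plain ASCII always passes, since every ASCII string is also valid UTF-8.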

Python

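In Python 3, text (str) and bytes are separate types: decode the Latin-1 bytes into text, then encode that text as UTF-8:
source_encoding = "iso-8859-1"
latin1_bytes = "Names with international characters like 'Andrée'".encode(source_encoding)
text = latin1_bytes.decode(source_encoding)   # Latin-1 bytes -> str
utf8_bytes = text.encode("utf-8")             # str -> UTF-8 bytes
To convert back to the original character set, decode the UTF-8 and re-encode:
latin1_again = utf8_bytes.decode("utf-8").encode(source_encoding)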
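Converting back to Latin-1 can fail, because ISO-8859-1 covers only 256 characters and has no euro sign, for example. (The other direction always succeeds, since every ISO-8859-1 byte maps to a Unicode character.) Python's error handlers decide what happens; a short sketch with an arbitrary sample string:

```python
text = "Price: 5 €"

try:
    text.encode("iso-8859-1")  # default errors="strict" raises
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.object[exc.start:exc.end])  # cannot encode: €

print(text.encode("iso-8859-1", errors="replace"))  # b'Price: 5 ?'
print(text.encode("iso-8859-1", errors="ignore"))   # b'Price: 5 '
```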

.NET C#

In C#, use the classes in System.Text (latin1Bytes is a byte array holding the ISO-8859-1 input):
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
Encoding utf8 = new UTF8Encoding();
byte[] utf8Bytes = Encoding.Convert(latin1, utf8, latin1Bytes);

Java

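Use String.getBytes(Charset) to encode a string into bytes in a given character set, for example s.getBytes(StandardCharsets.UTF_8), and new String(bytes, StandardCharsets.ISO_8859_1) to decode bytes. For finer control over unmappable characters, use the CharsetEncoder and CharsetDecoder classes.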

MySQL

MySQL uses character sets at every level. There are session settings such as character_set_connection and collation_connection, and you can specify a character set at the database, table and column level. To convert a character set inside a query, use CONVERT:
SELECT CONVERT(latin1field USING utf8mb4)
Note that MySQL's utf8 character set (the three-byte utf8mb3) cannot store characters outside the Basic Multilingual Plane, such as emoji; utf8mb4 is full UTF-8. If joins become slow after you convert the character set of tables or columns, make sure the joined columns (typically the ID fields) use the same character set and collation, because a mismatch can prevent MySQL from using the indexes on those columns.
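MySQL's three-byte utf8 (utf8mb3) can only hold characters up to U+FFFF, which are exactly the characters that need at most three bytes in UTF-8. A quick pre-check in application code might look like this; the helper name is hypothetical.

```python
def needs_utf8mb4(text: str) -> bool:
    """Hypothetical helper: True if any character needs 4 bytes in UTF-8,
    i.e. lies outside the Basic Multilingual Plane (above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


print(needs_utf8mb4("Andrée"))     # False
print(needs_utf8mb4("Andrée 😀"))  # True
```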

HTML

You can declare your character set with the Content-Type meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
In HTML5 the shorter <meta charset="UTF-8"> does the same. To avoid character-set problems altogether, it is sometimes easier to convert special characters to plain-ASCII HTML character references, such as &eacute; or &#233; for é. Because these references are pure ASCII, they display correctly whatever character set the page is served in, so they also work in older browsers.
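Converting special characters to plain-ASCII HTML can be sketched in Python with the standard html module plus the "xmlcharrefreplace" error handler. The helper name is hypothetical, and it produces numeric references such as &#233; rather than named entities like &eacute;; both are valid HTML.

```python
import html


def to_ascii_html(text: str) -> str:
    """Hypothetical helper: escape &, <, > and quotes, then turn every
    non-ASCII character into a numeric character reference."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_html("Andrée & Zoë <3"))  # Andr&#233;e &amp; Zo&#235; &lt;3
```

html.unescape() turns the result back into the original text.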

Unix systems

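Use the iconv character-set conversion tool. It writes the converted text to standard output, so redirect it to a new file (not to the input file itself, which the shell would truncate before iconv reads it):
iconv -f ISO-8859-1 -t UTF-8 filename.txt > filename-utf8.txt
More information is available on GNU.org.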
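Where iconv is not available, a rough Python equivalent is easy to write. This is a minimal sketch with hypothetical helper names; it works on raw bytes so line endings are left untouched, but it reads the whole file into memory, unlike iconv.

```python
from pathlib import Path


def reencode(data: bytes, from_enc: str = "iso-8859-1", to_enc: str = "utf-8") -> bytes:
    """Decode bytes from one character set and encode them into another."""
    return data.decode(from_enc).encode(to_enc)


def convert_file(src: Path, dst: Path, from_enc: str = "iso-8859-1", to_enc: str = "utf-8") -> None:
    """Hypothetical helper, roughly `iconv -f ISO-8859-1 -t UTF-8 src > dst`."""
    dst.write_bytes(reencode(src.read_bytes(), from_enc, to_enc))
```

For example, convert_file(Path("filename.txt"), Path("filename-utf8.txt")) writes a UTF-8 copy next to the original.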

Windows systems

Most good text editors support Unicode. In UltraEdit, for example, use File → Conversions → 'ASCII to UTF-8' or 'ASCII to Unicode (16-Bit)'.

Thanks to software developers who sent me corrections and updates!
