FAQ0015EN: Difference between revisions

From the UEA wiki

Revision as of 10:17, 27 June 2010

How can I represent these characters in E-mail or on Usenet?

Accented characters are not included in standard 7-bit ASCII. Since only 7-bit ASCII can be reliably transmitted over the net, this leads to problems when trying to use Esperanto in E-mail and Usenet news. These problems are not unique to Esperanto; all languages with accents have them.

Two approaches are possible: using ASCII to represent the accented characters, or using 8-bit codes and sending them somehow over the net.


Using Standard ASCII

There are two major work-arounds for representing Esperanto's accented letters in standard 7-bit ASCII: using the letter "h" to represent the circumflex, and using the letter "x" to represent all accents.

Esperanto letter:  ĉ   ĝ   ĥ   ĵ   ŝ   ŭ
"h" method:        ch  gh  hh  jh  sh  u
"x" method:        cx  gx  hx  jx  sx  ux

The "h" method is canonical in Esperanto, since the Fundamento de Esperanto, which forms the basis of the language, expressly provides for it. Note that "u with breve" is represented by "u" alone, not "uh".

The "x" method is a recent coinage that first appeared among computer users; it is used only on the Net.

The following arguments are made in favour of the "x" method:

  • The "h" method is ambiguous. Is the letter "h" really supposed to be there, or is it supposed to represent an accent? The letter "x" doesn't exist in Esperanto, so there is no ambiguity: any "x" in an Esperanto text must represent an accent. Rebuttal: This kind of confusion never happens in practice. "Flughaveno" can only be the Esperanto word for "airport", since "fluĝaveno" isn't a word.
  • The "x" method is more suitable for machine treatment of text (sorting, indexing, etc.). In Esperanto, letters with accents are different from letters without accents: the alphabet is A, B, C, Ĉ, D, etc. Since "x" is very close to the end of the alphabet, sorting algorithms will almost always put the accented letters in their proper alphabetical order. Rebuttal: These are highly specialized needs. People who must make their texts machine-treatable can use whatever method suits their requirements, but this is irrelevant for the vast majority of Esperanto speakers.
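The two transliterations and the sorting argument can be sketched in a few lines of Python. This is only an illustration: the helper names (`to_h`, `to_x`) and the word list are hypothetical and do not come from any standard library, and the mappings are simply the ones listed earlier in this section.

```python
# Hypothetical helpers illustrating the "h" and "x" methods.
# The mappings follow the letter table in this FAQ.
H_MAP = {"ĉ": "ch", "ĝ": "gh", "ĥ": "hh", "ĵ": "jh", "ŝ": "sh", "ŭ": "u"}
X_MAP = {"ĉ": "cx", "ĝ": "gx", "ĥ": "hx", "ĵ": "jx", "ŝ": "sx", "ŭ": "ux"}


def _transliterate(text, mapping):
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in mapping:
            repl = mapping[lower]
            # Keep capitalisation on the base letter only: Ĉ -> Ch / Cx.
            if ch != lower:
                repl = repl[0].upper() + repl[1:]
            out.append(repl)
        else:
            out.append(ch)
    return "".join(out)


def to_h(text):
    """Rewrite accented letters using the "h" method."""
    return _transliterate(text, H_MAP)


def to_x(text):
    """Rewrite accented letters using the "x" method."""
    return _transliterate(text, X_MAP)


# Illustrative word list (an assumption, not taken from the FAQ).
WORDS = ["dato", "ĉapo", "cirklo", "ĉevalo", "cent"]

# Plain code-point sorting puts ĉ after every unaccented letter.
by_code_point = sorted(WORDS)

# Sorting on the "x" form places ĉ between c and d for this list.
by_x_method = sorted(WORDS, key=to_x)
```

For this word list, `by_code_point` places "ĉapo" and "ĉevalo" after "dato", while `by_x_method` gives cent, cirklo, ĉapo, ĉevalo, dato, which matches the Esperanto alphabet. As the argument above says, this works "almost always", not in every conceivable case.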

The "x" method was very popular in the early years of the net, but the "h" method has clearly been gaining ground recently, as more "ordinary" Esperantists (as opposed to professional computer users, etc.) have started using the net. Either method may be used with confidence.

The "x" method is perhaps more suitable for beginners, since it removes all ambiguity, so that a beginner won't try to look up "fluĝaveno" in the dictionary.
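The ambiguity point can be made concrete with a small decoding sketch. The functions `from_x` and `naive_from_h` are hypothetical names used only for illustration; `naive_from_h` deliberately applies the "h" rule mechanically, without the dictionary knowledge a fluent reader would bring.

```python
import re

# Base letter -> accented letter. "u" is decodable only under the "x" method,
# because the "h" method writes ŭ as a bare "u".
ACCENTED = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ"}


def _restore(match):
    base = match.group(1)
    accented = ACCENTED[base.lower()]
    return accented.upper() if base.isupper() else accented


def from_x(text):
    """Decode the "x" method: every x after c, g, h, j, s or u marks an accent."""
    return re.sub(r"([cghjsuCGHJSU])[xX]", _restore, text)


def naive_from_h(text):
    """Decode the "h" method mechanically; it cannot tell a real h from a marker."""
    return re.sub(r"([cghjsCGHJS])[hH]", _restore, text)
```

With these definitions, `from_x("sxangxi")` returns "ŝanĝi" and `from_x("flughaveno")` leaves the word untouched, whereas `naive_from_h("flughaveno")` produces the non-word "fluĝaveno". A human reader resolves this from context; a beginner or a simple program may not.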

Other methods are also used, such as typing a circumflex accent (^) before or after the accented letter, but these are rarer.

These work-arounds should only be used when one is restricted to 7-bit ASCII. It is wrong to use them when the real characters are available. All word processing programs can handle the accented letters correctly; most typewriters (especially electronic typewriters) can also do so. It is also wrong to use these work-arounds when hand-writing.


Using 8-bit Codes

Esperanto is covered by the 8-bit encoding known as [http://czyborra.com/charsets/iso8859.html Latin-3] (ISO 8859-3:1988). Since 8-bit codes usually cannot be reliably transmitted over the net, some "data massaging" is necessary.

For E-mail, a standard known as MIME (Multipurpose Internet Mail Extensions) converts 8-bit characters to 7-bit ASCII for transmission, and converts the message back to 8 bits upon reception. Many E-mail programs can do this conversion automatically; however, users with shell accounts (especially students) often cannot see MIME messages properly. For this reason, one should ensure that the recipient's system supports MIME before sending messages in this format.

The use of MIME in Usenet is neither specifically permitted nor expressly prohibited. Most newsreaders can't handle postings in MIME, so it is best not to use it in Usenet.

Some users post messages in soc.culture.esperanto and other Usenet groups using "raw" Latin-3 codes, without attempting to "protect" them with a 7-bit encoding. This has led to some heated discussions between those who say that they can receive the original 8-bit Latin-3 codes, and those who say that they often (or always) receive gibberish.

Even if the codes are transmitted properly, they can only be viewed as Esperanto characters if a Latin-3 font is used; users whose language requires the use of an incompatible 8-bit font (e.g. Russian and Japanese) will have problems viewing these characters in any event.

Esperanto's accented characters are covered by the incipient "wide character" standard [http://www.unicode.org/ Unicode] (ISO 10646-1:1993), so these problems will be solved if and when Unicode is widely adopted and implemented. Unicode is a widely endorsed 16-bit character code covering all languages, including non-alphabetic languages such as Chinese and Japanese.
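The sketch below illustrates two of the points above using Python's standard library: why raw Latin-3 bytes look like gibberish on a system that assumes a different 8-bit character set, and how MIME wraps the same text in 7-bit ASCII. The sample sentence and subject line are assumptions made for the example, and Latin-1 stands in for "an incompatible 8-bit font"; this is not a description of how any particular mail program works.

```python
from email.message import EmailMessage

# Sample text (an assumption for this example).
TEXT = "Ĉu vi ŝatas ĝin?"

# Latin-3 stores each accented Esperanto letter in a single byte above 127.
raw = TEXT.encode("iso8859_3")

# A reader whose system assumes Latin-1 interprets those same bytes
# as different letters, so the text no longer reads as Esperanto.
garbled = raw.decode("iso8859_1")

# MIME: declare the character set and protect the 8-bit bytes with the
# quoted-printable transfer encoding so that only ASCII goes over the wire.
msg = EmailMessage()
msg["Subject"] = "Saluton"
msg.set_content(TEXT, charset="iso-8859-3", cte="quoted-printable")

# The serialized message, headers and body, as it would be transmitted.
wire = msg.as_bytes()

# A MIME-aware reader reverses both steps and recovers the original text.
recovered = msg.get_content().strip()
```

Here `wire` contains only 7-bit bytes, while `recovered` is the original sentence. A reader without MIME support would instead see the quoted-printable escapes in place of the accented letters, which is why the recipient's system matters.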


Recommendations

For everyday use, it is probably best to use either the "h" method or the "x" method, both for E-mail and for Usenet news. These methods are widely used and recognized, and both work well in practice.

If one is sure that the recipient can handle MIME messages, then this format can be used for E-mail.

No satisfactory 8-bit solution exists today for Usenet. Either the "h" method or the "x" method should be used for Usenet news.