Diacritic
A diacritic (from the Greek διακριτικός diakritikós, “which distinguishes”) is a sign placed above (superscript diacritic), below (subscript diacritic), in or through (inscribed diacritic), after (adscript diacritic), before (prescript diacritic) or around (circumscript diacritic) a grapheme in order:
- to modify its phonetic value;
- to allow a more precise reading (the diacritics are then not mandatory);
- or to avoid an ambiguity between homographs.
There are also diacritic letters, which are silent and must be written next to the letter they modify. Incidentally, some of them have turned into diacritics (cf. the umlaut and the ring above).
Like ligatures and additional letters invented later, diacritics increase the number of graphemes in a writing system. In many cases, a letter bearing a diacritic is not regarded as an independent grapheme but as an allograph, i.e. another written form of the plain letter. Such a letter then plays no separate role in alphabetical ordering.
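The idea that a letter with a diacritic sorts like its plain letter can be sketched in Python. This is a deliberately simplified collation key, assuming that ignoring the marks at the first level is enough; real collation (for example the Unicode Collation Algorithm with language-specific tailoring) has more elaborate tie-breaking rules. The helper names `base_letters` and `collation_key` are illustrative, not from any library.

```python
import unicodedata


def base_letters(word: str) -> str:
    """Strip combining marks after canonical decomposition ('école' -> 'ecole')."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def collation_key(word: str) -> tuple[str, str]:
    # Primary level: the word without diacritics; tie-break: the original spelling.
    return (base_letters(word), word)


words = ["zèbre", "école", "eau", "élan"]
print(sorted(words))                     # ['eau', 'zèbre', 'école', 'élan']
print(sorted(words, key=collation_key))  # ['eau', 'école', 'élan', 'zèbre']
```

Sorting by raw code points pushes é after z; the key puts école and élan among the other e-words, as a dictionary would.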
For example, the French acute accent modifies the phonetic value of an e, which generally represents /ə/ (the “mute e”): é is then pronounced /e/. The grave accent on an a, on the other hand, only serves to distinguish homographs: la (article) ~ là (adverb of place), both pronounced /la/ in the French of France (although they are distinguished in speech in some other dialects of French). É and à are, however, not considered letters independent of e and a. French has no diacritics used merely to refine the reading: they are all mandatory.
Romanian uses diacritics optionally but extensively, to give a more precise reading. The language has no accents on its letters, and each phoneme corresponds to a single letter, whatever the word.
This type of diacritic exists in Arabic, where short vowels are not written; in didactic or religious works, they can be marked with diacritics. The fatḥa, a slightly slanted stroke written above the letter, indicates the short vowel a: the word عدل is read ʿadl (`adl) but transliterated ʿdl. To specify its reading, a fatḥa can be added: عَدل.
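In Unicode the fatḥa is a separate combining character, U+064E ARABIC FATHA, so vocalizing عدل adds one code point after the first letter rather than changing the letter itself. A minimal Python sketch, using only the standard `unicodedata` module; the helper name `strip_harakat` is illustrative:

```python
import unicodedata

FATHA = "\u064E"  # ARABIC FATHA, a combining mark

bare = "\u0639\u062F\u0644"               # عدل: ʿayn, dal, lam, no vowel marks
vocalized = bare[0] + FATHA + bare[1:]    # عَدل: fatha attached to the first letter


def strip_harakat(text: str) -> str:
    """Remove combining marks such as the fatha, keeping only base letters."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


print(unicodedata.name(FATHA))            # ARABIC FATHA
print(len(bare), len(vocalized))          # 3 4
print(strip_harakat(vocalized) == bare)   # True
```

Because the mark is optional, removing it recovers the unvocalized spelling exactly.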
Each writing system has been able to develop its own diacritics:
- Diacritics of the Latin alphabet (since this alphabet is also used for romanization, it is the one with the most diacritics):
- see also the diacritics of the Vietnamese alphabet (quốc ngữ);
- the diacritics used in French, their use and the rules governing them are treated separately;
- Diacritics of the Greek alphabet;
- Diacritics of the Cyrillic alphabet;
- Diacritics of the Arabic alphabet;
- Diacritics of the Hebrew alphabet;
- Diacritics of Devanagari;
- Diacritics of the Japanese syllabaries (kana);
- Diacritics of the Tibetan alphabet;
- Diacritics of the Turkish alphabet.
Transcription of diacritics in computing
Transcriptions in ASCII
The standard ASCII character set, shaped by the octal system widely used in the early days of computing, comprises 128 codes, including 95 printable characters. Among these are 52 alphabetic characters, the 26 letters of the Latin alphabet in uppercase and lowercase, but no accented letters.
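The counts above can be checked in Python, assuming the usual definition of the printable range as 0x20 (space) through 0x7E (~); the sketch also shows that an accented letter simply has no ASCII code:

```python
import string

# ASCII has 128 codes (0..127); the printable ones run from 0x20 to 0x7E.
printable = [chr(c) for c in range(128) if 0x20 <= c <= 0x7E]
letters = [ch for ch in printable if ch in string.ascii_letters]

print(len(printable))  # 95
print(len(letters))    # 52

# A letter with a diacritic cannot be encoded in ASCII at all.
try:
    "é".encode("ascii")
except UnicodeEncodeError as err:
    print("not encodable in ASCII:", err.object[err.start:err.end])
```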
There are several character sets, often called extended ASCII, that have 256 codes; the 128 additional codes are used in particular to represent vowels and consonants of the Latin alphabet that carry diacritics.
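As one concrete example of such an extended set, Latin-1 (ISO 8859-1) can be scanned with Python's `latin-1` codec to list the letters in its upper half that Unicode decomposes into a base letter plus a diacritic. The helper `diacritic_letters` is illustrative, and the choice of Latin-1 is an assumption made for the example:

```python
import unicodedata


def diacritic_letters(codec: str) -> list[str]:
    """Letters in the upper half (0x80..0xFF) of an 8-bit set that Unicode
    canonically decomposes into a base letter plus combining mark(s)."""
    found = []
    for byte in range(0x80, 0x100):
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte not assigned in this character set
        decomp = unicodedata.decomposition(ch)
        # Canonical decompositions have no "<tag>" prefix; compatibility ones do.
        if ch.isalpha() and decomp and not decomp.startswith("<"):
            found.append(ch)
    return found


latin1 = diacritic_letters("latin-1")
print("".join(latin1))
```

Letters such as Æ or ß are in the set but are not a letter plus a mark, so they are left out.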
The first extended character sets, called code pages, were created by IBM for its PC microcomputers. In this system, a code page or “CP” is identified by a number and associated with a particular set: CP437 is the “US” or “graphics” set; CP850 is the “Western European multilingual” set.
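Python ships codecs named `cp437` and `cp850`, which makes it easy to compare the two code pages. This sketch shows that they agree on some accented letters, that CP437 lacks others entirely, and that CP850 reuses some of CP437's box-drawing positions for accented capitals; `encode_or_none` is an illustrative helper:

```python
from typing import Optional


def encode_or_none(text: str, codec: str) -> Optional[bytes]:
    """Encode text with a code page, or return None if it has no code for it."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


for ch in ("é", "À"):
    print(ch, encode_or_none(ch, "cp437"), encode_or_none(ch, "cp850"))
# é b'\x82' b'\x82'
# À None b'\xb7'

# The same byte is a box-drawing character in CP437 but a letter in CP850.
b = bytes([0xB5])
print(b.decode("cp437"), b.decode("cp850"))  # ╡ Á
```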
With the arrival of graphical interfaces (Apple Macintosh, Microsoft Windows, X Window, etc.), the “graphic” characters of the code pages were no longer needed, so more of the extended codes could be used for characters with diacritics. Character sets created jointly by IBM and Microsoft for their two graphical platforms, Windows and OS/2 Presentation Manager, served as the basis for a series of ISO character sets, the ISO 8859 standard, which comes in fifteen parts:
- 8859-1 to 8859-4, 8859-9 and 8859-10, 8859-13 to 8859-15: “Latin-1” to “Latin-9”, variants of the Latin alphabet with the diacritic characters of various countries and regions (France, Italy, Spain, Albania, Turkey, the Scandinavian countries, Hungary, Poland, etc.);
- 8859-6: Latin and Arabic alphabets;
- 8859-7: Latin and Greek alphabets;
- 8859-8: Latin and Hebrew alphabets;
- 8859-11: Thai script.
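The parts of ISO 8859 share ASCII in their lower half and differ only in the upper half, so the same byte means different characters depending on the part. A short Python sketch (Python names these codecs `iso8859_N`):

```python
# The same byte, 0xE9, read through three ISO 8859 parts.
for codec, label in [("iso8859_1", "Latin-1"),
                     ("iso8859_7", "Latin/Greek"),
                     ("iso8859_8", "Latin/Hebrew")]:
    ch = bytes([0xE9]).decode(codec)
    print(f"{label:<12} 0xE9 -> {ch} (U+{ord(ch):04X})")
# Latin-1      0xE9 -> é (U+00E9)
# Latin/Greek  0xE9 -> ι (U+03B9)
# Latin/Hebrew 0xE9 -> י (U+05D9)

# Every part keeps ASCII unchanged in its lower half (0x00..0x7F).
ascii_half = bytes(range(0x80))
for n in (1, 6, 7, 8, 11, 15):
    assert ascii_half.decode(f"iso8859_{n}") == ascii_half.decode("ascii")
```

This is why a text must declare which part it uses: the bytes alone do not say whether 0xE9 is é, ι or י.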
When a French keyboard is not available, or an application does not support accented characters, the diacritics can be rendered by adding a character before and/or after the letter to be accented. For example:
- Le garc,on ne pouvait pas “e^tre la` cet e'te'” (for “Le garçon ne pouvait pas être là cet été”, “The boy could not be there this summer”).
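Such a transcription can be automated by decomposing each accented letter and replacing its combining mark with an ASCII character written after the base letter. The marker characters chosen below (' ` ^ " ,) are an assumption, since several conventions exist, and `to_ascii_postfix` is an illustrative helper, not a standard function:

```python
import unicodedata

# Hypothetical convention: each combining mark becomes an ASCII character
# written after its base letter. Marks not listed here are simply dropped.
POSTFIX = {
    "\u0301": "'",   # acute accent:  é -> e'
    "\u0300": "`",   # grave accent:  à -> a`
    "\u0302": "^",   # circumflex:    ê -> e^
    "\u0308": '"',   # diaeresis:     ë -> e"
    "\u0327": ",",   # cedilla:       ç -> c,
}


def to_ascii_postfix(text: str) -> str:
    out = []
    # NFD splits each precomposed letter into base letter + combining mark(s).
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            out.append(POSTFIX.get(ch, ""))
        else:
            out.append(ch)
    return "".join(out)


print(to_ascii_postfix("garçon, être, là, été, Noël"))
# garc,on, e^tre, la`, e'te', Noe"l
```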
Transcriptions in Unicode
The Unicode Consortium, which brings together most of the major names in computing, was created in the mid-1980s to address the incompatibility of the various character encodings developed for different hardware and software platforms (EBCDIC and the IBM/Microsoft “code page” system, Apple-specific sets, HP sets, Unix sets, etc.), in connection with the development of the ISO 10646 standard.
The initial goal was to develop an encoding on 16 bits rather than 8, allowing 2^16 = 65,536 characters to be encoded. The standard has since been extended beyond 16 bits, because the variety of characters and symbols to be represented (in particular mathematical and scientific symbols) far exceeds this limit; Chinese characters alone, with their many variants, already exceed 65,536.
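One consequence of going beyond 16 bits is visible in UTF-16: a code point above U+FFFF no longer fits in one 16-bit unit and is written as a surrogate pair. A minimal Python sketch of that split, using a mathematical symbol as the example (the helper name `utf16_code_units` is illustrative), cross-checked against Python's own `utf-16-be` codec:

```python
def utf16_code_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for a code point: one unit below U+10000,
    otherwise a high/low surrogate pair."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x10000:
        return [cp]
    offset = cp - 0x10000                 # a 20-bit value
    return [0xD800 + (offset >> 10),      # high surrogate: top 10 bits
            0xDC00 + (offset & 0x3FF)]    # low surrogate: bottom 10 bits


# U+1D400 MATHEMATICAL BOLD CAPITAL A lies beyond the original 16-bit range.
print([hex(u) for u in utf16_code_units(0x1D400)])   # ['0xd835', '0xdc00']
print("\U0001D400".encode("utf-16-be").hex())        # d835dc00
```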
The principle adopted was to group sets or subsets of characters and symbols into “pages” of 256 codes, or “blocks”; for example, blocks 0 to 3 correspond to four subsets of the Latin alphabet, block 6 to the “combining diacritics” that can be attached to letters of the Latin alphabet, block 7 to Greek and Coptic characters, block 11 to Hebrew, blocks 12 to 14 to the Arabic and Syriac alphabets, block 58 to currency symbols, blocks 63, 73, 77 and 78 to mathematical symbols, etc.
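In current Unicode, the combining diacritics for Latin letters live in the block named Combining Diacritical Marks, U+0300 to U+036F. A precomposed letter such as é can be decomposed into a base letter plus one of these marks and recomposed again; the sketch below uses only Python's standard `unicodedata` module, with the block range written out by hand:

```python
import unicodedata

# Combining Diacritical Marks block: U+0300..U+036F.
COMBINING_DIACRITICAL_MARKS = range(0x0300, 0x0370)

composed = "\u00e9"                                  # é as one code point
decomposed = unicodedata.normalize("NFD", composed)  # e + combining acute
print([f"U+{ord(c):04X}" for c in decomposed])       # ['U+0065', 'U+0301']

mark = decomposed[1]
print(unicodedata.name(mark))                        # COMBINING ACUTE ACCENT
print(ord(mark) in COMBINING_DIACRITICAL_MARKS)      # True
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```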
In its final version, the 16-bit Unicode system did not include pictographic scripts, which are covered by another standard.
There are three ways of inserting a Unicode character into a document:
- by value;
- by number;
- by alias.
Insertion by value consists of placing in the document the 16-bit numeric sequence that corresponds to a given character. The methods by number and by alias are used only in certain types of documents, in particular RTF files and HTML and related formats (notably XML). In every case the principle is the same: the number or alias is preceded, or surrounded, by an escape sequence.
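For RTF, the numeric form is the `\uN` control word, where N is the decimal value of a 16-bit code unit written as a signed number (values above 32767 become negative), followed by a fallback character for readers that do not understand it. The sketch below assumes `?` as that fallback and escapes only the RTF special characters `\`, `{` and `}`; `to_rtf_unicode` is an illustrative helper, not part of any RTF library:

```python
def to_rtf_unicode(text: str) -> str:
    """Escape text for RTF with \\uN control words (N = signed 16-bit decimal),
    using '?' as the fallback character for older readers."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)             # RTF special characters
        elif ord(ch) < 0x80:
            out.append(ch)                    # plain ASCII passes through
        else:
            # One \uN per UTF-16 code unit; units above 0x7FFF are negative.
            data = ch.encode("utf-16-be")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "big")
                n = unit - 0x10000 if unit > 0x7FFF else unit
                out.append(f"\\u{n}?")
    return "".join(out)


print(to_rtf_unicode("été"))  # \u233?t\u233?
```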
In HTML documents, the sequence “&” (alias) or “&#” (number) is placed at the beginning and the sign “;” at the end, with the alias or number between the two.
For example, the sequences “&#38;” (decimal; “&#x26;” in hexadecimal) and “&amp;” both represent the “commercial and” sign (ampersand): “&”.
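Python's standard `html` module resolves both kinds of reference, which gives a quick way to check that an alias and a number denote the same character. The sketch also goes the other way, replacing every non-ASCII character with a numeric reference through the `xmlcharrefreplace` error handler:

```python
import html

# Alias (named reference) and numbers (decimal, hexadecimal) for "&".
for ref in ("&amp;", "&#38;", "&#x26;"):
    print(f"{ref:7} -> {html.unescape(ref)}")      # each line ends in "-> &"

print(html.escape("R&D"))                          # R&amp;D
print(html.unescape("&eacute;t&#233;"))           # été

# Numeric references for every character outside ASCII.
print("été".encode("ascii", "xmlcharrefreplace").decode("ascii"))  # &#233;t&#233;
```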