Code page

In computing, a code page is a table of values that describes the character set used to encode a particular set of characters, usually combined with a number of control characters.

The term "code page" originated with IBM's EBCDIC-based mainframe systems, but Microsoft, SAP, and Oracle Corporation are among the few vendors that use the term. Most vendors identify their own character sets by name. Where there are a great many character sets (as at IBM), identifying them by number is a convenient way to tell them apart. Originally, the code page number referred to the page number in the IBM standard character set manual, a condition that has not held for a long time. Vendors that use a code page system allocate their own code page numbers to character encodings, even when those encodings are better known by other names; for example, UTF-8 has been assigned code page number 1208 at IBM, 65001 at Microsoft, and 4110 at SAP.

Hewlett-Packard uses a similar concept in its HP-UX operating system and in its Printer Command Language (PCL) protocol for printers (whether HP printers or not). The terminology, however, is different: what others call a character set, HP calls a symbol set, and what IBM or Microsoft call a code page, HP calls a symbol set code. HP developed a series of symbol sets, each with an associated symbol set code, to encode both its own character sets and those of other vendors.

The multitude of character sets leads many vendors to recommend Unicode.

The code page numbering system

IBM introduced the concept of systematically assigning a small but globally unique 16-bit number to each character encoding that a computer system or collection of computer systems might encounter. The IBM origin of the numbering scheme is reflected in the fact that the smallest (first) numbers are assigned to variations of IBM's EBCDIC encoding, and slightly larger numbers refer to variations of IBM's extended ASCII encoding as used in its PC hardware.

With the release of PC DOS version 3.3 (and the nearly identical MS-DOS 3.3), IBM introduced the code page numbering system to regular PC users, as code page numbers (and the phrase "code page") were used in new commands that allowed the character encoding used by all parts of the OS to be set in a systematic way.

After IBM and Microsoft ceased to cooperate in the 1990s, the two companies maintained their lists of assigned code page numbers independently of each other, resulting in some conflicting assignments. At least one third-party vendor (Oracle) also has its own, different list of numeric assignments. IBM's current assignments are listed in its CCSID repository, while Microsoft's assignments are documented in MSDN. In addition, a list of the names and approximate IANA (Internet Assigned Numbers Authority) abbreviations for the code pages installed on any given Windows machine can be found in the Registry on that machine (this information is used by Microsoft programs such as Internet Explorer).

Most well-known code pages, excluding those for the CJK languages and Vietnamese, fit all their code points into eight bits and involve nothing more than mapping each code point to a single character; techniques such as combining characters and complex scripts are not involved.
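As a rough illustration of this one-byte-to-one-character mapping, the following Python sketch decodes the same byte under three single-byte code pages. The codec names (`cp437`, `cp850`, `cp1252`) are those of Python's standard library rather than vendor identifiers, and the helper `decode_byte` is a name made up for this example.

```python
def decode_byte(value: int, codepage: str) -> str:
    """Map one 8-bit code point to the character a code page assigns to it."""
    return bytes([value]).decode(codepage)


# The same byte value stands for a different character in each table.
for cp in ("cp437", "cp850", "cp1252"):
    print(cp, decode_byte(0xE9, cp))
```

With these codecs, byte 0xE9 decodes to Θ, Ú and é respectively, which is why text must be read with the code page it was written in.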

The text mode of standard (VGA-compatible) PC graphics hardware is built around an 8-bit code page, although it is possible to use two at once at some cost in color depth, and up to eight can be stored in the display adapter for easy switching. A selection of third-party code page fonts could be loaded into such hardware. However, it is now common for operating system vendors to provide their own character encoding and rendering systems, which run in a graphics mode and bypass this hardware limitation entirely. The system of referring to character encodings by code page number nevertheless remains applicable, as an efficient alternative to string identifiers such as those specified by the IETF and IANA for use in protocols such as e-mail and web pages.

Relationship with ASCII

Most code pages in current use are supersets of ASCII, a 7-bit code representing 128 control codes and printable characters. In the past, 8-bit implementations of ASCII set the top bit to zero or used it as a parity bit in network data transmission. When the top bit was made available for character data, a total of 256 characters and control codes could be represented. Most vendors (including IBM) used this extended range to encode characters used by various languages, along with graphic elements that allowed primitive graphics to be imitated on text-only output devices. No formal standard existed for these "extended character sets", and vendors referred to the variants as code pages, as IBM had always done for variants of the EBCDIC encoding.
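The superset relationship can be checked mechanically. The sketch below, which assumes Python's bundled codecs are a fair stand-in for the vendor tables, tests whether the lower 128 byte values of a code page decode exactly as ASCII does; `is_ascii_superset` is a hypothetical helper name.

```python
def is_ascii_superset(codepage: str) -> bool:
    """Return True if bytes 0x00-0x7F decode to the same characters as ASCII."""
    low_half = bytes(range(128))
    try:
        return low_half.decode(codepage) == low_half.decode("ascii")
    except UnicodeDecodeError:
        # A byte in the lower half is undefined in this code page.
        return False


for name in ("cp437", "cp1252", "latin-1", "cp037"):
    print(name, is_ascii_superset(name))
```

The first three codecs pass the check, while `cp037`, an EBCDIC code page, does not: it places the letter A at 0xC1 rather than 0x41.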

Relationship with Unicode

Unicode is an effort to include all characters from previous code pages in a single character enumeration that can be used with a number of encoding schemes. In the process, duplicate characters were eliminated and new variants were introduced, such as full-width ASCII. While consistent use of a Unicode encoding would in theory eliminate the need to keep track of different code pages or character encodings, several Unicode encodings exist, and the need to remain compatible with existing documents and systems that use the older encodings persists. In practice, the various Unicode encodings have simply been assigned their own code page numbers, and all other code pages have been technically redefined as encodings for various subsets of Unicode.
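Treating every code page as an encoding of a subset of Unicode makes conversion between code pages a two-step process: decode to Unicode code points, then encode again. A minimal Python sketch, using the standard library codecs and a made-up helper name `transcode`:

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode bytes from one code page to another via Unicode."""
    text = data.decode(source)   # legacy bytes -> Unicode string
    return text.encode(target)   # Unicode string -> target encoding


legacy = "café".encode("cp1252")               # é is the single byte 0xE9
print(transcode(legacy, "cp1252"))             # UTF-8 needs two bytes for é
print(transcode(legacy, "cp1252", "cp437"))    # cp437 stores é as 0x82
```

The conversion fails with a `UnicodeEncodeError` when the target code page has no slot for a character, which reflects the "subset of Unicode" nature of the older code pages.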

IBM code pages

EBCDIC-based code pages

These code pages are used by IBM in its EBCDIC character sets for mainframe computers.

DOS code pages

These code pages are used by IBM in its PC DOS operating system. They were originally embedded directly in the text mode hardware of the graphics adapters used with the IBM PC and its clones, including the original MDA and CGA adapters, whose character sets could only be changed by physically replacing the ROM chip that contained the font. The interface of those adapters (emulated by all later adapters such as VGA) was typically limited to single-byte character sets with only 256 characters in each font/encoding (although VGA added partial support for slightly larger character sets).

When dealing with older hardware, protocols and file formats, it is often necessary to support these code pages, but newer encoding systems, in particular Unicode, are encouraged for new designs.

DOS code pages are typically stored in .CPI files.

IBM AIX code pages

These code pages are used by IBM in its AIX operating system. They emulate several character sets, namely those designed to be used in accordance with ISO standards, as on UNIX-like operating systems.

Code page 819 is identical to Latin-1 (ISO/IEC 8859-1) and, with slightly modified commands, allows MS-DOS machines to use that encoding. It was used with IBM AS/400 minicomputers.

IBM OS/2 code pages

These code pages are used by IBM in its OS/2 operating system.

1004 - Latin-1 Extended, Desk Top Publishing/Windows

Windows emulation code pages

These code pages are used by IBM when emulating the Microsoft Windows character sets. Most of them have the same number as the corresponding Microsoft code page, although the two are not exactly identical. Some of these code pages, however, are new from IBM and were not devised by Microsoft.

Macintosh emulation code pages

These code pages are used by IBM when emulating the Apple Macintosh character sets.

Adobe emulation code pages

These code pages are used by IBM when emulating the Adobe character sets.

HP emulation code pages

These code pages are used by IBM when emulating the HP character sets.

DEC emulation code pages

These code pages are used by IBM when emulating the DEC character sets.

IBM Unicode code pages

Microsoft code pages

Windows code pages

These code pages are used by Microsoft in its own Windows operating system. Microsoft defined a number of code pages known as the ANSI code pages (the first one, 1252, was based on an apocryphal ANSI draft of what became ISO 8859-1). Code page 1252 is built on ISO 8859-1 but uses the range 0x80-0x9F for additional printable characters rather than the C1 control codes used in ISO 8859-1. Some of the others are based in part on other parts of ISO 8859 but are often rearranged to bring them closer to 1252.

Microsoft recommends that new applications use UTF-8 or UCS-2/UTF-16 instead of these code pages.
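The relationship between code page 1252 and ISO 8859-1 can be confirmed by comparing the two tables byte by byte. The sketch below assumes that Python's `cp1252` and `latin-1` codecs match the respective definitions; positions that `cp1252` leaves undefined are replaced with U+FFFD so that they also count as differences. The function name `differing_bytes` is illustrative only.

```python
def differing_bytes(first: str = "latin-1", second: str = "cp1252") -> list:
    """List the byte values that the two single-byte codecs decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" turns undefined positions into U+FFFD instead of raising.
        if raw.decode(first, errors="replace") != raw.decode(second, errors="replace"):
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print(hex(diffs[0]), hex(diffs[-1]), len(diffs))
```

The differences are confined to the 32 positions from 0x80 to 0x9F; for example, 0x80 is a C1 control code in ISO 8859-1 but the euro sign in code page 1252.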

DBCS code pages

These code pages represent DBCS (double-byte character set) encodings for various CJK languages. In Microsoft operating systems, they are used as both the "OEM" and the "Windows" code page for the applicable locale.
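Unlike the single-byte code pages, a DBCS code page mixes one-byte and two-byte characters in the same stream. A small sketch, assuming Python's `cp932` codec (Microsoft's Japanese code page) as the example and a made-up helper `byte_lengths`:

```python
from typing import List


def byte_lengths(text: str, codepage: str) -> List[int]:
    """Number of bytes each character occupies in the given encoding."""
    return [len(ch.encode(codepage)) for ch in text]


# ASCII letters stay one byte wide; kanji take two bytes each.
print(byte_lengths("A日本", "cp932"))
```

Because character boundaries are not fixed, software that handles such text cannot index it byte by byte the way it can with an 8-bit code page.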

MS-DOS code pages

These code pages are used by Microsoft in its MS-DOS operating system. Microsoft refers to them as the OEM code pages because they were defined by the original equipment manufacturers (OEMs) who licensed MS-DOS for distribution with their hardware, not by Microsoft or any standards organization. Most of these code pages have the same number as the equivalent IBM code page, although the two are not exactly identical: there are minor differences between some IBM and Microsoft code pages.

Macintosh emulation code pages

These code pages are used by Microsoft when emulating the Apple Macintosh character sets.

Various other Microsoft code pages

The following code page numbers are specific to Microsoft Windows. IBM may use different numbers for these code pages. They emulate several character sets, namely those designed to be used in accordance with ISO standards, as on UNIX-like operating systems.

Microsoft Unicode code pages

HP symbol sets

HP developed a series of symbol sets (each with an associated symbol set code) to encode either its own character sets or those of other vendors. They are normally 7-bit character sets which, when moved to the upper half and combined with the ASCII character set, make up 8-bit character sets.

HP's own symbol sets

Symbol sets from other vendors

Code pages from other vendors

These code pages are independent assignments by third-party vendors. Since the original IBM PC code page (number 437) was not really designed for international use, several partially compatible country-specific or region-specific variants emerged.

These code page number assignments are official neither at IBM nor at Microsoft, and almost none of them is referred to as a usable character set by IANA. The numbers assigned to these code pages are arbitrary and may clash with registered numbers in use by IBM or Microsoft. Some of them may predate the code page switching added in DOS 3.3.

List of code page assignments

List of known code page assignments (incomplete):

Criticism

Many older character encodings (unlike Unicode) suffer from several problems. Some code page vendors insufficiently document the meaning of all code point values, which reduces the reliability of handling textual data consistently across different computer systems. Some vendors add proprietary extensions to certain code pages to add or change the meaning of particular code point values; for example, byte 0x5C in Shift JIS can represent either a backslash or a yen currency symbol, depending on the platform. Finally, in order to support several languages in a program that does not use Unicode, the code page used for each string or document needs to be stored.

Because of Unicode's extensive documentation, vast repertoire of characters and character stability policy, the problems listed above are rarely a concern for Unicode. Applications may also mislabel text in Windows-1252 as ISO-8859-1. Fortunately, the only difference between these two code pages is that the code point values used by ISO-8859-1 for control characters are instead used for additional printable characters in Windows-1252. Since control characters have no function in HTML, web browsers tend to use Windows-1252 rather than ISO-8859-1. In HTML5, treating ISO-8859-1 as Windows-1252 is even codified as a standard. UTF-8 has since overtaken both encodings in popularity on the Internet.
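The last point, storing the code page alongside each string, might look like the following sketch. The `TaggedText` class is a hypothetical illustration rather than an established API, and the code page names are those of Python's codecs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Raw bytes stored together with the code page needed to read them."""
    raw: bytes
    codepage: str

    def text(self) -> str:
        return self.raw.decode(self.codepage)


german = TaggedText("Größe".encode("cp1252"), "cp1252")
russian = TaggedText("Привет".encode("cp1251"), "cp1251")
print(german.text(), russian.text())

# Reading the Russian bytes with the wrong tag yields unrelated Latin letters.
print(TaggedText(russian.raw, "cp1252").text())
```

Without the tag, nothing in the bytes themselves says which of the two tables was intended.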
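The browser behaviour described here can be imitated with a small label table. This is only a sketch: `LABEL_OVERRIDES` and `decode_like_browser` are hypothetical names, and the table covers just the one substitution mentioned above rather than the full set of labels a real browser recognises.

```python
# Assumed, deliberately tiny label table: text labelled ISO-8859-1 is
# decoded with the Windows-1252 table instead.
LABEL_OVERRIDES = {"iso-8859-1": "cp1252", "latin-1": "cp1252"}


def decode_like_browser(data: bytes, label: str) -> str:
    """Decode data, substituting Windows-1252 for an ISO-8859-1 label."""
    codec = LABEL_OVERRIDES.get(label.strip().lower(), label)
    return data.decode(codec)


mislabelled = b"\x93quoted\x94"                    # curly quotes in Windows-1252
print(repr(mislabelled.decode("iso-8859-1")))      # invisible C1 control codes
print(decode_like_browser(mislabelled, "ISO-8859-1"))
```

Decoding strictly by the label produces two control characters around the word, whereas the substituted table recovers the intended quotation marks.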

Private code pages

When, early in the history of personal computers, users did not find their character encoding requirements met, private or local code pages were created using Terminate and Stay Resident utilities or by reprogramming BIOS EPROMs. In some cases, unofficial code page numbers were invented (e.g., CP895).

When more diverse character set support became available, most of those code pages fell into disuse, with some exceptions such as the Kamenický or KEYBCS2 encoding for the Czech and Slovak alphabets. Another such character set is the Iran System encoding standard, created by the Iran System corporation for Persian language support. This standard was used in Iran in DOS-based programs and became obsolete after the introduction of Microsoft code page 1256. However, some Windows and DOS programs that use this encoding are still in use, and some Windows fonts with this encoding exist.

To overcome such problems, the IBM Character Data Representation Architecture (CDRA) level 2 specifically reserves ranges of code page IDs for user-definable and private-use assignments. Whenever such code page IDs are used, the user must not assume that the same functionality and appearance can be reproduced in another system configuration or on another device or system unless the user takes care of this specifically. The code page range 57344-61439 (E000h-EFFFh) is officially reserved for user-definable code pages (or, in the context of IBM CDRA, CCSIDs), whereas the range 65280-65533 (FF00h-FFFDh) is reserved for any user-definable "private use" assignments. For example, an unregistered custom variant of code page 437 (1B5h) or 28591 (6FAFh) could become 57781 (E1B5h) or 61359 (EFAFh), respectively, in order to avoid potential conflicts with other assignments and to preserve the internal numerical logic that sometimes exists in the assignments of the original code pages. An unregistered private code page that is not based on an existing code page, a device-specific code page such as a printer font that merely needs a logical handle to become addressable by the system, a frequently changing download font, or a code page number with a symbolic meaning in the local environment could be given an assignment in the private range, such as 65280 (FF00h).

The code page IDs 0, 65534 (FFFEh) and 65535 (FFFFh) are reserved for internal use by operating systems such as DOS and must not be assigned to any specific code pages.
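The ranges and the two worked examples above can be expressed as a short Python sketch. The derivation rule in `custom_variant` (keep the low 12 bits of the base ID and place them in the E000h block) is an assumption inferred from the two examples, not a rule quoted from CDRA, and all names here are hypothetical.

```python
USER_DEFINED = range(0xE000, 0xF000)   # 57344-61439
PRIVATE_USE = range(0xFF00, 0xFFFE)    # 65280-65533
RESERVED = {0, 0xFFFE, 0xFFFF}         # internal use, never assignable


def classify(code_page_id: int) -> str:
    """Name the range a 16-bit code page ID falls into."""
    if code_page_id in RESERVED:
        return "reserved"
    if code_page_id in USER_DEFINED:
        return "user-defined"
    if code_page_id in PRIVATE_USE:
        return "private use"
    return "regular"


def custom_variant(base_id: int) -> int:
    """Assumed scheme: keep the low 12 bits of the base ID inside E000h-EFFFh."""
    return 0xE000 | (base_id & 0x0FFF)


for base in (437, 28591):
    variant = custom_variant(base)
    print(base, "->", variant, hex(variant), classify(variant))
```

Under this assumed rule, 437 maps to 57781 (E1B5h) and 28591 maps to 61359 (EFAFh), matching the figures in the text.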

External links

IBM CDRA glossary
IBM code pages
IBM code pages by encoding scheme
IBM/ICU Charset Information
Microsoft Code Page Identifiers (Microsoft's list contains only code pages that are used by normal apps on Windows. See also Torsten Mohrin's list for a complete list of supported code pages)
A shorter Microsoft list containing only the ANSI and OEM code pages, but with links to more detail on each
Character Sets And Code Pages At The Push Of A Button
Microsoft Chcp command: displays and sets the console's active code page

Source of the article : Wikipedia

Thursday, 14 June 2018
