Structure of AFP code page resource names | Code page information

Background and overall structure

A FOCA code page (as used in AFP) is not the same thing as a CDRA code page; in CDRA, "code page" is defined as a synonym for CPGID.  A FOCA code page usually has a CPGID (although there are exceptions to this), but can also specify a GCSGID and ESID; multiple FOCA code pages, furthermore, can exist for a single CPGID.  The closest CDRA concept to the FOCA code page would be the single-component CCSID.

All FOCA code page resources for AFP have resource names beginning with "T1", which marks them as code pages.  When stored in a conventional filesystem, FOCA code pages without Unicode mappings may have no filename extension (as seen with e.g. the handful of code page files in ur47923engl.zip, available on IBM's website) or the extension .CDP (as seen in e.g. udc-japan1100.zip, available on IBM's website).  "Extended" code pages, meaning FOCA code pages that include Unicode mappings, typically have the extension .ECP (as seen in e.g. ecp_gl.zip, available on IBM's website).  Including Unicode mappings allows them to be used with more conventional, Unicode-mapped font formats rather than only with GCGID-mapped fonts.

The resource name consists of up to eight characters and, as recommended in FOCA, is expected to consist of characters from GCSGID 961 (uppercase ISO Basic Latin alphanumerics, plus the dollar sign, octothorpe and at-sign); this is only slightly more restrictive than the repertoire permitted in non-initial position in z/OS data set qualifiers (which also permits the hyphen).  In practice, it normally consists only of the uppercase alphanumerics, i.e. GCSGID 1134, rarely if ever making use of the three symbols.  Since the resource name is usually included in the FOCA code page header, where it is encoded using CPGID 500 with GCSGID 103 (i.e. CCSID 8448), it will not in any case contain characters outside GCSGID 103 (the ASCII repertoire), even if it resides on a conventional filesystem and thus does not need to be usable as a data set qualifier.  When residing on a conventional filesystem, the filename may be treated as case-insensitive (as can be seen in e.g. 4000micr.zip, available on IBM's website).
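The constraints described above can be sketched as a small check.  This is a minimal illustration rather than anything taken from FOCA itself: the function names are hypothetical, the sample name is used purely as a test string, and Python's built-in cp500 codec is assumed to be an adequate stand-in for CPGID 500 over this repertoire.

```python
import os
import re

# Assumption: "T1" followed by up to six characters from GCSGID 961
# (uppercase A-Z, digits, and the symbols $ # @), eight characters at most.
_RESOURCE_NAME = re.compile(r"T1[A-Z0-9$#@]{0,6}")


def normalize_resource_name(filename: str) -> str:
    """Drop any extension (.CDP, .ECP, ...) and fold to uppercase (hypothetical helper)."""
    stem, _extension = os.path.splitext(filename)
    return stem.upper()


def is_valid_resource_name(name: str) -> bool:
    """Check only the length, prefix and repertoire constraints stated above."""
    return _RESOURCE_NAME.fullmatch(name) is not None


def header_bytes(name: str) -> bytes:
    """Encode a name as it might appear in a header, using Python's cp500 codec."""
    return name.encode("cp500")
```

A filename found on a case-insensitive filesystem would be passed through normalize_resource_name before being checked; a name containing a hyphen is rejected here even though a z/OS data set qualifier could contain one.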

Single-byte code pages have resource names eight characters long, with the six characters after "T1" identifying the code page; usually the first one, two or three of those characters specify some sort of category that the code page falls into, and determine how the remainder of the resource name (which is often but not necessarily a CPGID) is interpreted.  Double-byte code pages, or portions thereof, have resource names between six and eight characters long: the four characters after "T1" identify the code page (generally a single character specifying a version, followed by a three-digit CPGID), and are followed by one of three suffixes: two hexadecimal digits for a single "ward"/"section" of the code page (i.e. the allocations under a single lead byte; this approach is seen in e.g. S544-5850-00), the letter "U" for the entire code page including UDC (user-defined character) regions, or the empty string for the entire code page without UDC regions.
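The two layouts described above can be sketched as a pair of splitting functions.  This is only an illustration under stated assumptions: the function and type names are hypothetical, the double-byte names used when trying it out are made up, and the caller is assumed to know already whether a name belongs to a single-byte or a double-byte code page, since an eight-character name fits both layouts and length alone only settles the six- and seven-character cases.

```python
from typing import NamedTuple, Optional

HEX_DIGITS = "0123456789ABCDEF"


class DoubleByteName(NamedTuple):
    identifier: str            # the four characters after "T1"
    scope: str                 # "section", "full" or "full+udc"
    lead_byte: Optional[int]   # set only when scope == "section"


def parse_single_byte_name(name: str) -> str:
    """Return the six identifying characters of a single-byte resource name."""
    name = name.upper()
    if len(name) != 8 or not name.startswith("T1"):
        raise ValueError(f"not a single-byte code page name: {name!r}")
    return name[2:]


def parse_double_byte_name(name: str) -> DoubleByteName:
    """Split a double-byte resource name into identifier and suffix meaning."""
    name = name.upper()
    if not name.startswith("T1") or not 6 <= len(name) <= 8:
        raise ValueError(f"not a double-byte code page name: {name!r}")
    identifier, suffix = name[2:6], name[6:]
    if suffix == "":
        # Entire code page, without UDC regions.
        return DoubleByteName(identifier, "full", None)
    if suffix == "U":
        # Entire code page, including UDC regions.
        return DoubleByteName(identifier, "full+udc", None)
    if len(suffix) == 2 and all(c in HEX_DIGITS for c in suffix):
        # A single ward/section, identified by its lead byte.
        return DoubleByteName(identifier, "section", int(suffix, 16))
    raise ValueError(f"unrecognized suffix {suffix!r} in {name!r}")
```

The sketch deliberately does not split the four-character identifier into a version character and a CPGID, because the text above says only that this is generally, not always, how it is composed.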

Note that code pages with resource names starting with T100, T1V1 or T1B00 are "preferred", per G544-5846-02, and the rest are legacy (not counting those used for components of multi-byte encodings, e.g. T1H or T1K).  T1E is not listed as either preferred or legacy, though I highly doubt that T1E pages are any more "legacy" than the corresponding T10 pages.
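As a rough sketch, the rule in this note can be written as a prefix lookup.  The function name and the returned labels are hypothetical, and the multi-byte component prefixes are limited to the two examples named above, so this should not be read as a complete classification.

```python
def classify_prefix(name: str) -> str:
    """Classify a resource name using only the prefixes mentioned in the note."""
    name = name.upper()
    if name.startswith(("T100", "T1V1", "T1B00")):
        return "preferred"
    # Assumption: only the two example prefixes are checked here; other
    # multi-byte component prefixes would wrongly fall through to "legacy".
    if name.startswith(("T1H", "T1K")):
        return "multi-byte component"
    if name.startswith("T1E"):
        return "unlisted"
    return "legacy"
```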

Specific prefixes