Structure of AFP code page resource names | Code page information
Background and overall structure
A FOCA code page (as used in AFP) is not the same thing as a CDRA code page; in CDRA, "code page" is defined as a synonym for CPGID. A FOCA code page usually has a CPGID (although there are exceptions to this), but can also specify a GCSGID and ESID; multiple FOCA code pages, furthermore, can exist for a single CPGID. The closest CDRA concept to the FOCA code page would be the single-component CCSID.
All FOCA code page resources for AFP have resource names beginning with "T1", marking them as code pages. If stored in a conventional filesystem, FOCA code pages without Unicode mappings may have no filename extension (as seen with e.g. the handful of code page files in ur47923engl.zip, available on IBM's website) or the extension .CDP (as seen in e.g. udc-japan1100.zip, available on IBM's website). "Extended" code pages, meaning FOCA code pages including Unicode mappings, typically have the extension .ECP (as seen in e.g. ecp_gl.zip, available on IBM's website). Inclusion of Unicode mappings allows them to be used with more conventional, Unicode-mapped font formats rather than only with GCGID-mapped fonts.
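The extension conventions above can be sketched as a small heuristic. This is only an illustration: the function name and the returned labels are made up here, and since the text says "may" and "typically", the extension is a hint rather than proof of the file's contents.

```python
from pathlib import PurePath


def guess_code_page_kind(filename: str) -> str:
    """Guess the kind of FOCA code page file from its extension alone.

    Hypothetical helper: the labels returned are illustrative only.
    """
    # Compare case-insensitively, since filenames may differ in case.
    suffix = PurePath(filename).suffix.upper()
    if suffix == ".ECP":
        return "extended (with Unicode mappings)"
    if suffix in ("", ".CDP"):
        return "plain (no Unicode mappings)"
    return "unknown"
```

For example, `guess_code_page_kind("T1V10500.ecp")` would be reported as extended, while a bare `T1V10500` would be reported as plain.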
The resource name consists of up to eight characters and, as recommended in FOCA, is expected to consist of characters from GCSGID 961 (uppercase ISO Basic Latin alphanumerics, plus the dollar, octothorpe and at-sign); this is only slightly more restrictive than the repertoire permitted in non-initial position in z/OS data set qualifiers (which also permits the hyphen). In practice, it normally only consists of the uppercase alphanumerics, i.e. GCSGID 1134, rarely if ever making use of the three symbols. Since the resource name is usually included in the FOCA code page header, where it is encoded using CPGID 500 with GCSGID 103 (i.e. CCSID 8448), it will not in any case contain characters outside GCSGID 103 (the ASCII repertoire), even if it resides on a conventional filesystem and thus does not need to be usable as a data set qualifier. When residing on a conventional filesystem, the filename may be treated as case-insensitive (as can be seen in e.g. 4000micr.zip, available on IBM's website).
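A minimal sketch of these naming constraints follows. The function names are hypothetical, the check covers only the rules stated above (the "T1" start, the GCSGID 961 repertoire, the eight-character limit), and Python's built-in "cp500" codec is assumed to be an adequate stand-in for CPGID 500 over this repertoire.

```python
import re

# "T1" followed by at most six characters from GCSGID 961 as described
# above: A-Z, 0-9, plus the dollar, octothorpe and at-sign.
_RESOURCE_NAME_RE = re.compile(r"T1[A-Z0-9$#@]{0,6}")


def is_valid_resource_name(name: str) -> bool:
    """Check a resource name against the constraints described above."""
    return _RESOURCE_NAME_RE.fullmatch(name) is not None


def encode_for_header(name: str) -> bytes:
    """Encode a resource name as EBCDIC, as it would appear in a header.

    Assumption: Python's "cp500" codec is used in place of CPGID 500.
    """
    if not is_valid_resource_name(name):
        raise ValueError(f"not a valid code page resource name: {name!r}")
    return name.encode("cp500")
```

Since filenames may be treated as case-insensitive, a caller working from a filesystem would probably want to apply `.upper()` to the stem before validating it.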
Single-byte code pages have resource names eight characters long, with the six characters after "T1" identifying the code page; usually the one, two or three characters following "T1" will specify some sort of category the code page falls into, and determine the interpretation of the remainder of the resource name (which is often but not necessarily a CPGID). Double-byte code pages or portions thereof have resource names between six and eight characters long: the four characters after "T1" identify the code page (generally a single character specifying a version, followed by a three-digit CPGID), and are followed by either two hexdigits for a single "ward"/"section" of the code page (i.e. the allocations under a single lead byte; this approach is seen in e.g. S544-5850-00), the letter "U" for the entire code page including UDC (user-defined character) regions, or the empty string for the entire code page without UDC regions.
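The double-byte naming pattern can be sketched as a parser. This is a sketch under stated assumptions: the type and function names are invented here, the ward digits are assumed to be uppercase hexadecimal, and only the "general" shape (one version character plus a three-digit CPGID) is handled. Note that an eight-character name can fit both the single-byte and the double-byte ward shape, so shape alone cannot tell the two apart; the parser only reports how the name would read if it is a double-byte name.

```python
import re
from typing import NamedTuple, Optional


class DoubleByteName(NamedTuple):
    version: str         # single character following "T1"
    cpgid: int           # three-digit CPGID
    scope: str           # "ward", "all+udc" or "all"
    ward: Optional[int]  # lead byte, only when scope == "ward"


# "T1", a version character, three digits, then two hex digits, "U" or nothing.
_DBCS_RE = re.compile(r"T1([A-Z0-9])([0-9]{3})([0-9A-F]{2}|U|)")


def parse_double_byte_name(name: str) -> Optional[DoubleByteName]:
    """Parse a name as a double-byte code page name, or return None."""
    match = _DBCS_RE.fullmatch(name)
    if match is None:
        return None
    version, cpgid, tail = match.groups()
    if tail == "":
        return DoubleByteName(version, int(cpgid), "all", None)
    if tail == "U":
        return DoubleByteName(version, int(cpgid), "all+udc", None)
    return DoubleByteName(version, int(cpgid), "ward", int(tail, 16))
```

With this sketch, `parse_double_byte_name("T1K837")` reads as version "K" of CPGID 837 without UDC regions, and a hypothetical `T1K83741` would read as the ward under lead byte 0x41.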
Note that code pages with resource names starting with T100, T1V1 or T1B00 are "preferred", per G544-5846-02, and the rest are legacy (not counting those used for components of multi-byte encodings, e.g. T1H or T1K; also, T1E isn't listed as either preferred or legacy, though I highly doubt that the T1E pages are any more "legacy" than the corresponding T10 pages).
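The "preferred" rule can be written as a simple prefix test. This is a minimal sketch with a hypothetical function name; it deliberately answers only whether a name is "preferred", since the remaining names are not uniformly legacy (multi-byte components and T1E need case-by-case treatment, as noted above).

```python
# Prefixes described above as marking "preferred" code pages.
PREFERRED_PREFIXES = ("T100", "T1V1", "T1B00")


def is_preferred(name: str) -> bool:
    """Return True if the resource name starts with a "preferred" prefix."""
    # str.startswith accepts a tuple and tests each prefix in turn.
    return name.upper().startswith(PREFERRED_PREFIXES)
```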
Specific prefixes
- T10 is simply followed by a CPGID, and gives no comment about the particular version. Note that Euro Sign CECPs (ECECPs) are under here, not under T1V1 (since there was never a "version 0" of the ECECP, only of the CECP), nor under T1E (since they aren't upward-compatible extensions under the same CPGID, but replace a character with the Euro sign and get a different CPGID, with the latter CPGID being in the resource name).
- T1B follows the EBCDIC CPGID given in the resource name (the CECP version if applicable) for the graphical region (0x41–FE), and adds BookMaster graphics in the control code region (0x00–3F); the result is assigned its own separate CPGID (which is not given in the resource name). T1B00BGS (CCSID 396) is an exception: it is the "BookMaster Specials" set, which doesn't replace any control codes and seems to be a (lowercase-S) symbol-font encoding (see also remarks on T1GI0396 below).
- T1D denotes a code page related to DCF (IBM Document Composition Facility). Rather than a CPGID, the resource name will end with either "GP12" ("Gothic Tri-Pitch"), "CDCFS" ("US Text Subset"), or "BASE" ("Migration", i.e. a deprecated code page used in a particular region). The character after the D (but before "BASE"), if not 0, denotes a variant for a locale, e.g. DCF Migration for United Kingdom is T1DUBASE (CCSID 2116).
- T1E means an upward-compatible extension adding a euro sign. The CPGID is as given in the remainder of the resource name, but this is a later expansion of the CPGID compared to the version represented by the corresponding T10 code page (and probably has a different CCSID).
- T1GDP is used for pre-1986 "version 0" CECPs (see S544-4312-07, although note that a digit 1 has been substituted for the dotless lowercase I there; the GCGIDs shown are correct); the letters "GDP" denote "data processing" according to GA32-1048-30. Relatively few collections/documents include/list them (though e.g. 5770-SS1 lists them). The CPGID stated in the resource name is generally one with a DP94 form, which may or may not have a 1986 CECP form as its eponymous CCSID. Some exceptional cases are worthy of note:
  - Code page 1081 is T1GDP279 after the DP94, not 297 like the 1986 CECP (T1V10297); compared to code page 279, T1GDP279 and T1V10297 replace the backtick with a micro sign, due to the 1982 edition of NF Z 62010 making that change to the homologous ISO 646 variant.
  - T1GDP256 (code page 256) has no "version 1" (since that would be identical to the version 1 of code page 500).
- T1GE0 is used for code pages specific to the Sonoran Display and Sonoran Petite fonts (meaning the blackletter and extra-small (4pt) Sonoran fonts respectively, as opposed to the more general-purpose serif and sans-serif Sonoran fonts). The number given in the resource name where a CPGID might be expected is not an IBM CPGID.
- T1GI0 tends to denote a modification of the CPGID given in the resource name with SM57 replacing SM58; this is a minor mapping difference but nonetheless gets assigned another CPGID (which does not appear in the resource name). This applies to 38xx/4250 code pages. T1GI0396 is just CPGID 396 though, since SM58 doesn't exist in it. Notably, a similar approach is not taken for CCSIDs 4966 and 4976, which instead carry a special dispensation explicitly instructing that SM19/SD27 and SD15/SM09 (respectively) be equated when applying the GCSGID to the CPGID definition.
- T1GP0 denotes "general process" according to GA32-1048-30; I know no actual examples of code pages with this resource name prefix, so cannot comment further.
- T1GPI denotes a Pi Fonts (lowercase-S symbol font) code page, meaning e.g. CPGID 363 (given in the resource name), but with SM600001 changed to SM590000 and SA350008 to SA350000, which apparently also warrants a new CPGID (not given in the resource name).
- T1H means a version of the stated SBCS CPGID with GCGIDs modified to their explicit halfwidth versions, for use in combination with a DBCS as part of a variable-width (multi-component) encoding. The character after the H can, in three exceptional cases, be K to denote a subset (less extended version) compared to the H0 one:
- T1I and T1J are used for JIKEI DBCSes. Otherwise, they are identical to the corresponding T10 code page, possibly with a different (CDRA private use range) CPGID in the headers, although the CPGID given in the resource name is still the usual one. So far as I can tell, these are only differentiated to kludge font selection for a specific JIS X 0208/0213 edition.
- T1K means an updated version of a DBCS, i.e. usually with more recent additions than the one with 0, and in any case a superset (not necessarily a strict superset) of it. For instance, T10837 for Simplified Chinese does not contain extensions for the GBK repertoire, but T1K837 does. When CDRA private-use range CPGIDs are used to assign distinct CPGIDs to these, it is generally the more expanded one (e.g. T1K837) that gets to keep the normal CPGID (837, in that case), though both will give the normal CPGID in the resource name.
- T1L means a LCS 3800-1 code page (the 3800 being an IBM laser printer, and LCS standing for "Library Character Set"). The rest of the resource name is not a CPGID, and usually isn't a bare decimal number (even when it is, it isn't a CPGID), but identifies the specific LCS 3800-1 code page.
- T1M denotes a set of mathematical symbols. How it's supposed to differ from T10 isn't clear (the only difference between T1000829 and T1M00829 seems to be that T1M00829 gives its code page title unicamerally). While T1M00829 is actually CPGID 829, T1M00830 is CPGID 2080 for some reason (this moves me to wonder whether the situation might be analogous to T1GI0, i.e. whether CPGID 2080 and (the otherwise-unknown) CPGID 830 differ only in the mapping of included glyphs to GCGIDs).
- T1S0 denotes a 6670-related code page (the 6670 being a combination laser printer and photocopier). The four-character identifier following is not a CPGID; the first character thereof may be "S" for symbol or "A" for APL.
- T1SKB denotes "standard keyboard" according to GA32-1048-30. This seems to relate to how CPGIDs 2001 through 2054 and 2117 are apparently assigned to keyboard layouts (e.g. "Keyboard 1 - U.S. Standard" and T1SKB001 for CPGID 2001, "Keyboard 254: U.K. (Special)" and presumably T1SKB254 for CPGID 2046), which I otherwise know very little about.
- As noted by 5770-SS1, "V1" denotes "version 1 of this code page". Usually, this means a post-1986 CECP (as opposed to T1GDP's pre-1986 "version 0"s), i.e. an EBCDIC code page including the entire ISO/IEC 8859-1 repertoire, usually an upward-compatible extension of a DP94 encoding. GA32-1048-30 considers the T1V1 code pages, along with the T100 code pages, as part of the "expanded core code pages".
  - One exception is T1V10290 (CCSID 290), which extends T1000290/T1HK0290 (CCSID 4386) by adding lowercase letters but, in contrast to T1H00290 (CCSID 12578 = CCSID 8482), does not contain the Euro sign.
  - A dubious exception is T1V10874, attested in 5770-SS1 (though not in its main list), which is extended ASCII, and not a CECP in repertoire either (it may be a mistake for T1000874, due to T1V10871 in the line above).
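The prefix list above lends itself to a lookup table. The following is a rough sketch: the table and function names are invented here, the one-line summaries are abbreviations of the descriptions above rather than official terminology, and the exceptions noted under individual prefixes are not modelled. Longer prefixes are tried first so that the table stays safe to extend with overlapping entries.

```python
from typing import Optional, Tuple

# Short summaries of the prefixes discussed above (not official wording).
PREFIX_MEANINGS = {
    "T10": "plain CPGID, no particular version implied",
    "T1B": "EBCDIC code page plus BookMaster control-region graphics",
    "T1D": "DCF (Document Composition Facility) related",
    "T1E": "upward-compatible extension adding a euro sign",
    "T1GDP": "pre-1986 'version 0' CECP (data processing)",
    "T1GE0": "Sonoran Display / Sonoran Petite specific",
    "T1GI0": "38xx/4250 variant with SM57 replacing SM58",
    "T1GP0": "'general process' (no known examples)",
    "T1GPI": "Pi Fonts symbol code page",
    "T1H": "SBCS with explicit halfwidth GCGIDs",
    "T1I": "JIKEI DBCS",
    "T1J": "JIKEI DBCS",
    "T1K": "updated version of a DBCS",
    "T1L": "LCS 3800-1 code page",
    "T1M": "mathematical symbols",
    "T1S0": "6670-related code page",
    "T1SKB": "'standard keyboard'",
    "T1V1": "version 1, usually a post-1986 CECP",
}


def classify_prefix(name: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, summary) for a resource name, or None if unknown."""
    upper = name.upper()
    # Try the longest prefixes first.
    for prefix in sorted(PREFIX_MEANINGS, key=len, reverse=True):
        if upper.startswith(prefix):
            return prefix, PREFIX_MEANINGS[prefix]
    return None
```

For instance, `classify_prefix("T1SKB001")` would return the "T1SKB" entry, while a name with a prefix not covered above yields `None`.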