Network Working Group                                       D. Goldsmith
Request for Comments: 2152                          Apple Computer, Inc.
Obsoletes: RFC 1642                                             M. Davis
Category: Informational                                   Taligent, Inc.
                                                                May 1997

                                 UTF-7

              A Mail-Safe Transformation Format of Unicode

Status of this Memo

   This memo provides information for the Internet community. This memo
   does not specify an Internet standard of any kind. Distribution of
   this memo is unlimited.

Abstract

   The Unicode Standard, version 2.0, and ISO/IEC 10646-1:1993(E) (as
   amended) jointly define a character set (hereafter referred to as
   Unicode) which encompasses most of the world's writing systems.
   However, Internet mail (STD 11, RFC 822) currently supports only
   7-bit US ASCII as a character set. MIME (RFC 2045 through 2049)
   extends Internet mail to support different media types and character
   sets, and thus could support Unicode in mail messages. MIME neither
   defines Unicode as a permitted character set nor specifies how it
   would be encoded, although it does provide for the registration of
   additional character sets over time.

   This document describes a transformation format of Unicode that
   contains only 7-bit ASCII octets and is intended to be readable by
   humans in the limiting case that the document consists of characters
   from the US-ASCII repertoire. It also specifies how this
   transformation format is used in the context of MIME and RFC 1641,
   "Using Unicode with MIME".

Motivation

   Although other transformation formats of Unicode exist and could
   conceivably be used in this context (most notably UTF-8, also known
   as UTF-2 or UTF-FSS), they suffer the disadvantage that they use
   octets in the range decimal 128 through 255 to encode Unicode
   characters outside the US-ASCII range. Thus, in the context of mail,
   those octets must themselves be encoded. This requires putting text
   through two successive encoding processes, and leads to a significant
   expansion of characters outside the US-ASCII range, putting
   non-English speakers at a disadvantage. For example, using UTF-8
   together with the Quoted-Printable content transfer encoding of MIME
   represents US-ASCII characters in one octet, but other characters may
   require up to nine octets.

Goldsmith & Davis            Informational                      [Page 1]

RFC 2152                         UTF-7                          May 1997

Overview

   UTF-7 encodes Unicode characters as US-ASCII octets, together with
   shift sequences to encode characters outside that range. For this
   purpose, one of the characters in the US-ASCII repertoire is reserved
   for use as a shift character.

   Many mail gateways and systems cannot handle the entire US-ASCII
   character set (those based on EBCDIC, for example), and so UTF-7
   contains provisions for encoding characters within US-ASCII in a way
   that all mail systems can accommodate.

   UTF-7 should normally be used only in the context of 7 bit
   transports, such as mail. In other contexts, straight Unicode or
   UTF-8 is preferred.

   See RFC 1641, "Using Unicode with MIME" for the overall specification
   on usage of Unicode transformation formats with MIME.

Definitions

   First, the definition of Unicode:

      The 16 bit character set Unicode is defined by "The Unicode
      Standard, Version 2.0". This character set is identical with the
      character repertoire and coding of the international standard
      ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2;
      Subset=300; Implementation Level=3, including the first 7
      amendments to 10646 plus editorial corrections.

      Note. Unicode 2.0 further specifies the use and interaction of
      these character codes beyond the ISO standard. However, any valid
      10646 sequence is a valid Unicode sequence, and vice versa;
      Unicode supplies interpretations of sequences on which the ISO
      standard is silent as to interpretation.

   Next, some handy definitions of US-ASCII character subsets:

      Set D (directly encoded characters) consists of the following
      characters (derived from RFC 1521, Appendix B, which no longer
      appears in RFC 2045): the upper and lower case letters A through Z
      and a through z, the 10 digits 0-9, and the following nine special
      characters (note that "+" and "=" are omitted):

               Character   ASCII & Unicode Value (decimal)
                  '           39
                  (           40
                  )           41
                  ,           44
                  -           45
                  .           46
                  /           47
                  :           58
                  ?           63

      Set O (optional direct characters) consists of the following
      characters (note that "\" and "~" are omitted):

               Character   ASCII & Unicode Value (decimal)
                  !           33
                  "           34
                  #           35
                  $           36
                  %           37
                  &           38
                  *           42
                  ;           59
                  <           60
                  =           61
                  >           62
                  @           64
                  [           91
                  ]           93
                  ^           94
                  _           95
                  `           96
                  {           123
                  |           124
                  }           125

      Rationale. The characters "\" and "~" are omitted because they are
      often redefined in variants of ASCII.

      Set B (Modified Base 64) is the set of characters in the Base64
      alphabet defined in RFC 2045, excluding the pad character "="
      (decimal value 61).

      Rationale. The pad character = is excluded because UTF-7 is
      designed for use within header fields as set forth in RFC 2047.
      Since the only readable encoding in RFC 2047 is "Q" (based on RFC
      2045's Quoted-Printable), the "=" character is not available for
      use (without a lot of escape sequences). This was very unfortunate
      but unavoidable. The "=" character could otherwise have been used
      as the UTF-7 escape character as well (rather than using "+").

   Note that all characters in US-ASCII have the same value in Unicode
   when zero-extended to 16 bits.
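
   As a quick sanity check, the three sets can be written down as
   plain character sets. The following is a minimal Python sketch; the
   names SET_D, SET_O, and SET_B are illustrative and do not appear in
   this memo.

```python
import string

# Set D: letters, digits, and the nine specials listed above
# (neither "+" nor "=" is a member).
SET_D = frozenset(string.ascii_letters + string.digits + "'(),-./:?")

# Set O: optional direct characters (neither "\" nor "~" is a member).
SET_O = frozenset('!"#$%&*;<=>@[]^_`{|}')

# Set B: the Base64 alphabet of RFC 2045 without the "=" pad character.
SET_B = frozenset(string.ascii_letters + string.digits + "+/")

# Printable US-ASCII characters that belong to neither Set D nor Set O.
PRINTABLE = frozenset(chr(c) for c in range(33, 127))
LEFT_OVER = PRINTABLE - SET_D - SET_O
```

   Under these definitions the only printable characters left over are
   "+", "\", and "~", which matches the omissions called out in the
   text.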

UTF-7 Definition

   A UTF-7 stream represents 16-bit Unicode characters using 7-bit
   US-ASCII octets as follows:

      Rule 1: (direct encoding) Unicode characters in set D above may be
      encoded directly as their ASCII equivalents. Unicode characters in
      Set O may optionally be encoded directly as their ASCII
      equivalents, bearing in mind that many of these characters are
      illegal in header fields, or may not pass correctly through some
      mail gateways.

      Rule 2: (Unicode shifted encoding) Any Unicode character sequence
      may be encoded using a sequence of characters in set B, when
      preceded by the shift character "+" (US-ASCII character value
      decimal 43). The "+" signals that subsequent octets are to be
      interpreted as elements of the Modified Base64 alphabet until a
      character not in that alphabet is encountered. Such characters
      include control characters such as carriage returns and line
      feeds; thus, a Unicode shifted sequence always terminates at the
      end of a line. As a special case, if the sequence terminates with
      the character "-" (US-ASCII decimal 45) then that character is
      absorbed; other terminating characters are not absorbed and are
      processed normally.

      Note that if the first character after the shifted sequence is "-"
      then an extra "-" must be present to terminate the shifted
      sequence so that the actual "-" is not itself absorbed.

      Rationale. A terminating character is necessary for cases where
      the next character after the Modified Base64 sequence is part of
      character set B or is itself the terminating character. It can
      also enhance readability by delimiting encoded sequences.


      Also as a special case, the sequence "+-" may be used to encode
      the character "+". A "+" character followed immediately by any
      character other than members of set B or "-" is an ill-formed
      sequence.
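
   The shift handling described so far can be sketched as a small
   scanner that splits a UTF-7 string into directly encoded text and
   shifted runs, without yet decoding the Base64 payload. This is an
   illustration only; the function name split_runs and the tuple
   layout are assumptions, not part of this memo.

```python
import string

SET_B = set(string.ascii_letters + string.digits + "+/")

def split_runs(text):
    """Split UTF-7 text into ("direct", str) and ("shifted", base64) runs."""
    runs, i = [], 0
    while i < len(text):
        if text[i] != "+":
            j = i
            while j < len(text) and text[j] != "+":
                j += 1
            runs.append(("direct", text[i:j]))
            i = j
            continue
        # A "+" was seen: collect Set B characters that follow it.
        j = i + 1
        while j < len(text) and text[j] in SET_B:
            j += 1
        ended_by_dash = j < len(text) and text[j] == "-"
        if j == i + 1:
            if not ended_by_dash:
                raise ValueError("ill-formed: '+' not followed by Set B or '-'")
            runs.append(("direct", "+"))       # "+-" encodes a literal "+"
        else:
            runs.append(("shifted", text[i + 1:j]))
        # Only a terminating "-" is absorbed; anything else is kept.
        i = j + 1 if ended_by_dash else j
    return runs
```

   For "Hi Mom -+Jjo--!" this yields the direct text "Hi Mom -", the
   shifted run "Jjo", and the direct text "-!": the first "-" after
   the run is absorbed and the second one survives.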

      Unicode is encoded using Modified Base64 by first converting
      Unicode 16-bit quantities to an octet stream (with the most
      significant octet first). Surrogate pairs (UTF-16) are converted
      by treating each half of the pair as a separate 16 bit quantity
      (i.e., no special treatment). Text with an odd number of octets is
      ill-formed. ISO 10646 characters outside the range addressable via
      surrogate pairs cannot be encoded.

      Rationale. ISO/IEC 10646-1:1993(E) specifies that when characters
      in the UCS-2 form are serialized as octets, the most significant
      octet appears first. This is also in keeping with common network
      practice of choosing a canonical format for transmission.

      Rationale. The policy for code point allocation within ISO 10646
      and Unicode is that the repertoires be kept synchronized. No code
      points will be allocated in ISO 10646 outside the range
      addressable by surrogate pairs.

      Next, the octet stream is encoded by applying the Base64 content
      transfer encoding algorithm as defined in RFC 2045, modified to
      omit the "=" pad character. Instead, when encoding, zero bits are
      added to pad to a Base64 character boundary. When decoding, any
      bits at the end of the Modified Base64 sequence that do not
      constitute a complete 16-bit Unicode character are discarded. If
      such discarded bits are non-zero the sequence is ill-formed.

      Rationale. The pad character "=" is not used when encoding
      Modified Base64 because of the conflict with its use as an escape
      character for the Q content transfer encoding in RFC 2047 header
      fields, as mentioned above.
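
   The Modified Base64 step can be illustrated at the bit level. The
   sketch below is one possible reading of the rule, written in
   Python; the function names encode_shifted and decode_shifted are
   hypothetical, and the input is assumed to be a list of 16-bit code
   units (surrogate halves included as separate units).

```python
import string

B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_shifted(units):
    """Encode 16-bit code units as Modified Base64 (no "=" padding)."""
    bits = "".join(format(u, "016b") for u in units)   # most significant bit first
    bits += "0" * (-len(bits) % 6)                     # zero-pad to a 6-bit boundary
    return "".join(B64[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))

def decode_shifted(run):
    """Decode a Modified Base64 run back into 16-bit code units."""
    bits = "".join(format(B64.index(c), "06b") for c in run)
    whole = len(bits) - len(bits) % 16                 # bits forming complete characters
    if "1" in bits[whole:]:
        raise ValueError("ill-formed: discarded bits are non-zero")
    return [int(bits[i:i + 16], 2) for i in range(0, whole, 16)]
```

   With these helpers, the single code unit 263A encodes to "Jjo",
   and a run such as "Jjp", whose two left-over bits are not zero, is
   rejected as ill-formed.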

      Rule 3: The space (decimal 32), tab (decimal 9), carriage return
      (decimal 13), and line feed (decimal 10) characters may be
      directly represented by their ASCII equivalents. However, note
      that MIME content transfer encodings have rules concerning the use
      of such characters. Usage that does not conform to the
      restrictions of RFC 822, for example, would have to be encoded
      using MIME content transfer encodings other than 7bit or 8bit,
      such as quoted-printable, binary, or base64.

   Given this set of rules, Unicode characters which may be encoded via
   rules 1 or 3 take one octet per character, and other Unicode
   characters are encoded on average with 2 2/3 octets per character
   plus one octet to switch into Modified Base64 and an optional octet
   to switch out.
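
   The 2 2/3 figure follows from carrying 16 bits per character in
   octets that hold 6 bits each (16/6 = 2 2/3). A small Python sketch
   of the arithmetic, with a hypothetical helper name, shows how the
   shift overhead fades as a run gets longer:

```python
def shifted_run_octets(n_chars, explicit_close=True):
    """Octets in one shifted run holding n_chars 16-bit characters."""
    base64_octets = (16 * n_chars + 5) // 6      # round up to whole Base64 characters
    return 1 + base64_octets + (1 if explicit_close else 0)

# Average octets per character for runs of increasing length.
for n in (1, 3, 30):
    total = shifted_run_octets(n)
    print(n, total, round(total / n, 2))
```

   A run of three characters closed with "-" takes 10 octets, and a
   single shifted character costs 5 octets.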

      Example. The Unicode sequence "A<NOT IDENTICAL TO><ALPHA>."
      (hexadecimal 0041,2262,0391,002E) may be encoded as follows:

            A+ImIDkQ.

      Example. The Unicode sequence "Hi Mom -<WHITE SMILING FACE>-!"
      (hexadecimal 0048, 0069, 0020, 004D, 006F, 006D, 0020, 002D, 263A,
      002D, 0021) may be encoded as follows:

            Hi Mom -+Jjo--!

      Example. The Unicode sequence representing the Han characters for
      the Japanese word "nihongo" (hexadecimal 65E5,672C,8A9E) may be
      encoded as follows:

            +ZeVnLIqe-
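
   The three examples can be cross-checked against an independent
   implementation. The sketch below uses the "utf-7" codec shipped
   with Python's standard library, on the assumption that it follows
   this memo; the helper name code_units is illustrative.

```python
def code_units(utf7_bytes):
    """Decode UTF-7 and list the resulting 16-bit code units in hexadecimal."""
    text = utf7_bytes.decode("utf-7")
    raw = text.encode("utf-16-be")               # most significant octet first
    return [raw[i:i + 2].hex().upper() for i in range(0, len(raw), 2)]

for sample in (b"A+ImIDkQ.", b"Hi Mom -+Jjo--!", b"+ZeVnLIqe-"):
    print(sample.decode("ascii"), "->", ",".join(code_units(sample)))
```

   The hexadecimal values printed for each sample can be compared
   directly with the values quoted in the examples.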

Use of Character Set UTF-7 Within MIME

   Character set UTF-7 is safe for mail transmission and therefore may
   be used with any content transfer encoding in MIME (except where line
   length and line break restrictions are violated). Specifically, the 7
   bit encoding for bodies and the Q encoding for headers are both
   acceptable. The MIME character set tag is UTF-7. This signifies any
   version of Unicode equal to or greater than 2.0.

      Example. Here is a text portion of a MIME message containing the
      Unicode sequence "Hi Mom <WHITE SMILING FACE>!" (hexadecimal 0048,
      0069, 0020, 004D, 006F, 006D, 0020, 263A, 0021).

      Content-Type: text/plain; charset=UTF-7

      Hi Mom +Jjo-!

      Example. Here is a text portion of a MIME message containing the
      Unicode sequence representing the Han characters for the Japanese
      word "nihongo" (hexadecimal 65E5,672C,8A9E).

      Content-Type: text/plain; charset=UTF-7

      +ZeVnLIqe-

      Example. Here is a text portion of a MIME message containing the
      Unicode sequence "A<NOT IDENTICAL TO><ALPHA>." (hexadecimal
      0041,2262,0391,002E).

      Content-Type: text/plain; charset=utf-7

      A+ImIDkQ.

      Example. Here is a text portion of a MIME message containing the
      Unicode sequence "Item 3 is <POUND SIGN>1." (hexadecimal 0049,
      0074, 0065, 006D, 0020, 0033, 0020, 0069, 0073, 0020, 00A3, 0031,
      002E).

      Content-Type: text/plain; charset=UTF-7

      Item 3 is +AKM-1.
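
   An encoder for message bodies like these can be sketched in a few
   lines. The version below is a minimal Python illustration, not a
   reference implementation: the name utf7_encode and the use_set_o
   flag are assumptions, and the choice to write the closing "-" only
   at the end of the text or where the next character would otherwise
   be misread is one of several policies the rules permit.

```python
import base64
import string

SET_D = set(string.ascii_letters + string.digits + "'(),-./:?")
SET_O = set('!"#$%&*;<=>@[]^_`{|}')
SET_B = set(string.ascii_letters + string.digits + "+/")
RULE_3 = set(" \t\r\n")

def utf7_encode(text, use_set_o=False):
    """Encode text as UTF-7, optionally writing Set O characters directly."""
    direct = SET_D | RULE_3 | (SET_O if use_set_o else set())
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in direct:
            out.append(ch)
            i += 1
        elif ch == "+":
            out.append("+-")                     # literal "+"
            i += 1
        else:
            j = i
            while j < len(text) and text[j] not in direct and text[j] != "+":
                j += 1
            raw = text[i:j].encode("utf-16-be")  # most significant octet first
            # Standard Base64 pads with zero bits, so dropping "=" is enough.
            out.append("+" + base64.b64encode(raw).decode("ascii").rstrip("="))
            # Close explicitly at end of text, or when the next character
            # would be read as Base64 or absorbed as a terminator.
            if j == len(text) or text[j] in SET_B or text[j] == "-":
                out.append("-")
            i = j
    return "".join(out)
```

   With this policy the pound sign example comes out as "Item 3 is
   +AKM-1.", because the digit that follows is a member of set B. The
   "Hi Mom" example above writes an optional "-" before "!" that this
   sketch would leave out; both forms decode to the same text.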

   Note that to achieve the best interoperability with systems that may
   not support Unicode or MIME, when preparing text for mail
   transmission line breaks should follow Internet conventions. This
   means that lines should be short and terminated with the proper SMTP
   CRLF sequence. Unicode LINE SEPARATOR (hexadecimal 2028) and
   PARAGRAPH SEPARATOR (hexadecimal 2029) should be converted to SMTP
   line breaks. Ideally, this would be handled transparently by a
   Unicode-aware user agent.

   This preparation is not absolutely necessary, since UTF-7 and the
   appropriate MIME content transfer encoding can handle text that does
   not follow Internet conventions, but readability by systems without
   Unicode or MIME will be impaired. See RFC 2045 for a discussion of
   mail interoperability issues.

   Lines should never be broken in the middle of a UTF-7 shifted
   sequence, since such sequences may not cross line breaks. Therefore,
   UTF-7 encoding should take place after line breaking. If a line
   containing a shifted sequence is too long after encoding, a MIME
   content transfer encoding such as Quoted Printable can be used to
   encode the text. Another possibility is to perform line breaking and
   UTF-7 encoding at the same time, so that lines containing shifted
   sequences already conform to length restrictions.

Discussion

   In this section we will motivate the introduction of UTF-7 as opposed
   to the alternative of using the existing transformation formats of
   Unicode (e.g., UTF-8) with MIME's content transfer encodings. Before
   discussing this, it will be useful to list some assumptions about
   character frequency within typical natural language text strings that
   we use to estimate typical storage requirements:

   1. Most Western European languages use roughly 7/8 of their letters
      from US-ASCII and 1/8 from Latin 1 (ISO-8859-1).

   2. Most non-Roman alphabet-based languages (e.g., Greek) use about
      1/6 of their letters from ASCII (since white space is in the 7-bit
      area) and the rest from their alphabets.

   3. East Asian ideographic-based languages (including Japanese) use
      essentially all of their characters from the Han or CJK syllabary
      area.

   4. Non-directly encoded punctuation characters do not occur
      frequently enough to affect the results.

   Notice that current 8 bit standards, such as ISO-8859-x, require use
   of a content transfer encoding. For comparison with the subsequent
   discussion, the costs break down as follows (note that many of these
   figures are approximate since they depend on the exact composition of
   the text):

      8859-x in Base64

         Text type                 Average octets/character
         All                       1.33

      8859-x in Quoted Printable

         Text type                 Average octets/character
         US-ASCII                  1
         Western European          1.25
         Other                     2.67

   Note also that Unicode encoded in Base64 takes a constant 2.67 octets
   per character. For purposes of comparison, we will look at UTF-8 in
   Base64 and Quoted Printable, and UTF-7. Also note that fixed overhead
   for long strings is relative to 1/n, where n is the encoded string
   length in octets.

      UTF-8 in Base64

         Text type                 Average octets/character
         US-ASCII                  1.33
         Western European          1.5
         Some Alphabetics          2.44
         All others                4

      UTF-8 in Quoted Printable

         Text type                 Average octets/character
         US-ASCII                  1
         Western European          1.63
         Some Alphabetics          5.17
         All others                7-9

      UTF-7

         Text type                 Average octets/character
         Most US-ASCII             1
         Western European          1.5
         All others                2.67+2/n
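
   The memo does not show how the table entries were derived. One way
   to arrive at matching figures is to weight per-character costs by
   the character mixes of assumptions 1 and 2. The Python sketch below
   does that under further assumptions that are ours, not the memo's:
   Latin 1 and alphabetic letters take two octets in UTF-8, a
   Quoted-Printable escape takes three octets, and an isolated shifted
   character in UTF-7 costs five octets ("+", three Base64 characters,
   "-").

```python
from fractions import Fraction as F

# Character mixes from assumptions 1 and 2: (share of ASCII, share of other).
WESTERN = (F(7, 8), F(1, 8))
ALPHABETIC = (F(1, 6), F(5, 6))

B64 = F(4, 3)   # Base64 turns 3 octets into 4
QP = 3          # Quoted-Printable writes one non-ASCII octet as "=XX"

def average(mix, ascii_cost, other_cost):
    """Average encoded octets per character for a two-class character mix."""
    ascii_share, other_share = mix
    return ascii_share * ascii_cost + other_share * other_cost

figures = {
    "8859-x QP, Western European": average(WESTERN, 1, QP),
    "8859-x QP, other": average(ALPHABETIC, 1, QP),
    "UTF-8 Base64, Western European": average(WESTERN, B64, 2 * B64),
    "UTF-8 Base64, some alphabetics": average(ALPHABETIC, B64, 2 * B64),
    "UTF-8 QP, Western European": average(WESTERN, 1, 2 * QP),
    "UTF-8 QP, some alphabetics": average(ALPHABETIC, 1, 2 * QP),
    "UTF-7, Western European": average(WESTERN, 1, 5),
}

for label, value in figures.items():
    print(f"{label}: {float(value):.3f}")
```

   The exact fractions (for example 13/8 for Western European text in
   UTF-8 with Quoted Printable) round to the table entries above.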

   We feel that the UTF-8 in Quoted Printable option is not viable due
   to the very large expansion of all text except Western European. This
   would only be viable in texts consisting of large expanses of
   US-ASCII or Latin characters with occasional other characters
   interspersed. We would prefer to introduce one encoding that works
   reasonably well for all users.

   We also feel that UTF-8 in Base64 has high expansion for
   non-Western-European users, and is less desirable because it cannot
   be read directly, even when the content is largely US-ASCII. The base
   encoding of UTF-7 gives competitive results and is readable for ASCII
   text.

   UTF-7 gives results competitive with ISO-8859-x, with access to all
   of the Unicode character set. We believe this justifies the
   introduction of a new transformation format of Unicode.


   As an alternative to use of UTF-7, it might be possible to intermix
   Unicode characters with other character sets using an existing MIME
   mechanism, the multipart/mixed content type, ignoring for the moment
   the issues with line breaks (thanks to Nathaniel Borenstein for
   suggesting this). For instance (repeating an earlier example):

      Content-type: multipart/mixed; boundary=foo
      Content-Disposition: inline

      --foo
      Content-type: text/plain; charset=us-ascii

      Hi Mom
      --foo
      Content-type: text/plain; charset=UNICODE-2-0
      Content-transfer-encoding: base64

      Jjo=
      --foo
      Content-type: text/plain; charset=us-ascii

      !
      --foo--

   Theoretically, this removes the need for UTF-7 in message bodies
   (multipart may not be used in header fields). However, we feel that
   as use of the Unicode character set becomes more widespread,
   intermittent use of specialized Unicode characters (such as dingbats
   and mathematical symbols) will occur, and that text will also
   typically include small snippets from other scripts, such as
   Cyrillic, Greek, or East Asian languages (anything in the Roman
   script is already handled adequately by existing MIME character
   sets). Although the multipart technique works well for large chunks
   of text in alternating character sets, we feel it does not adequately
   support the kinds of uses just discussed, and so we still believe the
   introduction of UTF-7 is justified.

Summary

   The UTF-7 encoding allows Unicode characters to be encoded within the
   US-ASCII 7 bit character set. It is most effective for Unicode
   sequences which contain relatively long strings of US-ASCII
   characters interspersed with either single Unicode characters or
   strings of Unicode characters, as it allows the US-ASCII portions to
   be read on systems without direct Unicode support.

   UTF-7 should only be used with 7 bit transports such as mail. In
   other contexts, use of straight Unicode or UTF-8 is preferred.


Acknowledgements

   Many thanks to the following people for their contributions,
   comments, and suggestions. If we have omitted anyone it was through
   oversight and not intentionally.

      Glenn Adams
      Harald T. Alvestrand
      Nathaniel Borenstein
      Lee Collins
      Jim Conklin
      Dave Crocker
      Steve Dorner
      Dana S. Emery
      Ned Freed
      Kari E. Hurtta
      John H. Jenkins
      John C. Klensin
      Valdis Kletnieks
      Keith Moore
      Masataka Ohta
      Einar Stefferud
      Erik M. van der Poel


Appendix A -- Examples

   Here is a longer example, taken from a document originally in Big5
   code. It has been condensed for brevity. There are two versions: the
   first uses optional characters from set O (and so may not pass
   through some mail gateways), and the second does not.

   Content-type: text/plain; charset=utf-7

   Below is the full Chinese text of the Analects (+itaKng-).

   The sources for the text are:

   "The sayings of Confucius," James R. Ware, trans. +U/BTFw-:
   +ZYeB9FH6ckh5Pg-, 1980. (Chinese text with English translation)

   +Vttm+E6UfZM-, +W4tRQ066bOg-, +UxdOrA-: +Ti1XC2b4Xpc-, 1990.

   "The Chinese Classics with a Translation, Critical and Exegetical
   Notes, Prolegomena, and Copius Indexes," James Legge, trans., Taipei:
   Southern Materials Center Publishing, Inc., 1991. (Chinese text with
   English translation)

   Big Five and GB versions of the text are being made available
   separately.

   Neither the Big Five nor GB contain all the characters used in this
   text. Missing characters have been indicated using their Unicode/ISO
   10646 code points. "U+-" followed by four hexadecimal digits
   indicates a Unicode/10646 code (e.g., U+-9F08). There is no good
   solution to the problem of the small size of the Big Five/GB
   character sets; this represents the solution I find personally most
   satisfactory.

   (omitted...)

   I have tried to minimize this problem by using variant characters
   where they were available and the character actually in the text was
   not. Only variants listed as such in the +XrdxmVtXUXg- were used.

   (omitted...)

   John H. Jenkins +TpVPXGBG- jenkins@apple.com 5 January 1993
   (omitted...)

   Content-type: text/plain; charset=utf-7

   Below is the full Chinese text of the Analects (+itaKng-).

   The sources for the text are:

   +ACI-The sayings of Confucius,+ACI- James R. Ware, trans. +U/BTFw-:
   +ZYeB9FH6ckh5Pg-, 1980. (Chinese text with English translation)

   +Vttm+E6UfZM-, +W4tRQ066bOg-, +UxdOrA-: +Ti1XC2b4Xpc-, 1990.

   +ACI-The Chinese Classics with a Translation, Critical and Exegetical
   Notes, Prolegomena, and Copius Indexes,+ACI- James Legge, trans.,
   Taipei: Southern Materials Center Publishing, Inc., 1991. (Chinese
   text with English translation)

   Big Five and GB versions of the text are being made available
   separately.

   Neither the Big Five nor GB contain all the characters used in this
   text. Missing characters have been indicated using their Unicode/ISO
   10646 code points. +ACI-U+-+ACI- followed by four hexadecimal digits
   indicates a Unicode/10646 code (e.g., U+-9F08). There is no good
   solution to the problem of the small size of the Big Five/GB
   character sets+ADs- this represents the solution I find personally
   most satisfactory.

   (omitted...)

   I have tried to minimize this problem by using variant characters
   where they were available and the character actually in the text was
   not. Only variants listed as such in the +XrdxmVtXUXg- were used.
   (omitted...)

   John H. Jenkins +TpVPXGBG- jenkins+AEA-apple.com 5 January 1993
   (omitted...)
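
   The two versions differ only in how set O characters are written,
   so they should decode to the same text. A short Python check of a
   few fragments, again leaning on the standard library's "utf-7"
   codec as an assumed conforming decoder (the variable names are
   illustrative):

```python
# Matching fragments from the first (set O direct) and second versions.
pairs = [
    (b'code points. "U+-" followed by', b"code points. +ACI-U+-+ACI- followed by"),
    (b"character sets; this", b"character sets+ADs- this"),
    (b"jenkins@apple.com", b"jenkins+AEA-apple.com"),
]

decoded = [(a.decode("utf-7"), b.decode("utf-7")) for a, b in pairs]
for first, second in decoded:
    print(first == second, repr(second))

# The shifted run in the opening sentence holds two Han characters.
analects = b"+itaKng-".decode("utf-7")
print([format(ord(c), "04X") for c in analects])
```

   Note that "U+-" in the message body is itself UTF-7 for the two
   characters "U+", which is why the sample text spells code points
   that way.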


Security Considerations

   Security issues are not discussed in this memo.

References

[UNICODE 2.0] "The Unicode Standard, Version 2.0", The Unicode
              Consortium, Addison-Wesley, 1996. ISBN 0-201-48345-9.

[ISO 10646]   ISO/IEC 10646-1:1993(E) Information Technology--Universal
              Multiple-octet Coded Character Set (UCS). See also
              amendments 1 through 7, plus editorial corrections.

[RFC-1641]    Goldsmith, D., and M. Davis, "Using Unicode with MIME",
              RFC 1641, Taligent, Inc., July 1994.

[US-ASCII]    Coded Character Set--7-bit American Standard Code for
              Information Interchange, ANSI X3.4-1986.

[ISO-8859]    Information Processing -- 8-bit Single-Byte Coded Graphic
              Character Sets -- Part 1: Latin Alphabet No. 1, ISO
              8859-1:1987. Part 2: Latin alphabet No. 2, ISO 8859-2,
              1987. Part 3: Latin alphabet No. 3, ISO 8859-3, 1988.
              Part 4: Latin alphabet No. 4, ISO 8859-4, 1988. Part 5:
              Latin/Cyrillic alphabet, ISO 8859-5, 1988. Part 6:
              Latin/Arabic alphabet, ISO 8859-6, 1987. Part 7:
              Latin/Greek alphabet, ISO 8859-7, 1987. Part 8:
              Latin/Hebrew alphabet, ISO 8859-8, 1988. Part 9: Latin
              alphabet No. 5, ISO 8859-9, 1990.

[RFC822]      Crocker, D., "Standard for the Format of ARPA Internet
              Text Messages", STD 11, RFC 822, UDEL, August 1982.

[MIME]        Borenstein N., N. Freed, K. Moore, J. Klensin, and J.
              Postel, "MIME (Multipurpose Internet Mail Extensions)
              Parts One through Five", RFC 2045, 2046, 2047, 2048, and
              2049, November 1996.

Authors' Addresses

   David Goldsmith
   Apple Computer, Inc.
   2 Infinite Loop, MS: 302-2IS
   Cupertino, CA 95014

   Phone: 408-974-1957
   Fax: 408-862-4566
   EMail: goldsmith@apple.com

   Mark Davis
   Taligent, Inc.
   10201 N. DeAnza Blvd.
   Cupertino, CA 95014-2233

   Phone: 408-777-5116
   Fax: 408-777-5081
   EMail: mark_davis@taligent.com
