Network Working Group                                           N. Freed
Request for Comments: 2046                                      Innosoft
Obsoletes: 1521, 1522, 1590                                N. Borenstein
Category: Standards Track                                  First Virtual
                                                           November 1996


                 Multipurpose Internet Mail Extensions
                            (MIME) Part Two:
                              Media Types

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Abstract

   STD 11, RFC 822 defines a message representation protocol specifying
   considerable detail about US-ASCII message headers, but leaves the
   message content, or message body, as flat US-ASCII text. This set of
   documents, collectively called the Multipurpose Internet Mail
   Extensions, or MIME, redefines the format of messages to allow for

    (1)   textual message bodies in character sets other than
          US-ASCII,

    (2)   an extensible set of different formats for non-textual
          message bodies,

    (3)   multi-part message bodies, and

    (4)   textual header information in character sets other than
          US-ASCII.

   These documents are based on earlier work documented in RFC 934, STD
   11, and RFC 1049, but extend and revise that work. Because RFC 822
   said so little about message bodies, these documents are largely
   orthogonal to (rather than a revision of) RFC 822.

   The initial document in this set, RFC 2045, specifies the various
   headers used to describe the structure of MIME messages. This second
   document defines the general structure of the MIME media typing
   system and defines an initial set of media types. The third document,
   RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text

Freed & Borenstein          Standards Track                     [Page 1]

RFC 2046                      Media Types                  November 1996

   data in Internet mail header fields. The fourth document, RFC 2048,
   specifies various IANA registration procedures for MIME-related
   facilities. The fifth and final document, RFC 2049, describes MIME
   conformance criteria as well as providing some illustrative examples
   of MIME message formats, acknowledgements, and the bibliography.

   These documents are revisions of RFCs 1521 and 1522, which themselves
   were revisions of RFCs 1341 and 1342. An appendix in RFC 2049
   describes differences and changes from previous versions.

Table of Contents

   1. Introduction
   2. Definition of a Top-Level Media Type
   3. Overview Of The Initial Top-Level Media Types
   4. Discrete Media Type Values
   4.1 Text Media Type
   4.1.1 Representation of Line Breaks
   4.1.2 Charset Parameter
   4.1.3 Plain Subtype
   4.1.4 Unrecognized Subtypes
   4.2 Image Media Type
   4.3 Audio Media Type
   4.4 Video Media Type
   4.5 Application Media Type
   4.5.1 Octet-Stream Subtype
   4.5.2 PostScript Subtype
   4.5.3 Other Application Subtypes
   5. Composite Media Type Values
   5.1 Multipart Media Type
   5.1.1 Common Syntax
   5.1.2 Handling Nested Messages and Multiparts
   5.1.3 Mixed Subtype
   5.1.4 Alternative Subtype
   5.1.5 Digest Subtype
   5.1.6 Parallel Subtype
   5.1.7 Other Multipart Subtypes
   5.2 Message Media Type
   5.2.1 RFC822 Subtype
   5.2.2 Partial Subtype
   5.2.2.1 Message Fragmentation and Reassembly
   5.2.2.2 Fragmentation and Reassembly Example
   5.2.3 External-Body Subtype
   5.2.4 Other Message Subtypes
   6. Experimental Media Type Values
   7. Summary
   8. Security Considerations
   9. Authors' Addresses


   A. Collected Grammar

1. Introduction

   The first document in this set, RFC 2045, defines a number of header
   fields, including Content-Type. The Content-Type field is used to
   specify the nature of the data in the body of a MIME entity, by
   giving media type and subtype identifiers, and by providing auxiliary
   information that may be required for certain media types. After the
   type and subtype names, the remainder of the header field is simply a
   set of parameters, specified in an attribute/value notation. The
   ordering of parameters is not significant.
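
   As an illustration only, the following Python sketch uses the
   standard library "email" package to split a Content-Type field into
   its type, subtype, and parameters. The helper name
   describe_content_type and the "x-example" parameter are hypothetical
   and are not defined by this document.

```python
from email.message import Message


def describe_content_type(header_value):
    """Split a Content-Type value into (type, subtype, parameters)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_params() yields [(type/subtype, ""), (attribute, value), ...];
    # drop the first pair and keep the attribute/value parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_maintype(), msg.get_content_subtype(), params


# "x-example" is a made-up parameter used only for this illustration.
a = describe_content_type("text/plain; charset=iso-8859-1; x-example=1")
b = describe_content_type("text/plain; x-example=1; charset=iso-8859-1")

# Parameter order is not significant, so both values compare equal.
print(a == b)
```

   Collecting the parameters into a dictionary is one simple way to
   make comparisons independent of parameter order.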

   In general, the top-level media type is used to declare the general
   type of data, while the subtype specifies a specific format for that
   type of data. Thus, a media type of "image/xyz" is enough to tell a
   user agent that the data is an image, even if the user agent has no
   knowledge of the specific image format "xyz". Such information can
   be used, for example, to decide whether or not to show a user the raw
   data from an unrecognized subtype -- such an action might be
   reasonable for unrecognized subtypes of "text", but not for
   unrecognized subtypes of "image" or "audio". For this reason,
   registered subtypes of "text", "image", "audio", and "video" should
   not contain embedded information that is really of a different type.
   Such compound formats should be represented using the "multipart" or
   "application" types.

   Parameters are modifiers of the media subtype, and as such do not
   fundamentally affect the nature of the content. The set of
   meaningful parameters depends on the media type and subtype. Most
   parameters are associated with a single specific subtype. However, a
   given top-level media type may define parameters which are applicable
   to any subtype of that type. Parameters may be required by their
   defining media type or subtype or they may be optional. MIME
   implementations must also ignore any parameters whose names they do
   not recognize.

   MIME's Content-Type header field and media type mechanism have been
   carefully designed to be extensible, and it is expected that the set
   of media type/subtype pairs and their associated parameters will grow
   significantly over time. Several other MIME facilities, such as
   transfer encodings and "message/external-body" access types, are
   likely to have new values defined over time. In order to ensure that
   the set of such values is developed in an orderly, well-specified,
   and public manner, MIME sets up a registration process which uses the
   Internet Assigned Numbers Authority (IANA) as a central registry for
   MIME's various areas of extensibility. The registration process for
   these areas is described in a companion document, RFC 2048.


   The initial seven standard top-level media types are defined and
   described in the remainder of this document.

2. Definition of a Top-Level Media Type

   The definition of a top-level media type consists of:

    (1)   a name and a description of the type, including
          criteria for whether a particular type would qualify
          under that type,

    (2)   the names and definitions of parameters, if any, which
          are defined for all subtypes of that type (including
          whether such parameters are required or optional),

    (3)   how a user agent and/or gateway should handle unknown
          subtypes of this type,

    (4)   general considerations on gatewaying entities of this
          top-level type, if any, and

    (5)   any restrictions on content-transfer-encodings for
          entities of this top-level type.

3. Overview Of The Initial Top-Level Media Types

   The five discrete top-level media types are:

    (1)   text -- textual information. The subtype "plain" in
          particular indicates plain text containing no
          formatting commands or directives of any sort. Plain
          text is intended to be displayed "as-is". No special
          software is required to get the full meaning of the
          text, aside from support for the indicated character
          set. Other subtypes are to be used for enriched text in
          forms where application software may enhance the
          appearance of the text, but such software must not be
          required in order to get the general idea of the
          content. Possible subtypes of "text" thus include any
          word processor format that can be read without
          resorting to software that understands the format. In
          particular, formats that employ embedded binary
          formatting information are not considered directly
          readable. A very simple and portable subtype,
          "richtext", was defined in RFC 1341, with a further
          revision in RFC 1896 under the name "enriched".


    (2)   image -- image data. "Image" requires a display device
          (such as a graphical display, a graphics printer, or a
          FAX machine) to view the information. An initial
          subtype is defined for the widely-used image format
          JPEG.

    (3)   audio -- audio data. "Audio" requires an audio output
          device (such as a speaker or a telephone) to "display"
          the contents. An initial subtype "basic" is defined in
          this document.

    (4)   video -- video data. "Video" requires the capability
          to display moving images, typically including
          specialized hardware and software. An initial subtype
          "mpeg" is defined in this document.

    (5)   application -- some other kind of data, typically
          either uninterpreted binary data or information to be
          processed by an application. The subtype
          "octet-stream" is to be used in the case of
          uninterpreted binary data, in which case the simplest
          recommended action is to offer to write the information
          into a file for the user. The "PostScript" subtype is
          also defined for the transport of PostScript material.
          Other expected uses for "application" include
          spreadsheets, data for mail-based scheduling systems,
          languages for "active" (computational) messaging, and
          word processing formats that are not directly readable.
          Note that security considerations may exist for some
          types of application data, most notably
          "application/PostScript" and any form of active
          messaging. These issues are discussed later in this
          document.

   The two composite top-level media types are:

    (1)   multipart -- data consisting of multiple entities of
          independent data types. Four subtypes are initially
          defined, including the basic "mixed" subtype specifying
          a generic mixed set of parts, "alternative" for
          representing the same data in multiple formats,
          "parallel" for parts intended to be viewed
          simultaneously, and "digest" for multipart entities in
          which each part has a default type of "message/rfc822".


    (2)   message -- an encapsulated message. A body of media
          type "message" is itself all or a portion of some kind
          of message object. Such objects may or may not in turn
          contain other entities. The "rfc822" subtype is used
          when the encapsulated content is itself an RFC 822
          message. The "partial" subtype is defined for partial
          RFC 822 messages, to permit the fragmented transmission
          of bodies that are thought to be too large to be passed
          through transport facilities in one piece. Another
          subtype, "external-body", is defined for specifying
          large bodies by reference to an external data source.

   It should be noted that the list of media type values given here may
   be augmented in time, via the mechanisms described above, and that
   the set of subtypes is expected to grow substantially.

4. Discrete Media Type Values

   Five of the seven initial media type values refer to discrete bodies.
   The content of these types must be handled by non-MIME mechanisms;
   they are opaque to MIME processors.

4.1. Text Media Type

   The "text" media type is intended for sending material which is
   principally textual in form. A "charset" parameter may be used to
   indicate the character set of the body text for "text" subtypes,
   notably including the subtype "text/plain", which is a generic
   subtype for plain text. Plain text does not provide for or allow
   formatting commands, font attribute specifications, processing
   instructions, interpretation directives, or content markup. Plain
   text is seen simply as a linear sequence of characters, possibly
   interrupted by line breaks or page breaks. Plain text may allow the
   stacking of several characters in the same position in the text.
   Plain text in scripts like Arabic and Hebrew may also include
   facilities that allow the arbitrary mixing of text segments with
   opposite writing directions.

   Beyond plain text, there are many formats for representing what might
   be known as "rich text". An interesting characteristic of many such
   representations is that they are to some extent readable even without
   the software that interprets them. It is useful, then, to
   distinguish them, at the highest level, from such unreadable data as
   images, audio, or text represented in an unreadable form. In the
   absence of appropriate interpretation software, it is reasonable to
   show subtypes of "text" to the user, while it is not reasonable to do
   so with most nontextual data. Such formatted textual data should be
   represented using subtypes of "text".


4.1.1. Representation of Line Breaks

   The canonical form of any MIME "text" subtype MUST always represent a
   line break as a CRLF sequence. Similarly, any occurrence of CRLF in
   MIME "text" MUST represent a line break. Use of CR and LF outside of
   line break sequences is also forbidden.

   This rule applies regardless of format or character set or sets
   involved.
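
   The rule above can be sketched in Python as follows. This is a
   minimal illustration, not a normative procedure: the function names
   are hypothetical, and it assumes that locally stored text marks line
   breaks with LF, CR, or CRLF.

```python
import re


def to_canonical(text):
    """Represent every local line break (CRLF, CR, or LF) as CRLF."""
    # CRLF is listed first so an existing pair is not converted twice.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


def has_bare_cr_or_lf(text):
    """Return True if CR or LF occurs outside a CRLF sequence."""
    return re.search(r"\r(?!\n)|(?<!\r)\n", text) is not None


canonical = to_canonical("line one\nline two\r\nline three\r")
print(repr(canonical))
print(has_bare_cr_or_lf(canonical))
```

   A receiver could use a check like has_bare_cr_or_lf to detect text
   that is not in canonical form before interpreting its line breaks.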

   NOTE: The proper interpretation of line breaks when a body is
   displayed depends on the media type. In particular, while it is
   appropriate to treat a line break as a transition to a new line when
   displaying a "text/plain" body, this treatment is actually incorrect
   for other subtypes of "text" like "text/enriched" [RFC-1896].
   Similarly, whether or not line breaks should be added during display
   operations is also a function of the media type. It should not be
   necessary to add any line breaks to display "text/plain" correctly,
   whereas proper display of "text/enriched" requires the appropriate
   addition of line breaks.

   NOTE: Some protocols define a maximum line length. For example, SMTP
   [RFC-821] allows a maximum of 998 octets before the next CRLF
   sequence. To be transported by such protocols, data that includes
   overly long segments without CRLF sequences must be encoded with a
   suitable content-transfer-encoding.
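
   As a rough sketch of the check implied by this note, the following
   Python function reports whether any segment between CRLF sequences
   exceeds the 998-octet figure quoted above. The function name and the
   sample data are assumptions made for illustration.

```python
MAX_LINE_OCTETS = 998  # figure quoted in the note above for SMTP


def needs_transfer_encoding(body: bytes) -> bool:
    """Return True if any segment between CRLF sequences is too long."""
    return any(len(line) > MAX_LINE_OCTETS for line in body.split(b"\r\n"))


short_body = b"hello\r\nworld\r\n"
long_body = b"x" * 2000 + b"\r\n"
print(needs_transfer_encoding(short_body))
print(needs_transfer_encoding(long_body))
```

   A composer that gets True here would apply a content-transfer-
   encoding before handing the data to such a transport.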

4.1.2. Charset Parameter

   A critical parameter that may be specified in the Content-Type field
   for "text/plain" data is the character set. This is specified with a
   "charset" parameter, as in:

     Content-type: text/plain; charset=iso-8859-1

   Unlike some other parameter values, the values of the charset
   parameter are NOT case sensitive. The default character set, which
   must be assumed in the absence of a charset parameter, is US-ASCII.
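
   A minimal sketch of these two rules, written in Python with the
   standard library "email" package. The helper name effective_charset
   is hypothetical, and folding the value to lower case is simply one
   way to make comparisons case-insensitive.

```python
from email.message import Message


def effective_charset(content_type_value=None):
    """Return the charset in effect, lower-cased, defaulting to US-ASCII."""
    msg = Message()
    if content_type_value is not None:
        msg["Content-Type"] = content_type_value
    # get_param() returns failobj when the header or parameter is absent.
    charset = msg.get_param("charset", failobj="US-ASCII")
    return charset.lower()


print(effective_charset("text/plain; charset=ISO-8859-1"))
print(effective_charset("text/plain"))
print(effective_charset())
```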

   The specification for any future subtypes of "text" must specify
   whether or not they will also utilize a "charset" parameter, and may
   possibly restrict its values as well. For subtypes of "text" other
   than "text/plain", the semantics of the "charset" parameter should be
   defined to be identical to those specified here for "text/plain",
   i.e., the body consists entirely of characters in the given charset.
   In particular, definers of future "text" subtypes should pay close
   attention to the implications of multioctet character sets for their
   subtype definitions.


   The charset parameter for subtypes of "text" gives a name of a
   character set, as "character set" is defined in RFC 2045. The rules
   regarding line breaks detailed in the previous section must also be
   observed -- a character set whose definition does not conform to
   these rules cannot be used in a MIME "text" subtype.

   An initial list of predefined character set names can be found at the
   end of this section. Additional character sets may be registered
   with IANA.

   Media types other than subtypes of "text" might choose to employ the
   charset parameter as defined here, but with the CRLF/line break
   restriction removed. Therefore, all character sets that conform to
   the general definition of "character set" in RFC 2045 can be
   registered for MIME use.

   Note that if the specified character set includes 8-bit characters
   and such characters are used in the body, a Content-Transfer-Encoding
   header field and a corresponding encoding on the data are required in
   order to transmit the body via some mail transfer protocols, such as
   SMTP [RFC-821].

   The default character set, US-ASCII, has been the subject of some
   confusion and ambiguity in the past. Not only were there some
   ambiguities in the definition, there have been wide variations in
   practice. In order to eliminate such ambiguity and variations in the
   future, it is strongly recommended that new user agents explicitly
   specify a character set as a media type parameter in the Content-Type
   header field. "US-ASCII" does not indicate an arbitrary 7-bit
   character set, but specifies that all octets in the body must be
   interpreted as characters according to the US-ASCII character set.
   National and application-oriented versions of ISO 646 [ISO-646] are
   usually NOT identical to US-ASCII, and in that case their use in
   Internet mail is explicitly discouraged. The omission of the ISO 646
   character set from this document is deliberate in this regard. The
   character set name of "US-ASCII" explicitly refers to the character
   set defined in ANSI X3.4-1986 [US-ASCII]. The new international
   reference version (IRV) of the 1991 edition of ISO 646 is identical
   to US-ASCII. The character set name "ASCII" is reserved and must not
   be used for any purpose.

   NOTE: RFC 821 explicitly specifies "ASCII", and references an earlier
   version of the American Standard. Insofar as one of the purposes of
   specifying a media type and character set is to permit the receiver
   to unambiguously determine how the sender intended the coded message
   to be interpreted, assuming anything other than "strict ASCII" as the
   default would risk unintentional and incompatible changes to the
   semantics of messages now being transmitted. This also implies that
   messages containing characters coded according to other versions of
   ISO 646 than US-ASCII and the 1991 IRV, or using code-switching
   procedures (e.g., those of ISO 2022), as well as 8bit or multiple
   octet character encodings MUST use an appropriate character set
   specification to be consistent with MIME.

   The complete US-ASCII character set is listed in ANSI X3.4-1986.
   Note that the control characters including DEL (0-31, 127) have no
   defined meaning apart from the combination CRLF (US-ASCII values
   13 and 10) indicating a new line. Two of the characters have de
   facto meanings in wide use: FF (12) often means "start subsequent
   text on the beginning of a new page"; and TAB or HT (9) often (though
   not always) means "move the cursor to the next available column after
   the current position where the column number is a multiple of 8
   (counting the first column as column 0)." Aside from these
   conventions, any use of the control characters or DEL in a body must
   either occur

    (1)   because a subtype of text other than "plain"
          specifically assigns some additional meaning, or

    (2)   within the context of a private agreement between the
          sender and recipient. Such private agreements are
          discouraged and should be replaced by the other
          capabilities of this document.
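
   The de facto TAB convention quoted above amounts to a small piece of
   arithmetic. The Python sketch below is illustrative only; the
   function name next_tab_column is hypothetical, and the convention
   itself is, as noted, common practice rather than a defined meaning.

```python
def next_tab_column(column, width=8):
    """Return the next column after `column` that is a multiple of `width`.

    Columns are counted from 0, matching the convention quoted above.
    """
    return (column // width + 1) * width


for col in (0, 3, 7, 8):
    print(col, "->", next_tab_column(col))

# Python's str.expandtabs follows the same convention: after "ab"
# (columns 0 and 1) a TAB moves the next character to column 8.
print("ab\tc".expandtabs(8).index("c") == next_tab_column(2))
```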

   NOTE: An enormous proliferation of character sets exists beyond
   US-ASCII. A large number of partially or totally overlapping
   character sets is NOT a good thing. A SINGLE character set that can
   be used universally for representing all of the world's languages in
   Internet mail would be preferable. Unfortunately, existing practice
   in several communities seems to point to the continued use of
   multiple character sets in the near future. A small number of
   standard character sets are, therefore, defined for Internet use in
   this document.

   The defined charset values are:

    (1)   US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII].

    (2)   ISO-8859-X -- where "X" is to be replaced, as
          necessary, for the parts of ISO-8859 [ISO-8859]. Note
          that the ISO 646 character sets have deliberately been
          omitted in favor of their 8859 replacements, which are
          the designated character sets for Internet mail. As of
          the publication of this document, the legitimate values
          for "X" are the digits 1 through 10.


   Characters in the range 128-159 have no assigned meaning in
   ISO-8859-X. Characters with values below 128 in ISO-8859-X have the
   same assigned meaning as they do in US-ASCII.

   Part 6 of ISO 8859 (Latin/Arabic alphabet) and part 8 (Latin/Hebrew
   alphabet) include both characters for which the normal writing
   direction is right to left and characters for which it is left to
   right, but do not define a canonical ordering method for representing
   bi-directional text. The charset values "ISO-8859-6" and
   "ISO-8859-8", however, specify that the visual method is used
   [RFC-1556].

   All of these character sets are used as pure 7bit or 8bit sets
   without any shift or escape functions. The meaning of shift and
   escape sequences in these character sets is not defined.

   The character sets specified above are the ones that were relatively
   uncontroversial during the drafting of MIME. This document does not
   endorse the use of any particular character set other than US-ASCII,
   and recognizes that the future evolution of world character sets
   remains unclear.

   Note that the character set used, if anything other than US-ASCII,
   must always be explicitly specified in the Content-Type field.

   No character set name other than those defined above may be used in
   Internet mail without the publication of a formal specification and
   its registration with IANA, or by private agreement, in which case
   the character set name must begin with "X-".

   Implementors are discouraged from defining new character sets unless
   absolutely necessary.

   The "charset" parameter has been defined primarily for the purpose of
   textual data, and is described in this section for that reason.
   However, it is conceivable that non-textual data might also wish to
   specify a charset value for some purpose, in which case the same
   syntax and values should be used.

   In general, composition software should always use the "lowest common
   denominator" character set possible. For example, if a body contains
   only US-ASCII characters, it SHOULD be marked as being in the
   US-ASCII character set, not ISO-8859-1, which, like all the ISO-8859
   family of character sets, is a superset of US-ASCII. More generally,
   if a widely-used character set is a subset of another character set,
   and a body contains only characters in the widely-used subset, it
   should be labelled as being in that subset. This will increase the
   chances that the recipient will be able to view the resulting entity
   correctly.
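
   One way composition software might apply this "lowest common
   denominator" guidance is sketched below in Python. The candidate
   list, its ordering, and the function name pick_charset are
   assumptions for illustration; a real composer would choose its own
   candidates.

```python
# Candidates ordered from most restrictive to least restrictive.
# Only charset names defined in this section are listed (assumption).
CANDIDATES = ["us-ascii", "iso-8859-1"]


def pick_charset(text, candidates=CANDIDATES):
    """Return the first candidate charset that can represent `text`."""
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # some character is outside this charset
        return name
    return None  # no candidate can represent the text


print(pick_charset("plain text"))
print(pick_charset("caf\u00e9"))
print(pick_charset("\u65e5\u672c"))
```

   Because US-ASCII is tried first, a body containing only US-ASCII
   characters is never labelled with a larger character set.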


4.1.3. Plain Subtype

   The simplest and most important subtype of "text" is "plain". This
   indicates plain text that does not contain any formatting commands or
   directives. Plain text is intended to be displayed "as-is", that is,
   no interpretation of embedded formatting commands, font attribute
   specifications, processing instructions, interpretation directives,
   or content markup should be necessary for proper display. The
   default media type of "text/plain; charset=us-ascii" for Internet
   mail describes existing Internet practice. That is, it is the type
   of body defined by RFC 822.

   No other "text" subtype is defined by this document.

4.1.4. Unrecognized Subtypes

   Unrecognized subtypes of "text" should be treated as subtype "plain"
   as long as the MIME implementation knows how to handle the charset.
   Unrecognized subtypes which also specify an unrecognized charset
   should be treated as "application/octet-stream".
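
   The fallback rule above might be sketched in Python as follows. The
   set of known subtypes, the use of Python's codec registry as a
   stand-in for "knows how to handle the charset", and the function
   names are all assumptions made for this illustration.

```python
import codecs

# Assumption: this hypothetical implementation only knows text/plain.
KNOWN_TEXT_SUBTYPES = {"plain"}


def charset_is_known(name):
    """Treat a charset as known if Python has a codec for it."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def effective_text_type(subtype, charset="us-ascii"):
    """Return the media type to use when handling a "text" entity."""
    subtype = subtype.lower()
    if subtype in KNOWN_TEXT_SUBTYPES:
        return "text/" + subtype
    if charset_is_known(charset):
        return "text/plain"  # unrecognized subtype, usable charset
    return "application/octet-stream"  # neither is recognized


print(effective_text_type("x-unknown", "iso-8859-1"))
print(effective_text_type("x-unknown", "x-no-such-charset"))
```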

4.2. Image Media Type

   A media type of "image" indicates that the body contains an image.
   The subtype names the specific image format. These names are not
   case sensitive. An initial subtype is "jpeg" for the JPEG format
   using JFIF encoding [JPEG].

   The list of "image" subtypes given here is neither exclusive nor
   exhaustive, and is expected to grow as more types are registered with
   IANA, as described in RFC 2048.

   Unrecognized subtypes of "image" should at a minimum be treated as
   "application/octet-stream". Implementations may optionally elect to
   pass subtypes of "image" that they do not specifically recognize to a
   secure and robust general-purpose image viewing application, if such
   an application is available.

   NOTE: Using a general-purpose image viewing application this way
   inherits the security problems of the most dangerous type supported
   by the application.

4.3. Audio Media Type

   A media type of "audio" indicates that the body contains audio data.
   Although there is not yet a consensus on an "ideal" audio format for
   use with computers, there is a pressing need for a format capable of
   providing interoperable behavior.


   The initial subtype of "basic" is specified to meet this requirement
   by providing an absolutely minimal lowest common denominator audio
   format. It is expected that richer formats for higher quality and/or
   lower bandwidth audio will be defined by a later document.

   The content of the "audio/basic" subtype is single channel audio
   encoded using 8bit ISDN mu-law [PCM] at a sample rate of 8000 Hz.
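
   Since each sample occupies one octet and there is a single channel,
   the figures above imply 8000 octets of audio data per second. The
   short Python sketch below works through that arithmetic; the function
   name is hypothetical and the body is assumed to have had any
   content-transfer-encoding already undone.

```python
SAMPLE_RATE_HZ = 8000   # from the definition above
OCTETS_PER_SAMPLE = 1   # 8-bit mu-law, single channel


def basic_audio_seconds(body: bytes) -> float:
    """Return the playing time of a decoded "audio/basic" body."""
    return len(body) / (SAMPLE_RATE_HZ * OCTETS_PER_SAMPLE)


# 16000 octets of decoded data correspond to two seconds of audio.
print(basic_audio_seconds(bytes(16000)))
```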

   Unrecognized subtypes of "audio" should at a minimum be treated as
   "application/octet-stream". Implementations may optionally elect to
   pass subtypes of "audio" that they do not specifically recognize to a
   robust general-purpose audio playing application, if such an
   application is available.

4.4. Video Media Type

   A media type of "video" indicates that the body contains a
   time-varying-picture image, possibly with color and coordinated
   sound. The term 'video' is used in its most generic sense, rather
   than with reference to any particular technology or format, and is
   not meant to preclude subtypes such as animated drawings encoded
   compactly. The subtype "mpeg" refers to video coded according to the
   MPEG standard [MPEG].

   Note that although in general this document strongly discourages the
   mixing of multiple media in a single body, it is recognized that many
   so-called video formats include a representation for synchronized
   audio, and this is explicitly permitted for subtypes of "video".

   Unrecognized subtypes of "video" should at a minimum be treated as
   "application/octet-stream". Implementations may optionally elect to
   pass subtypes of "video" that they do not specifically recognize to a
   robust general-purpose video display application, if such an
   application is available.

4.5. Application Media Type

   The "application" media type is to be used for discrete data which do
   not fit in any of the other categories, and particularly for data to
   be processed by some type of application program. This is
   information which must be processed by an application before it is
   viewable or usable by a user. Expected uses for the "application"
   media type include file transfer, spreadsheets, data for mail-based
   scheduling systems, and languages for "active" (computational)
   material. (The last of these, in particular, can pose security
   problems which must be understood by implementors, and are considered
   in detail in the discussion of the "application/PostScript" media
   type.)


679 For example, a meeting scheduler might define a standard

680 representation for information about proposed meeting dates. An

681 intelligent user agent would use this information to conduct a dialog

682 with the user, and might then send additional material based on that

683 dialog. More generally, there have been several "active" messaging

684 languages developed in which programs in a suitably specialized

685 language are transported to a remote location and automatically run

686 in the recipient's environment.

687

688 Such applications may be defined as subtypes of the "application"

689 media type. This document defines two subtypes:

690

691 octet-stream, and PostScript.

692

693 The subtype of "application" will often either be the name or include

694 part of the name of the application for which the data are intended.

695 This does not mean, however, that any application program name may be

696 used freely as a subtype of "application".

697

6984.5.1. Octet-Stream Subtype

699

700 The "octet-stream" subtype is used to indicate that a body contains

701 arbitrary binary data. The set of currently defined parameters is:

702

703 (1) TYPE -- the general type or category of binary data.

704 This is intended as information for the human recipient

705 rather than for any automatic processing.

706

707 (2) PADDING -- the number of bits of padding that were

708 appended to the bit-stream comprising the actual

709 contents to produce the enclosed 8bit byte-oriented

710 data. This is useful for enclosing a bit-stream in a

711 body when the total number of bits is not a multiple of

712 8.

713

714 Both of these parameters are optional.
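
As an illustration only, a receiver could use the PADDING parameter to
recover the length of the original bit-stream. The function name below
is hypothetical, and the 0 to 7 range check is an assumption that
padding only fills out the final octet; this document does not state
that limit.

```python
def payload_bit_length(data: bytes, padding: int = 0) -> int:
    """Return the number of meaningful bits in an octet-stream body.

    `padding` is the value of the optional PADDING parameter: the number
    of bits appended to the original bit-stream to fill the last octet.
    """
    if not 0 <= padding <= 7:
        # Assumption: padding only completes the final octet.
        raise ValueError("PADDING is expected to be between 0 and 7")
    if padding and not data:
        raise ValueError("PADDING given for an empty body")
    return len(data) * 8 - padding
```

For example, an 11-bit stream carried in two octets would be labelled
with PADDING=5.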

715

716 An additional parameter, "CONVERSIONS", was defined in RFC 1341 but

717 has since been removed. RFC 1341 also defined the use of a "NAME"

718 parameter which gave a suggested file name to be used if the data

719 were to be written to a file. This has been deprecated in

720 anticipation of a separate Content-Disposition header field, to be

721 defined in a subsequent RFC.

722

723 The recommended action for an implementation that receives an

724 "application/octet-stream" entity is to simply offer to put the data

725 in a file, with any Content-Transfer-Encoding undone, or perhaps to

726 use it as input to a user-specified process.

727

735 To reduce the danger of transmitting rogue programs, it is strongly

736 recommended that implementations NOT implement a path-search

737 mechanism whereby an arbitrary program named in the Content-Type

738 parameter (e.g., an "interpreter=" parameter) is found and executed

739 using the message body as input.

740

7414.5.2. PostScript Subtype

742

743 A media type of "application/postscript" indicates a PostScript

744 program. Currently two variants of the PostScript language are

745 allowed; the original level 1 variant is described in [POSTSCRIPT]

746 and the more recent level 2 variant is described in [POSTSCRIPT2].

747

748 PostScript is a registered trademark of Adobe Systems, Inc. Use of

749 the MIME media type "application/postscript" implies recognition of

750 that trademark and all the rights it entails.

751

752 The PostScript language definition provides facilities for internal

753 labelling of the specific language features a given program uses.

754 This labelling, called the PostScript document structuring

755 conventions, or DSC, is very general and provides substantially more

756 information than just the language level. The use of document

757 structuring conventions, while not required, is strongly recommended

758 as an aid to interoperability. Documents which lack proper

759 structuring conventions cannot be tested to see whether or not they

760 will work in a given environment. As such, some systems may assume

761 the worst and refuse to process unstructured documents.
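
A receiver that wants to know whether a body carries structuring
conventions might inspect its leading comments, as in the following
sketch. The comment names used here ("%!PS-Adobe-", "%%LanguageLevel:",
"%%EndComments") are taken from Adobe's DSC specification rather than
from this document, and the function name is hypothetical. Finding
these comments says nothing about whether the program is safe to run.

```python
import re

_VERSION = re.compile(rb"^%!PS-Adobe-(\d+\.\d+)")
_LEVEL = re.compile(rb"^%%LanguageLevel:\s*(\d+)")

def inspect_dsc_header(ps: bytes):
    """Return (dsc_version, language_level) read from leading DSC comments.

    Either element is None when the corresponding comment is absent.
    """
    lines = ps.splitlines()
    if not lines:
        return None, None
    m = _VERSION.match(lines[0])
    if m is None:
        return None, None          # no DSC version comment: unstructured
    version = m.group(1).decode("ascii")
    level = None
    for line in lines[1:]:
        if line.startswith(b"%%EndComments") or not line.startswith(b"%"):
            break                  # end of the header comment section
        m = _LEVEL.match(line)
        if m:
            level = int(m.group(1))
            break
    return version, level
```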

762

763 The execution of general-purpose PostScript interpreters entails

764 serious security risks, and implementors are discouraged from simply

765 sending PostScript bodies to "off-the-shelf" interpreters. While it

766 is usually safe to send PostScript to a printer, where the potential

767 for harm is greatly constrained by typical printer environments,

768 implementors should consider all of the following before they add

769 interactive display of PostScript bodies to their MIME readers.

770

771 The remainder of this section outlines some, though probably not all,

772 of the possible problems with the transport of PostScript entities.

773

774 (1) Dangerous operations in the PostScript language

775 include, but may not be limited to, the PostScript

776 operators "deletefile", "renamefile", "filenameforall",

777 and "file". "File" is only dangerous when applied to

778 something other than standard input or output.

779 Implementations may also define additional nonstandard

780 file operators; these may also pose a threat to

781 security. "Filenameforall", the wildcard file search

782 operator, may appear at first glance to be harmless.

791 Note, however, that this operator has the potential to

792 reveal information about what files the recipient has

793 access to, and this information may itself be

794 sensitive. Message senders should avoid the use of

795 potentially dangerous file operators, since these

796 operators are quite likely to be unavailable in secure

797 PostScript implementations. Message receiving and

798 displaying software should either completely disable

799 all potentially dangerous file operators or take

800 special care not to delegate any special authority to

801 their operation. These operators should be viewed as

802 being done by an outside agency when interpreting

803 PostScript documents. Such disabling and/or checking

804 should be done completely outside of the reach of the

805 PostScript language itself; care should be taken to

806 ensure that no method exists for re-enabling

807 full-function versions of these operators.

808

809 (2) The PostScript language provides facilities for exiting

810 the normal interpreter, or server, loop. Changes made

811 in this "outer" environment are customarily retained

812 across documents, and may in some cases be retained

813 semipermanently in nonvolatile memory. The operators

814 associated with exiting the interpreter loop have the

815 potential to interfere with subsequent document

816 processing. As such, their unrestrained use

817 constitutes a threat of service denial. PostScript

818 operators that exit the interpreter loop include, but

819 may not be limited to, the exitserver and startjob

820 operators. Message sending software should not

821 generate PostScript that depends on exiting the

822 interpreter loop to operate, since the ability to exit

823 will probably be unavailable in secure PostScript

824 implementations. Message receiving and displaying

825 software should completely disable the ability to make

826 retained changes to the PostScript environment by

827 eliminating or disabling the "startjob" and

828 "exitserver" operations. If these operations cannot be

829 eliminated or completely disabled the password

830 associated with them should at least be set to a hard-

831 to-guess value.

832

833 (3) PostScript provides operators for setting system-wide

834 and device-specific parameters. These parameter

835 settings may be retained across jobs and may

836 potentially pose a threat to the correct operation of

837 the interpreter. The PostScript operators that set

838 system and device parameters include, but may not be

847 limited to, the "setsystemparams" and "setdevparams"

848 operators. Message sending software should not

849 generate PostScript that depends on the setting of

850 system or device parameters to operate correctly. The

851 ability to set these parameters will probably be

852 unavailable in secure PostScript implementations.

853 Message receiving and displaying software should

854 disable the ability to change system and device

855 parameters. If these operators cannot be completely

856 disabled the password associated with them should at

857 least be set to a hard-to-guess value.

858

859 (4) Some PostScript implementations provide nonstandard

860 facilities for the direct loading and execution of

861 machine code. Such facilities are quite obviously open

862 to substantial abuse. Message sending software should

863 not make use of such features. Besides being totally

864 hardware-specific, they are also likely to be

865 unavailable in secure implementations of PostScript.

866 Message receiving and displaying software should not

867 allow such operators to be used if they exist.

868

869 (5) PostScript is an extensible language, and many, if not

870 most, implementations of it provide a number of their

871 own extensions. This document does not deal with such

872 extensions explicitly since they constitute an unknown

873 factor. Message sending software should not make use

874 of nonstandard extensions; they are likely to be

875 missing from some implementations. Message receiving

876 and displaying software should make sure that any

877 nonstandard PostScript operators are secure and don't

878 present any kind of threat.

879

880 (6) It is possible to write PostScript that consumes huge

881 amounts of various system resources. It is also

882 possible to write PostScript programs that loop

883 indefinitely. Both types of programs have the

884 potential to cause damage if sent to unsuspecting

885 recipients. Message-sending software should avoid the

886 construction and dissemination of such programs, which

887 is antisocial. Message receiving and displaying

888 software should provide appropriate mechanisms to abort

889 processing after a reasonable amount of time has

890 elapsed. In addition, PostScript interpreters should be

891 limited to the consumption of only a reasonable amount

892 of any given system resource.

893

903 (7) It is possible to include raw binary information inside

904 PostScript in various forms. This is not recommended

905 for use in Internet mail, both because it is not

906 supported by all PostScript interpreters and because it

907 significantly complicates the use of a MIME Content-

908 Transfer-Encoding. (Without such binary, PostScript

909 may typically be viewed as line-oriented data. The

910 treatment of CRLF sequences becomes extremely

911 problematic if binary and line-oriented data are mixed

912 in a single PostScript data stream.)

913

914 (8) Finally, bugs may exist in some PostScript interpreters

915 which could possibly be exploited to gain unauthorized

916 access to a recipient's system. Beyond noting this

917 possibility, there is no specific action to take to

918 prevent this other than the timely correction of such

919 bugs if any are found.

920

9214.5.3. Other Application Subtypes

922

923 It is expected that many other subtypes of "application" will be

924 defined in the future. MIME implementations must at a minimum treat

925 any unrecognized subtypes as being equivalent to "application/octet-

926 stream".
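
The fallback rule above can be sketched as follows. The set of
recognized types and the function name are hypothetical. The
case-insensitive comparison follows the rule in RFC 2045 that media
type and subtype names are not case sensitive.

```python
# Hypothetical set of types this implementation handles specially.
KNOWN_TYPES = {"application/octet-stream", "application/postscript"}

def effective_application_type(media_type: str, known=KNOWN_TYPES) -> str:
    """Map an unrecognized "application" subtype to octet-stream."""
    mt = media_type.strip().lower()
    if mt in known:
        return mt
    if mt.startswith("application/"):
        return "application/octet-stream"
    return mt                      # other top-level types are left alone
```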

927

9285. Composite Media Type Values

929

930 The remaining two of the seven initial Content-Type values refer to

931 composite entities. Composite entities are handled using MIME

932 mechanisms -- a MIME processor typically handles the body directly.

933

9345.1. Multipart Media Type

935

936 In the case of multipart entities, in which one or more different

937 sets of data are combined in a single body, a "multipart" media type

938 field must appear in the entity's header. The body must then contain

939 one or more body parts, each preceded by a boundary delimiter line,

940 and the last one followed by a closing boundary delimiter line.

941 After its boundary delimiter line, each body part then consists of a

942 header area, a blank line, and a body area. Thus a body part is

943 similar to an RFC 822 message in syntax, but different in meaning.

944

945 A body part is an entity and hence is NOT to be interpreted as

946 actually being an RFC 822 message. To begin with, NO header fields

947 are actually required in body parts. A body part that starts with a

948 blank line, therefore, is allowed and is a body part for which all

949 default values are to be assumed. In such a case, the absence of a

950 Content-Type header usually indicates that the corresponding body has

959 a content-type of "text/plain; charset=US-ASCII".

960

961 The only header fields that have defined meaning for body parts are

962 those the names of which begin with "Content-". All other header

963 fields may be ignored in body parts. Although they should generally

964 be retained if at all possible, they may be discarded by gateways if

965 necessary. Such other fields are permitted to appear in body parts

966 but must not be depended on. "X-" fields may be created for

967 experimental or private purposes, with the recognition that the

968 information they contain may be lost at some gateways.

969

970 NOTE: The distinction between an RFC 822 message and a body part is

971 subtle, but important. A gateway between Internet and X.400 mail,

972 for example, must be able to tell the difference between a body part

973 that contains an image and a body part that contains an encapsulated

974 message, the body of which is a JPEG image. In order to represent

975 the latter, the body part must have "Content-Type: message/rfc822",

976 and its body (after the blank line) must be the encapsulated message,

977 with its own "Content-Type: image/jpeg" header field. The use of

978 similar syntax facilitates the conversion of messages to body parts,

979 and vice versa, but the distinction between the two must be

980 understood by implementors. (For the special case in which parts

981 actually are messages, a "digest" subtype is also defined.)

982

983 As stated previously, each body part is preceded by a boundary

984 delimiter line that contains the boundary delimiter. The boundary

985 delimiter MUST NOT appear inside any of the encapsulated parts, on a

986 line by itself or as the prefix of any line. This implies that it is

987 crucial that the composing agent be able to choose and specify a

988 unique boundary parameter value that does not contain the boundary

989 parameter value of an enclosing multipart as a prefix.

990

991 All present and future subtypes of the "multipart" type must use an

992 identical syntax. Subtypes may differ in their semantics, and may

993 impose additional restrictions on syntax, but must conform to the

994 required syntax for the "multipart" type. This requirement ensures

995 that all conformant user agents will at least be able to recognize

996 and separate the parts of any multipart entity, even those of an

997 unrecognized subtype.

998

999 As stated in the definition of the Content-Transfer-Encoding field

1000 [RFC 2045], no encoding other than "7bit", "8bit", or "binary" is

1001 permitted for entities of type "multipart". The "multipart" boundary

1002 delimiters and header fields are always represented as 7bit US-ASCII

1003 in any case (though the header fields may encode non-US-ASCII header

1004 text as per RFC 2047) and data within the body parts can be encoded

1005 on a part-by-part basis, with Content-Transfer-Encoding fields for

1006 each appropriate body part.

1007

10155.1.1. Common Syntax

1016

1017 This section defines a common syntax for subtypes of "multipart".

1018 All subtypes of "multipart" must use this syntax. A simple example

1019 of a multipart message also appears in this section. An example of a

1020 more complex multipart message is given in RFC 2049.

1021

1022 The Content-Type field for multipart entities requires one parameter,

1023 "boundary". The boundary delimiter line is then defined as a line

1024 consisting entirely of two hyphen characters ("-", decimal value 45)

1025 followed by the boundary parameter value from the Content-Type header

1026 field, optional linear whitespace, and a terminating CRLF.

1027

1028 NOTE: The hyphens are for rough compatibility with the earlier RFC

1029 934 method of message encapsulation, and for ease of searching for

1030 the boundaries in some implementations. However, it should be noted

1031 that multipart messages are NOT completely compatible with RFC 934

1032 encapsulations; in particular, they do not obey RFC 934 quoting

1033 conventions for embedded lines that begin with hyphens. This

1034 mechanism was chosen over the RFC 934 mechanism because the latter

1035 causes lines to grow with each level of quoting. The combination of

1036 this growth with the fact that SMTP implementations sometimes wrap

1037 long lines made the RFC 934 mechanism unsuitable for use in the event

1038 that deeply-nested multipart structuring is ever desired.

1039

1040 WARNING TO IMPLEMENTORS: The grammar for parameters on the Content-

1041 type field is such that it is often necessary to enclose the boundary

1042 parameter values in quotes on the Content-type line. This is not

1043 always necessary, but never hurts. Implementors should be sure to

1044 study the grammar carefully in order to avoid producing invalid

1045 Content-type fields. Thus, a typical "multipart" Content-Type header

1046 field might look like this:

1047

1048 Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p

1049

1050 But the following is not valid:

1051

1052 Content-Type: multipart/mixed; boundary=gc0pJq0M:08jU534c0p

1053

1054 (because of the colon) and must instead be represented as

1055

1056 Content-Type: multipart/mixed; boundary="gc0pJq0M:08jU534c0p"

1057

1058 This Content-Type value indicates that the content consists of one or

1059 more parts, each with a structure that is syntactically identical to

1060 an RFC 822 message, except that the header area is allowed to be

1061 completely empty, and that the parts are each preceded by the line

1070

1071 --gc0pJq0M:08jU534c0p
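
One possible way for a composer to apply the quoting rule from the
warning above is sketched here. The character set is the "tspecials"
list from the RFC 2045 parameter grammar, and the function name is
hypothetical. The sketch assumes the boundary value is already valid,
so it contains neither a double quote nor a backslash and needs no
escaping inside the quotes.

```python
# tspecials from the RFC 2045 parameter grammar.
TSPECIALS = set('()<>@,;:\\"/[]?=')

def boundary_parameter(boundary: str) -> str:
    """Format a boundary value as a Content-Type parameter."""
    needs_quotes = any(
        ch in TSPECIALS or not (33 <= ord(ch) <= 126)  # space, CTLs, non-ASCII
        for ch in boundary
    )
    if needs_quotes:
        return 'boundary="%s"' % boundary
    return "boundary=%s" % boundary
```

Since quoting "never hurts", a composer could also choose to quote
unconditionally and skip the check.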

1072

1073 The boundary delimiter MUST occur at the beginning of a line, i.e.,

1074 following a CRLF, and the initial CRLF is considered to be attached

1075 to the boundary delimiter line rather than part of the preceding

1076 part. The boundary may be followed by zero or more characters of

1077 linear whitespace. It is then terminated by either another CRLF and

1078 the header fields for the next part, or by two CRLFs, in which case

1079 there are no header fields for the next part. If no Content-Type

1080 field is present it is assumed to be "message/rfc822" in a

1081 "multipart/digest" and "text/plain" otherwise.

1082

1083 NOTE: The CRLF preceding the boundary delimiter line is conceptually

1084 attached to the boundary so that it is possible to have a part that

1085 does not end with a CRLF (line break). Body parts that must be

1086 considered to end with line breaks, therefore, must have two CRLFs

1087 preceding the boundary delimiter line, the first of which is part of

1088 the preceding body part, and the second of which is part of the

1089 encapsulation boundary.

1090

1091 Boundary delimiters must not appear within the encapsulated material,

1092 and must be no longer than 70 characters, not counting the two

1093 leading hyphens.

1094

1095 The boundary delimiter line following the last body part is a

1096 distinguished delimiter that indicates that no further body parts

1097 will follow. Such a delimiter line is identical to the previous

1098 delimiter lines, with the addition of two more hyphens after the

1099 boundary parameter value.

1100

1101 --gc0pJq0M:08jU534c0p--

1102

1103 NOTE TO IMPLEMENTORS: Boundary string comparisons must compare the

1104 boundary value with the beginning of each candidate line. An exact

1105 match of the entire candidate line is not required; it is sufficient

1106 that the boundary appear in its entirety following the CRLF.
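
A minimal sketch of such a comparison, assuming the caller has already
split the data at CRLF and passes one line without its terminator. The
function name and return values are hypothetical.

```python
def classify_line(line: bytes, boundary: bytes):
    """Classify one line (without its CRLF) against a boundary value.

    Returns "delimiter", "close", or None for ordinary content.
    """
    dash_boundary = b"--" + boundary
    # Prefix comparison only: trailing transport padding is tolerated.
    if not line.startswith(dash_boundary):
        return None
    if line.startswith(b"--", len(dash_boundary)):
        return "close"
    return "delimiter"
```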

1107

1108 There appears to be room for additional information prior to the

1109 first boundary delimiter line and following the final boundary

1110 delimiter line. These areas should generally be left blank, and

1111 implementations must ignore anything that appears before the first

1112 boundary delimiter line or after the last one.

1113

1114 NOTE: These "preamble" and "epilogue" areas are generally not used

1115 because of the lack of proper typing of these parts and the lack of

1116 clear semantics for handling these areas at gateways, particularly

1117 X.400 gateways. However, rather than leaving the preamble area

1118 blank, many MIME implementations have found this to be a convenient

1127 place to insert an explanatory note for recipients who read the

1128 message with pre-MIME software, since such notes will be ignored by

1129 MIME-compliant software.

1130

1131 NOTE: Because boundary delimiters must not appear in the body parts

1132 being encapsulated, a user agent must exercise care to choose a

1133 unique boundary parameter value. The boundary parameter value in the

1134 example above could have been the result of an algorithm designed to

1135 produce boundary delimiters with a very low probability of already

1136 existing in the data to be encapsulated without having to prescan the

1137 data. Alternate algorithms might result in more "readable" boundary

1138 delimiters for a recipient with an old user agent, but would require

1139 more attention to the possibility that the boundary delimiter might

1140 appear at the beginning of some line in the encapsulated part. The

1141 simplest boundary delimiter line possible is something like "---",

1142 with a closing boundary delimiter line of "-----".
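
As one illustration of the first approach, a composer might draw a
random boundary and, as an extra precaution, confirm that it does not
occur in the parts. The function name, the retry count, and the use of
32 hexadecimal digits are all assumptions of this sketch. The substring
test is deliberately stricter than the beginning-of-line rule.

```python
import secrets

def make_boundary(parts, attempts: int = 10) -> str:
    """Pick a random boundary that does not occur in any part."""
    for _ in range(attempts):
        boundary = secrets.token_hex(16)          # 32 characters from 0-9a-f
        marker = ("--" + boundary).encode("ascii")
        if not any(marker in part for part in parts):
            return boundary
    raise RuntimeError("could not find an unused boundary")
```

A value made only of hexadecimal digits fits within the 70-character
limit and needs no quoting in the Content-Type field.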

1143

1144 As a very simple example, the following multipart message has two

1145 parts, both of them plain text, one of them explicitly typed and one

1146 of them implicitly typed:

1147

1148 From: Nathaniel Borenstein <nsb@bellcore.com>

1149 To: Ned Freed <ned@innosoft.com>

1150 Date: Sun, 21 Mar 1993 23:56:48 -0800 (PST)

1151 Subject: Sample message

1152 MIME-Version: 1.0

1153 Content-type: multipart/mixed; boundary="simple boundary"

1154

1155 This is the preamble. It is to be ignored, though it

1156 is a handy place for composition agents to include an

1157 explanatory note to non-MIME conformant readers.

1158

1159 --simple boundary

1160

1161 This is implicitly typed plain US-ASCII text.

1162 It does NOT end with a linebreak.

1163 --simple boundary

1164 Content-type: text/plain; charset=us-ascii

1165

1166 This is explicitly typed plain US-ASCII text.

1167 It DOES end with a linebreak.

1168

1169 --simple boundary--

1170

1171 This is the epilogue. It is also to be ignored.
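
The handling of a message like the one above can be sketched as a
line-based splitter. This is an illustration, not a complete MIME
parser: the function name is hypothetical, the input is assumed to be
the body of the multipart entity with CRLF line endings, and header
parsing of the individual parts is left out.

```python
def split_multipart(body: bytes, boundary: bytes):
    """Split a multipart body into raw body parts (headers plus body)."""
    dash_boundary = b"--" + boundary
    parts, current = [], None        # current is None while in the preamble
    for line in body.split(b"\r\n"):
        if line.startswith(dash_boundary):
            if current is not None:
                # The CRLF before a delimiter belongs to the delimiter,
                # so joining the collected lines leaves it out.
                parts.append(b"\r\n".join(current))
            if line.startswith(b"--", len(dash_boundary)):
                return parts         # close-delimiter: the rest is epilogue
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        # Truncated entity without a close-delimiter: keep what was seen.
        parts.append(b"\r\n".join(current))
    return parts
```

Applied to the example, this yields two parts. The first begins with
an empty line (no header fields) and does not end with a line break.
The second ends with one, because two CRLFs precede the closing
delimiter.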

1182

1183 The use of a media type of "multipart" in a body part within another

1184 "multipart" entity is explicitly allowed. In such cases, for obvious

1185 reasons, care must be taken to ensure that each nested "multipart"

1186 entity uses a different boundary delimiter. See RFC 2049 for an

1187 example of nested "multipart" entities.

1188

1189 The use of the "multipart" media type with only a single body part

1190 may be useful in certain contexts, and is explicitly permitted.

1191

1192 NOTE: Experience has shown that a "multipart" media type with a

1193 single body part is useful for sending non-text media types. It has

1194 the advantage of providing the preamble as a place to include

1195 decoding instructions. In addition, a number of SMTP gateways move

1196 or remove the MIME headers, and a clever MIME decoder can take a good

1197 guess at multipart boundaries even in the absence of the Content-Type

1198 header and thereby successfully decode the message.

1199

1200 The only mandatory global parameter for the "multipart" media type is

1201 the boundary parameter, which consists of 1 to 70 characters from a

1202 set of characters known to be very robust through mail gateways, and

1203 NOT ending with white space. (If a boundary delimiter line appears to

1204 end with white space, the white space must be presumed to have been

1205 added by a gateway, and must be deleted.) It is formally specified

1206 by the following BNF:

1207

1208 boundary := 0*69<bchars> bcharsnospace

1209

1210 bchars := bcharsnospace / " "

1211

1212 bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /

1213 "+" / "_" / "," / "-" / "." /

1214 "/" / ":" / "=" / "?"
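
For illustration, the boundary BNF above can be transcribed into a
regular expression. The names below are hypothetical.

```python
import re

# bcharsnospace, transcribed from the BNF; bchars adds the space.
_BCHARSNOSPACE = r"0-9A-Za-z'()+_,\-./:=?"
BOUNDARY_RE = re.compile(
    r"[%s ]{0,69}[%s]" % (_BCHARSNOSPACE, _BCHARSNOSPACE)
)

def is_valid_boundary(value: str) -> bool:
    """Check a boundary parameter value: 1 to 70 bchars, no trailing space."""
    return BOUNDARY_RE.fullmatch(value) is not None
```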

1215

1216 Overall, the body of a "multipart" entity may be specified as

1217 follows:

1218

1219 dash-boundary := "--" boundary

1220 ; boundary taken from the value of

1221 ; boundary parameter of the

1222 ; Content-Type field.

1223

1224 multipart-body := [preamble CRLF]

1225 dash-boundary transport-padding CRLF

1226 body-part *encapsulation

1227 close-delimiter transport-padding

1228 [CRLF epilogue]

1238

1239 transport-padding := *LWSP-char

1240 ; Composers MUST NOT generate

1241 ; non-zero length transport

1242 ; padding, but receivers MUST

1243 ; be able to handle padding

1244 ; added by message transports.

1245

1246 encapsulation := delimiter transport-padding

1247 CRLF body-part

1248

1249 delimiter := CRLF dash-boundary

1250

1251 close-delimiter := delimiter "--"

1252

1253 preamble := discard-text

1254

1255 epilogue := discard-text

1256

1257 discard-text := *(*text CRLF) *text

1258 ; May be ignored or discarded.

1259

1260 body-part := MIME-part-headers [CRLF *OCTET]

1261 ; Lines in a body-part must not start

1262 ; with the specified dash-boundary and

1263 ; the delimiter must not appear anywhere

1264 ; in the body part. Note that the

1265 ; semantics of a body-part differ from

1266 ; the semantics of a message, as

1267 ; described in the text.

1268

1269 OCTET := <any 0-255 octet value>

1270

1271 IMPORTANT: The free insertion of linear-white-space and RFC 822

1272 comments between the elements shown in this BNF is NOT allowed since

1273 this BNF does not specify a structured header field.

1274

1275 NOTE: In certain transport enclaves, RFC 822 restrictions such as

1276 the one that limits bodies to printable US-ASCII characters may not

1277 be in force. (That is, transport domains may exist that resemble

1278 standard Internet mail transport as specified in RFC 821 and assumed

1279 by RFC 822, but without certain restrictions.) The relaxation of

1280 these restrictions should be construed as locally extending the

1281 definition of bodies, for example to include octets outside of the

1282 US-ASCII range, as long as these extensions are supported by the

1283 transport and adequately documented in the Content- Transfer-Encoding

1284 header field. However, in no event are headers (either message

1285 headers or body part headers) allowed to contain anything other than

1286 US-ASCII characters.

1287

1295 NOTE: Conspicuously missing from the "multipart" type is a notion of

1296 structured, related body parts. It is recommended that those wishing

1297 to provide more structured or integrated multipart messaging

1298 facilities should define subtypes of multipart that are syntactically

1299 identical but define relationships between the various parts. For

1300 example, subtypes of multipart could be defined that include a

1301 distinguished part which in turn is used to specify the relationships

1302 between the other parts, probably referring to them by their

1303 Content-ID field. Old implementations will not recognize the new

1304 subtype if this approach is used, but will treat it as

1305 multipart/mixed and will thus be able to show the user the parts that

1306 are recognized.

1307

13085.1.2. Handling Nested Messages and Multiparts

1309

1310 The "message/rfc822" subtype defined in a subsequent section of this

1311 document has no terminating condition other than running out of data.

1312 Similarly, an improperly truncated "multipart" entity may not have

1313 any terminating boundary marker, and can turn up operationally due to

1314 mail system malfunctions.

1315

1316 It is essential that such entities be handled correctly when they are

1317 themselves embedded inside of another "multipart" structure. MIME

1318 implementations are therefore required to recognize outer level

1319 boundary markers at ANY level of inner nesting. It is not sufficient

1320 to only check for the next expected marker or other terminating

1321 condition.
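
One way to meet this requirement is to check every line of an inner
entity against the boundaries of all enclosing multiparts, not only
the innermost one. The sketch below assumes the caller keeps a list of
the currently open boundary values, outermost first. The function name
and the shape of the return value are hypothetical.

```python
def find_terminator(lines, open_boundaries):
    """Find the first boundary line of any enclosing multipart.

    Returns (line_index, nesting_level, is_close) for the first line
    that starts with "--" followed by one of `open_boundaries`, or
    None if the data runs out first.
    """
    for i, line in enumerate(lines):
        if not line.startswith(b"--"):
            continue
        # Check the innermost boundary first so the nearest one wins.
        for level in range(len(open_boundaries) - 1, -1, -1):
            marker = b"--" + open_boundaries[level]
            if line.startswith(marker):
                return i, level, line.startswith(b"--", len(marker))
    return None
```

A result whose nesting level is lower than the innermost open level
indicates that the inner entities were truncated and must be closed
before processing continues at the outer level.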

1322

13235.1.3. Mixed Subtype

1324

1325 The "mixed" subtype of "multipart" is intended for use when the body

1326 parts are independent and need to be bundled in a particular order.

1327 Any "multipart" subtypes that an implementation does not recognize

1328 must be treated as being of subtype "mixed".

1329

13305.1.4. Alternative Subtype

1331

1332 The "multipart/alternative" type is syntactically identical to

1333 "multipart/mixed", but the semantics are different. In particular,

1334 each of the body parts is an "alternative" version of the same

1335 information.

1336

1337 Systems should recognize that the contents of the various parts are

1338 interchangeable. Systems should choose the "best" type based on the

1339 local environment and preferences, in some cases even through user

1340 interaction. As with "multipart/mixed", the order of body parts is

1341 significant. In this case, the alternatives appear in an order of

1342 increasing faithfulness to the original content. In general, the

1351 best choice is the LAST part of a type supported by the recipient

1352 system's local environment.

1353

1354 "Multipart/alternative" may be used, for example, to send a message

1355 in a fancy text format in such a way that it can easily be displayed

1356 anywhere:

1357

1358 From: Nathaniel Borenstein <nsb@bellcore.com>

1359 To: Ned Freed <ned@innosoft.com>

1360 Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST)

1361 Subject: Formatted text mail

1362 MIME-Version: 1.0

1363 Content-Type: multipart/alternative; boundary=boundary42

1364

1365 --boundary42

1366 Content-Type: text/plain; charset=us-ascii

1367

1368 ... plain text version of message goes here ...

1369

1370 --boundary42

1371 Content-Type: text/enriched

1372

1373 ... RFC 1896 text/enriched version of same message

1374 goes here ...

1375

1376 --boundary42

1377 Content-Type: application/x-whatever

1378

1379 ... fanciest version of same message goes here ...

1380

1381 --boundary42--

1382

1383 In this example, users whose mail systems understood the

1384 "application/x-whatever" format would see only the fancy version,

1385 while other users would see only the enriched or plain text version,

1386 depending on the capabilities of their system.

1387

1388 In general, user agents that compose "multipart/alternative" entities

1389 must place the body parts in increasing order of preference, that is,

1390 with the preferred format last. For fancy text, the sending user

1391 agent should put the plainest format first and the richest format

1392 last. Receiving user agents should pick and display the last format

1393 they are capable of displaying. In the case where one of the

1394 alternatives is itself of type "multipart" and contains unrecognized

1395 sub-parts, the user agent may choose either to show that alternative,

1396 an earlier alternative, or both.
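
The selection rule for receiving user agents can be illustrated with a short Python sketch. The function name and the can_display callback are hypothetical and are not defined by this document; a real user agent would consult its own rendering capabilities and might instead offer the user a choice.

```python
def choose_alternative(part_types, can_display):
    """Return the index of the LAST displayable part, or None.

    part_types  -- media types of the body parts, in message order
    can_display -- hypothetical callback: media type -> bool
    """
    # Walk from the last (preferred) part back towards the plainest one.
    for index in range(len(part_types) - 1, -1, -1):
        if can_display(part_types[index]):
            return index
    return None
```

Applied to the example above, an agent that can display only "text" types would select the "text/enriched" part, since it is the last part it is capable of displaying.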

1406

1407 NOTE: From an implementor's perspective, it might seem more sensible

1408 to reverse this ordering, and have the plainest alternative last.

1409 However, placing the plainest alternative first is the friendliest

1410 possible option when "multipart/alternative" entities are viewed

1411 using a non-MIME-conformant viewer. While this approach does impose

1412 some burden on conformant MIME viewers, interoperability with older

1413 mail readers was deemed to be more important in this case.

1414

1415 It may be the case that some user agents, if they can recognize more

1416 than one of the formats, will prefer to offer the user the choice of

1417 which format to view. This makes sense, for example, if a message

1418 includes both a nicely-formatted image version and an easily-edited

1419 text version. What is most critical, however, is that the user not

1420 automatically be shown multiple versions of the same data. Either

1421 the user should be shown the last recognized version or should be

1422 given the choice.

1423

1424 THE SEMANTICS OF CONTENT-ID IN MULTIPART/ALTERNATIVE: Each part of a

1425 "multipart/alternative" entity represents the same data, but the

1426 mappings between the two are not necessarily without information

1427 loss. For example, information is lost when translating ODA to

1428 PostScript or plain text. It is recommended that each part should

1429 have a different Content-ID value in the case where the information

1430 content of the two parts is not identical. And when the information

1431 content is identical -- for example, where several parts of type

1432 "message/external-body" specify alternate ways to access the

1433 identical data -- the same Content-ID field value should be used, to

1434 optimize any caching mechanisms that might be present on the

1435 recipient's end. However, the Content-ID values used by the parts

1436 should NOT be the same Content-ID value that describes the

1437 "multipart/alternative" as a whole, if there is any such Content-ID

1438 field. That is, one Content-ID value will refer to the

1439 "multipart/alternative" entity, while one or more other Content-ID

1440 values will refer to the parts inside it.

1441

14425.1.5. Digest Subtype

1443

1444 This document defines a "digest" subtype of the "multipart" Content-

1445 Type. This type is syntactically identical to "multipart/mixed", but

1446 the semantics are different. In particular, in a digest, the default

1447 Content-Type value for a body part is changed from "text/plain" to

1448 "message/rfc822". This is done to allow a more readable digest

1449 format that is largely compatible (except for the quoting convention)

1450 with RFC 934.

1451

1452 Note: Though it is possible to specify a Content-Type value for a

1453 body part in a digest which is other than "message/rfc822", such as a

1454 "text/plain" part containing a description of the material in the

1463 digest, actually doing so is undesirable. The "multipart/digest"

1464 Content-Type is intended to be used to send collections of messages.

1465 If a "text/plain" part is needed, it should be included as a separate

1466 part of a "multipart/mixed" message.

1467

1468 A digest in this format might, then, look something like this:

1469

1470 From: Moderator-Address

1471 To: Recipient-List

1472 Date: Mon, 22 Mar 1994 13:34:51 +0000

1473 Subject: Internet Digest, volume 42

1474 MIME-Version: 1.0

1475 Content-Type: multipart/mixed;

1476 boundary="---- main boundary ----"

1477

1478 ------ main boundary ----

1479

1480 ...Introductory text or table of contents...

1481

1482 ------ main boundary ----

1483 Content-Type: multipart/digest;

1484 boundary="---- next message ----"

1485

1486 ------ next message ----

1487

1488 From: someone-else

1489 Date: Fri, 26 Mar 1993 11:13:32 +0200

1490 Subject: my opinion

1491

1492 ...body goes here ...

1493

1494 ------ next message ----

1495

1496 From: someone-else-again

1497 Date: Fri, 26 Mar 1993 10:07:13 -0500

1498 Subject: my different opinion

1499

1500 ... another body goes here ...

1501

1502 ------ next message ------

1503

1504 ------ main boundary ------

1505

15065.1.6. Parallel Subtype

1507

1508 This document defines a "parallel" subtype of the "multipart"

1509 Content-Type. This type is syntactically identical to

1510 "multipart/mixed", but the semantics are different. In particular,

1519 in a parallel entity, the order of body parts is not significant.

1520

1521 A common presentation of this type is to display all of the parts

1522 simultaneously on hardware and software that are capable of doing so.

1523 However, composing agents should be aware that many mail readers will

1524 lack this capability and will show the parts serially in any event.

1525

15265.1.7. Other Multipart Subtypes

1527

1528 Other "multipart" subtypes are expected in the future. MIME

1529 implementations must in general treat unrecognized subtypes of

1530 "multipart" as being equivalent to "multipart/mixed".
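
As a hedged illustration, the fallback rule for unrecognized subtypes and the per-subtype default Content-Type for body parts might be expressed as follows in Python. The function names and the set of recognized subtypes are assumptions of this sketch; an implementation that supports further subtypes would extend the set.

```python
# Subtypes this sketch recognizes (those defined in sections 5.1.3 to 5.1.6).
KNOWN_MULTIPART_SUBTYPES = {"mixed", "alternative", "digest", "parallel"}


def effective_multipart_subtype(subtype):
    """Map an unrecognized "multipart" subtype to "mixed"."""
    lowered = subtype.lower()
    return lowered if lowered in KNOWN_MULTIPART_SUBTYPES else "mixed"


def default_part_type(subtype):
    """Default Content-Type of a body part that carries no Content-Type."""
    if effective_multipart_subtype(subtype) == "digest":
        return "message/rfc822"
    return "text/plain"
```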

1531

15325.2. Message Media Type

1533

1534 It is frequently desirable, in sending mail, to encapsulate another

1535 mail message. A special media type, "message", is defined to

1536 facilitate this. In particular, the "rfc822" subtype of "message" is

1537 used to encapsulate RFC 822 messages.

1538

1539 NOTE: It has been suggested that subtypes of "message" might be

1540 defined for forwarded or rejected messages. However, forwarded and

1541 rejected messages can be handled as multipart messages in which the

1542 first part contains any control or descriptive information, and a

1543 second part, of type "message/rfc822", is the forwarded or rejected

1544 message. Composing rejection and forwarding messages in this manner

1545 will preserve the type information on the original message and allow

1546 it to be correctly presented to the recipient, and hence is strongly

1547 encouraged.

1548

1549 Subtypes of "message" often impose restrictions on what encodings are

1550 allowed. These restrictions are described in conjunction with each

1551 specific subtype.

1552

1553 Mail gateways, relays, and other mail handling agents are commonly

1554 known to alter the top-level header of an RFC 822 message. In

1555 particular, they frequently add, remove, or reorder header fields.

1556 These operations are explicitly forbidden for the encapsulated

1557 headers embedded in the bodies of messages of type "message."

1558

15595.2.1. RFC822 Subtype

1560

1561 A media type of "message/rfc822" indicates that the body contains an

1562 encapsulated message, with the syntax of an RFC 822 message.

1563 However, unlike top-level RFC 822 messages, the restriction that each

1564 "message/rfc822" body must include a "From", "Date", and at least one

1565 destination header is removed and replaced with the requirement that

1566 at least one of "From", "Subject", or "Date" must be present.
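
The relaxed header requirement can be checked mechanically. The following Python sketch uses the standard library's email.parser.HeaderParser; the function name is hypothetical, and the sketch assumes the encapsulated message is available as text.

```python
from email.parser import HeaderParser


def has_minimum_rfc822_headers(encapsulated_text):
    """True if at least one of From, Subject, or Date is present."""
    headers = HeaderParser().parsestr(encapsulated_text)
    # Header name lookup in the email package is case-insensitive.
    return any(name in headers for name in ("From", "Subject", "Date"))
```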

1567

1575 It should be noted that, despite the use of the numbers "822", a

1576 "message/rfc822" entity isn't restricted to material in strict

1577 conformance to RFC 822, nor are the semantics of "message/rfc822"

1578 objects restricted to the semantics defined in RFC 822. More

1579 specifically, a "message/rfc822" message could well be a News article

1580 or a MIME message.

1581

1582 No encoding other than "7bit", "8bit", or "binary" is permitted for

1583 the body of a "message/rfc822" entity. The message header fields are

1584 always US-ASCII in any case, and data within the body can still be

1585 encoded, in which case the Content-Transfer-Encoding header field in

1586 the encapsulated message will reflect this. Non-US-ASCII text in the

1587 headers of an encapsulated message can be specified using the

1588 mechanisms described in RFC 2047.

1589

15905.2.2. Partial Subtype

1591

1592 The "partial" subtype is defined to allow large entities to be

1593 delivered as several separate pieces of mail and automatically

1594 reassembled by a receiving user agent. (The concept is similar to IP

1595 fragmentation and reassembly in the basic Internet Protocols.) This

1596 mechanism can be used when intermediate transport agents limit the

1597 size of individual messages that can be sent. The media type

1598 "message/partial" thus indicates that the body contains a fragment of

1599 a larger entity.

1600

1601 Because data of type "message" may never be encoded in base64 or

1602 quoted-printable, a problem might arise if "message/partial" entities

1603 are constructed in an environment that supports binary or 8bit

1604 transport. The problem is that the binary data would be split into

1605 multiple "message/partial" messages, each of them requiring binary

1606 transport. If such messages were encountered at a gateway into a

1607 7bit transport environment, there would be no way to properly encode

1608 them for the 7bit world, aside from waiting for all of the fragments,

1609 reassembling the inner message, and then encoding the reassembled

1610 data in base64 or quoted-printable. Since it is possible that

1611 different fragments might go through different gateways, even this is

1612 not an acceptable solution. For this reason, it is specified that

1613 entities of type "message/partial" must always have a content-

1614 transfer-encoding of 7bit (the default). In particular, even in

1615 environments that support binary or 8bit transport, the use of a

1616 content-transfer-encoding of "8bit" or "binary" is explicitly

1617 prohibited for MIME entities of type "message/partial". This in turn

1618 implies that the inner message must not use "8bit" or "binary"

1619 encoding.
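
The encoding restrictions stated so far for "message/rfc822" and "message/partial" can be summarized in a small lookup table. This is only a sketch: the names are hypothetical, and media types not listed are treated as unrestricted purely because the sketch has no rule for them.

```python
# Content-Transfer-Encoding values permitted on the entity itself.
ALLOWED_CTE = {
    "message/rfc822": {"7bit", "8bit", "binary"},
    "message/partial": {"7bit"},
}


def cte_is_permitted(media_type, cte="7bit"):
    """Check a Content-Transfer-Encoding against the table above."""
    allowed = ALLOWED_CTE.get(media_type.lower())
    if allowed is None:
        return True  # this sketch knows no restriction for other types
    return cte.lower() in allowed
```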

1630

1631 Because some message transfer agents may choose to automatically

1632 fragment large messages, and because such agents may use very

1633 different fragmentation thresholds, it is possible that the pieces of

1634 a partial message, upon reassembly, may prove themselves to comprise

1635 a partial message. This is explicitly permitted.

1636

1637 Three parameters must be specified in the Content-Type field of type

1638 "message/partial": The first, "id", is a unique identifier, as close

1639 to a world-unique identifier as possible, to be used to match the

1640 fragments together. (In general, the identifier is essentially a

1641 message-id; if placed in double quotes, it can be ANY message-id, in

1642 accordance with the BNF for "parameter" given in RFC 2045.) The

1643 second, "number", an integer, is the fragment number, which indicates

1644 where this fragment fits into the sequence of fragments. The third,

1645 "total", another integer, is the total number of fragments. This

1646 third subfield is required on the final fragment, and is optional

1647 (though encouraged) on the earlier fragments. Note also that these

1648 parameters may be given in any order.

1649

1650 Thus, the second piece of a 3-piece message may have either of the

1651 following header fields:

1652

1653 Content-Type: Message/Partial; number=2; total=3;

1654 id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

1655

1656 Content-Type: Message/Partial;

1657 id="oc=jpbe0M2Yt4s@thumper.bellcore.com";

1658 number=2

1659

1660 But the third piece MUST specify the total number of fragments:

1661

1662 Content-Type: Message/Partial; number=3; total=3;

1663 id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

1664

1665 Note that fragment numbering begins with 1, not 0.

1666

1667 When the fragments of an entity broken up in this manner are put

1668 together, the result is always a complete MIME entity, which may have

1669 its own Content-Type header field, and thus may contain any other

1670 data type.
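
A minimal Python sketch of reading the three parameters is given below, using the standard library's email.parser.HeaderParser and Message.get_param. The function name is hypothetical. Whether "total" is present on the final fragment can only be verified once the whole set of fragments is known, so that check is left out here.

```python
from email.parser import HeaderParser


def parse_partial_params(header_text):
    """Return (id, number, total) from a message/partial header block.

    total is None when the optional parameter is absent.
    """
    msg = HeaderParser().parsestr(header_text)
    if msg.get_content_type() != "message/partial":
        raise ValueError("not a message/partial entity")
    fragment_id = msg.get_param("id")
    number = msg.get_param("number")
    total = msg.get_param("total")
    if fragment_id is None or number is None:
        raise ValueError('"id" and "number" are required')
    number = int(number)
    total = int(total) if total is not None else None
    if number < 1:
        raise ValueError("fragment numbering begins with 1")
    if total is not None and number > total:
        raise ValueError("fragment number exceeds total")
    return fragment_id, number, total
```

Because parameters are looked up by name, the order in which they appear in the header field does not matter.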

1671

16725.2.2.1. Message Fragmentation and Reassembly

1673

1674 The semantics of a reassembled partial message must be those of the

1675 "inner" message, rather than of a message containing the inner

1676 message. This makes it possible, for example, to send a large audio

1677 message as several partial messages, and still have it appear to the

1678 recipient as a simple audio message rather than as an encapsulated

1687 message containing an audio message. That is, the encapsulation of

1688 the message is considered to be "transparent".

1689

1690 When generating and reassembling the pieces of a "message/partial"

1691 message, the headers of the encapsulated message must be merged with

1692 the headers of the enclosing entities. In this process the following

1693 rules must be observed:

1694

1695 (1) Fragmentation agents must split messages at line

1696 boundaries only. This restriction is imposed because

1697 splits at points other than the ends of lines in turn

1698 depend on message transports being able to preserve

1699 the semantics of messages that don't end with a CRLF

1700 sequence. Many transports are incapable of preserving

1701 such semantics.

1702

1703 (2) All of the header fields from the initial enclosing

1704 message, except those that start with "Content-" and

1705 the specific header fields "Subject", "Message-ID",

1706 "Encrypted", and "MIME-Version", must be copied, in

1707 order, to the new message.

1708

1709 (3) The header fields in the enclosed message which start

1710 with "Content-", plus the "Subject", "Message-ID",

1711 "Encrypted", and "MIME-Version" fields, must be

1712 appended, in order, to the header fields of the new

1713 message. Any header fields in the enclosed message

1714 which do not start with "Content-" (except for the

1715 "Subject", "Message-ID", "Encrypted", and "MIME-

1716 Version" fields) will be ignored and dropped.

1717

1718 (4) All of the header fields from the second and any

1719 subsequent enclosing messages are discarded by the

1720 reassembly process.
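
The reassembly side of rules (2) through (4) can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, fragments are assumed to be complete message texts with LF line endings, and each fragment is assumed to have been split at a line boundary as rule (1) requires.

```python
from email.parser import HeaderParser

# Fields that, besides "Content-*", are taken from the enclosed message.
INNER_FIELDS = {"subject", "message-id", "encrypted", "mime-version"}


def taken_from_inner(name):
    lowered = name.lower()
    return lowered.startswith("content-") or lowered in INNER_FIELDS


def reassemble(fragments):
    """Merge message/partial fragments (texts, any order) into one message."""
    parser = HeaderParser()
    numbered, total = [], None
    for raw in fragments:
        msg = parser.parsestr(raw)
        if msg.get_content_type() != "message/partial":
            raise ValueError("not a message/partial fragment")
        numbered.append((int(msg.get_param("number")), msg))
        if msg.get_param("total") is not None:
            total = int(msg.get_param("total"))
    numbered.sort(key=lambda pair: pair[0])
    if len({m.get_param("id") for _, m in numbered}) != 1:
        raise ValueError("fragments carry different ids")
    if total is None or [n for n, _ in numbered] != list(range(1, total + 1)):
        raise ValueError("fragment set is incomplete")
    first = numbered[0][1]
    # Concatenating the fragment bodies in order yields the inner message.
    inner = parser.parsestr("".join(m.get_payload() for _, m in numbered))
    # Rule (2): keep the first enclosing message's other fields, in order.
    merged = [(n, v) for n, v in first.items() if not taken_from_inner(n)]
    # Rule (3): append the inner Content-* and special fields, in order.
    merged += [(n, v) for n, v in inner.items() if taken_from_inner(n)]
    # Rule (4): headers of later enclosing messages are simply not used.
    head = "".join(f"{name}: {value}\n" for name, value in merged)
    return head + "\n" + inner.get_payload()
```

The sketch appends the inner fields in the order in which they occur in the enclosed message, so the relative order of, say, "Subject" and "Message-ID" in the result follows the enclosed message.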

1721

17225.2.2.2. Fragmentation and Reassembly Example

1723

1724 If an audio message is broken into two pieces, the first piece might

1725 look something like this:

1726

1727 X-Weird-Header-1: Foo

1728 From: Bill@host.com

1729 To: joe@otherhost.com

1730 Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)

1731 Subject: Audio mail (part 1 of 2)

1732 Message-ID: <id1@host.com>

1733 MIME-Version: 1.0

1734 Content-type: message/partial; id="ABC@host.com";

1743 number=1; total=2

1744

1745 X-Weird-Header-1: Bar

1746 X-Weird-Header-2: Hello

1747 Message-ID: <anotherid@foo.com>

1748 Subject: Audio mail

1749 MIME-Version: 1.0

1750 Content-type: audio/basic

1751 Content-transfer-encoding: base64

1752

1753 ... first half of encoded audio data goes here ...

1754

1755 and the second half might look something like this:

1756

1757 From: Bill@host.com

1758 To: joe@otherhost.com

1759 Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)

1760 Subject: Audio mail (part 2 of 2)

1761 MIME-Version: 1.0

1762 Message-ID: <id2@host.com>

1763 Content-type: message/partial;

1764 id="ABC@host.com"; number=2; total=2

1765

1766 ... second half of encoded audio data goes here ...

1767

1768 Then, when the fragmented message is reassembled, the resulting

1769 message to be displayed to the user should look something like this:

1770

1771 X-Weird-Header-1: Foo

1772 From: Bill@host.com

1773 To: joe@otherhost.com

1774 Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)

1775 Subject: Audio mail

1776 Message-ID: <anotherid@foo.com>

1777 MIME-Version: 1.0

1778 Content-type: audio/basic

1779 Content-transfer-encoding: base64

1780

1781 ... first half of encoded audio data goes here ...

1782 ... second half of encoded audio data goes here ...

1783

1784 The inclusion of a "References" field in the headers of the second

1785 and subsequent pieces of a fragmented message that references the

1786 Message-Id on the previous piece may be of benefit to mail readers

1787 that understand and track references. However, the generation of

1788 such "References" fields is entirely optional.

1798

1799 Finally, it should be noted that the "Encrypted" header field has

1800 been made obsolete by Privacy Enhanced Messaging (PEM) [RFC-1421,

1801 RFC-1422, RFC-1423, RFC-1424], but the rules above are nevertheless

1802 believed to describe the correct way to treat it if it is encountered

1803 in the context of conversion to and from "message/partial" fragments.

1804

18055.2.3. External-Body Subtype

1806

1807 The external-body subtype indicates that the actual body data are not

1808 included, but merely referenced. In this case, the parameters

1809 describe a mechanism for accessing the external data.

1810

1811 When a MIME entity is of type "message/external-body", it consists of

1812 a header, two consecutive CRLFs, and the message header for the

1813 encapsulated message. If another pair of consecutive CRLFs appears,

1814 this of course ends the message header for the encapsulated message.

1815 However, since the encapsulated message's body is itself external, it

1816 does NOT appear in the area that follows. For example, consider the

1817 following message:

1818

1819 Content-type: message/external-body;

1820 access-type=local-file;

1821 name="/u/nsb/Me.jpeg"

1822

1823 Content-type: image/jpeg

1824 Content-ID: <id42@guppylake.bellcore.com>

1825 Content-Transfer-Encoding: binary

1826

1827 THIS IS NOT REALLY THE BODY!

1828

1829 The area at the end, which might be called the "phantom body", is

1830 ignored for most external-body messages. However, it may be used to

1831 contain auxiliary information for some such messages, as indeed it is

1832 when the access-type is "mail-server". The only access-type defined

1833 in this document that uses the phantom body is "mail-server", but

1834 other access-types may be defined in the future in other

1835 specifications that use this area.

1836

1837 The encapsulated headers in ALL "message/external-body" entities MUST

1838 include a Content-ID header field to give a unique identifier by

1839 which to reference the data. This identifier may be used for caching

1840 mechanisms, and for recognizing the receipt of the data when the

1841 access-type is "mail-server".
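
The three-layer structure (entity header, encapsulated header, phantom body) can be made concrete with a short Python sketch built on the standard library's email.parser.HeaderParser. The function name is hypothetical, and the entity is assumed to be given as text with LF line endings.

```python
from email.parser import HeaderParser


def parse_external_body(entity_text):
    """Split a message/external-body entity into its three layers.

    Returns (access_type, encapsulated_headers, phantom_body).
    """
    parser = HeaderParser()
    outer = parser.parsestr(entity_text)
    if outer.get_content_type() != "message/external-body":
        raise ValueError("not a message/external-body entity")
    access_type = outer.get_param("access-type")
    if access_type is None:
        raise ValueError("access-type parameter is missing")
    # The body of the entity is itself a header block plus phantom body.
    encapsulated = parser.parsestr(outer.get_payload())
    if "Content-ID" not in encapsulated:
        raise ValueError("encapsulated headers lack Content-ID")
    return access_type.lower(), encapsulated, encapsulated.get_payload()
```

For the example message above, the sketch would report an access-type of "local-file", an encapsulated Content-Type of "image/jpeg", and the text "THIS IS NOT REALLY THE BODY!" as the phantom body.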

1842

1843 Note that, as specified here, the tokens that describe external-body

1844 data, such as file names and mail server commands, are required to be

1845 in the US-ASCII character set.

1855 If this proves problematic in practice, a new mechanism may be

1856 required as a future extension to MIME, either as newly defined

1857 access-types for "message/external-body" or by some other mechanism.

1858

1859 As with "message/partial", MIME entities of type "message/external-

1860 body" MUST have a content-transfer-encoding of 7bit (the default).

1861 In particular, even in environments that support binary or 8bit

1862 transport, the use of a content-transfer-encoding of "8bit" or

1863 "binary" is explicitly prohibited for entities of type

1864 "message/external-body".

1865

18665.2.3.1. General External-Body Parameters

1867

1868 The parameters that may be used with any "message/external-body"

1869 are:

1870

1871 (1) ACCESS-TYPE -- A word indicating the supported access

1872 mechanism by which the file or data may be obtained.

1873 This word is not case sensitive. Values include, but

1874 are not limited to, "FTP", "ANON-FTP", "TFTP", "LOCAL-

1875 FILE", and "MAIL-SERVER". Future values, except for

1876 experimental values beginning with "X-", must be

1877 registered with IANA, as described in RFC 2048.

1878 This parameter is unconditionally mandatory and MUST be

1879 present on EVERY "message/external-body".

1880

1881 (2) EXPIRATION -- The date (in the RFC 822 "date-time"

1882 syntax, as extended by RFC 1123 to permit 4 digits in

1883 the year field) after which the existence of the

1884 external data is not guaranteed. This parameter may be

1885 used with ANY access-type and is ALWAYS optional.

1886

1887 (3) SIZE -- The size (in octets) of the data. The intent

1888 of this parameter is to help the recipient decide

1889 whether or not to expend the necessary resources to

1890 retrieve the external data. Note that this describes

1891 the size of the data in its canonical form, that is,

1892 before any Content-Transfer-Encoding has been applied

1893 or after the data have been decoded. This parameter

1894 may be used with ANY access-type and is ALWAYS

1895 optional.

1896

1897 (4) PERMISSION -- A case-insensitive field that indicates

1898 whether or not it is expected that clients might also

1899 attempt to overwrite the data. By default, or if

1900 permission is "read", the assumption is that they are

1901 not, and that if the data is retrieved once, it is

1902 never needed again. If PERMISSION is "read-write",

1911 this assumption is invalid, and any local copy must be

1912 considered no more than a cache. "Read" and "Read-

1913 write" are the only defined values of permission. This

1914 parameter may be used with ANY access-type and is

1915 ALWAYS optional.

1916

1917 The precise semantics of the access-types defined here are described

1918 in the sections that follow.
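
The general parameters might be read as in the following Python sketch, which relies on the standard library's email.parser.HeaderParser and email.utils.parsedate_to_datetime. The function name and the shape of the returned dictionary are assumptions of this sketch.

```python
from email.parser import HeaderParser
from email.utils import parsedate_to_datetime


def general_external_body_params(header_text):
    """Extract ACCESS-TYPE, EXPIRATION, SIZE and PERMISSION."""
    msg = HeaderParser().parsestr(header_text)
    access_type = msg.get_param("access-type")
    if access_type is None:
        raise ValueError("ACCESS-TYPE is mandatory")
    expiration = msg.get_param("expiration")
    size = msg.get_param("size")
    # PERMISSION is case-insensitive and defaults to "read".
    permission = (msg.get_param("permission") or "read").lower()
    if permission not in ("read", "read-write"):
        raise ValueError("undefined PERMISSION value")
    return {
        "access-type": access_type.lower(),  # not case sensitive
        "expiration": parsedate_to_datetime(expiration) if expiration else None,
        "size": int(size) if size is not None else None,  # octets, canonical form
        "permission": permission,
    }
```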

1919

19205.2.3.2. The 'ftp' and 'tftp' Access-Types

1921

1922 An access-type of FTP or TFTP indicates that the message body is

1923 accessible as a file using the FTP [RFC-959] or TFTP [RFC-783]

1924 protocols, respectively. For these access-types, the following

1925 additional parameters are mandatory:

1926

1927 (1) NAME -- The name of the file that contains the actual

1928 body data.

1929

1930 (2) SITE -- A machine from which the file may be obtained,

1931 using the given protocol. This must be a fully

1932 qualified domain name, not a nickname.

1933

1934 (3) Before any data are retrieved, using FTP, the user will

1935 generally need to be asked to provide a login id and a

1936 password for the machine named by the site parameter.

1937 For security reasons, such an id and password are not

1938 specified as content-type parameters, but must be

1939 obtained from the user.

1940

1941 In addition, the following parameters are optional:

1942

1943 (1) DIRECTORY -- A directory from which the data named by

1944 NAME should be retrieved.

1945

1946 (2) MODE -- A case-insensitive string indicating the mode

1947 to be used when retrieving the information. The valid

1948 values for access-type "TFTP" are "NETASCII", "OCTET",

1949 and "MAIL", as specified by the TFTP protocol [RFC-

1950 783]. The valid values for access-type "FTP" are

1951 "ASCII", "EBCDIC", "IMAGE", and "LOCALn" where "n" is a

1952 decimal integer, typically 8. These correspond to the

1953 representation types "A" "E" "I" and "L n" as specified

1954 by the FTP protocol [RFC-959]. Note that "BINARY" and

1955 "TENEX" are not valid values for MODE and that "OCTET"

1956 or "IMAGE" or "LOCAL8" should be used instead. If MODE

1957 is not specified, the default value is "NETASCII" for

1958 TFTP and "ASCII" otherwise.
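
The MODE rules lend themselves to a small validation routine. The Python sketch below is illustrative only; the function name is hypothetical, and any access-type other than "TFTP" is treated as an FTP variant, following the "otherwise" wording of the default.

```python
import re

TFTP_MODES = {"NETASCII", "OCTET", "MAIL"}
FTP_MODES = {"ASCII", "EBCDIC", "IMAGE"}


def transfer_mode(access_type, mode=None):
    """Return the effective MODE, applying the defaults described above."""
    is_tftp = access_type.upper() == "TFTP"
    if mode is None:
        return "NETASCII" if is_tftp else "ASCII"
    value = mode.upper()  # MODE is case-insensitive
    if is_tftp:
        valid = value in TFTP_MODES
    else:
        # "LOCALn", where n is a decimal integer (typically 8)
        valid = value in FTP_MODES or re.fullmatch(r"LOCAL[0-9]+", value) is not None
    if not valid:
        raise ValueError("invalid MODE for this access-type: " + mode)
    return value
```

Under these rules "BINARY" is rejected for both access-types, as the text requires.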

1959

19675.2.3.3. The 'anon-ftp' Access-Type

1968

1969 The "anon-ftp" access-type is identical to the "ftp" access type,

1970 except that the user need not be asked to provide a name and password

1971 for the specified site. Instead, the ftp protocol will be used with

1972 login "anonymous" and a password that corresponds to the user's mail

1973 address.

1974

19755.2.3.4. The 'local-file' Access-Type

1976

1977 An access-type of "local-file" indicates that the actual body is

1978 accessible as a file on the local machine. Two additional parameters

1979 are defined for this access type:

1980

1981 (1) NAME -- The name of the file that contains the actual

1982 body data. This parameter is mandatory for the

1983 "local-file" access-type.

1984

1985 (2) SITE -- A domain specifier for a machine or set of

1986 machines that are known to have access to the data

1987 file. This optional parameter is used to describe the

1988 locality of reference for the data, that is, the site

1989 or sites at which the file is expected to be visible.

1990 Asterisks may be used for wildcard matching to a part

1991 of a domain name, such as "*.bellcore.com", to indicate

1992 a set of machines on which the data should be directly

1993 visible, while a single asterisk may be used to

1994 indicate a file that is expected to be universally

1995 available, e.g., via a global file system.
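
One possible reading of the SITE wildcard convention is sketched below in Python. This document does not define the matching algorithm precisely, so the sketch makes two assumptions: an asterisk matches any run of characters, including dots, and comparison is case-insensitive, as is usual for domain names. The function name is hypothetical.

```python
import re


def site_matches(pattern, host):
    """True if host falls within the SITE pattern."""
    # Escape the literal pieces and let each "*" match any characters.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, host, flags=re.IGNORECASE) is not None
```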

1996

19975.2.3.5. The 'mail-server' Access-Type

1998

1999 The "mail-server" access-type indicates that the actual body is

2000 available from a mail server. Two additional parameters are defined

2001 for this access-type:

2002

2003 (1) SERVER -- The addr-spec of the mail server from which

2004 the actual body data can be obtained. This parameter

2005 is mandatory for the "mail-server" access-type.

2006

2007 (2) SUBJECT -- The subject that is to be used in the mail

2008 that is sent to obtain the data. Note that keying mail

2009 servers on Subject lines is NOT recommended, but such

2010 mail servers are known to exist. This is an optional

2011 parameter.

2022

2023 Because mail servers accept a variety of syntaxes, some of which are

2024 multiline, the full command to be sent to a mail server is not

2025 included as a parameter in the content-type header field. Instead,

2026 it is provided as the "phantom body" when the media type is

2027 "message/external-body" and the access-type is mail-server.

2028

2029 Note that MIME does not define a mail server syntax. Rather, it

2030 allows the inclusion of arbitrary mail server commands in the phantom

2031 body. Implementations must include the phantom body in the body of

2032 the message they send to the mail server address to retrieve the

2033 relevant data.

2034

2035 Unlike other access-types, mail-server access is asynchronous and

2036 will happen at an unpredictable time in the future. For this reason,

2037 it is important that there be a mechanism by which the returned data

2038 can be matched up with the original "message/external-body" entity.

2039 MIME mail servers must use the same Content-ID field on the returned

2040 message that was used in the original "message/external-body"

2041 entities, to facilitate such matching.
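
A retrieval request for the "mail-server" access-type could be composed along the following lines, using the standard library's email.message.EmailMessage. This is a sketch under assumptions: the function name, the requester argument, and the wording of the "Comments" field are hypothetical. The "Comments" field reflects the recommendation in section 5.2.3.6 that such requests indicate they were generated automatically, and a user agent is expected to obtain the user's permission before sending anything.

```python
from email.message import EmailMessage


def build_mail_server_request(server, phantom_body, requester, subject=None):
    """Compose the message that asks a mail server for the external data.

    server       -- value of the SERVER parameter (an addr-spec)
    phantom_body -- phantom body of the message/external-body entity
    requester    -- address of the user on whose behalf the request is sent
    subject      -- value of the optional SUBJECT parameter, if any
    """
    request = EmailMessage()
    request["From"] = requester
    request["To"] = server
    if subject is not None:
        request["Subject"] = subject
    request["Comments"] = ("Generated automatically to resolve a MIME "
                           "message/external-body reference")
    # The phantom body becomes the body of the request.
    request.set_content(phantom_body)
    return request
```

The Content-ID of the original entity should be remembered by the caller so that the data eventually returned can be matched against it.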

2042

20435.2.3.6. External-Body Security Issues

2044

2045 "Message/external-body" entities give rise to two important security

2046 issues:

2047

2048 (1) Accessing data via a "message/external-body" reference

2049 effectively results in the message recipient performing

2050 an operation that was specified by the message

2051 originator. It is therefore possible for the message

2052 originator to trick a recipient into doing something

2053 they would not have done otherwise. For example, an

2054 originator could specify an action that attempts

2055 retrieval of material that the recipient is not

2056 authorized to obtain, causing the recipient to

2057 unwittingly violate some security policy. For this

2058 reason, user agents capable of resolving external

2059 references must always take steps to describe the

2060 action they are to take to the recipient and ask for

2061 explicit permission prior to performing it.

2062

2063 The 'mail-server' access-type is particularly

2064 vulnerable, in that it causes the recipient to send a

2065 new message whose contents are specified by the

2066 original message's originator. Given the potential for

2067 abuse, any such request messages that are constructed

2068 should contain a clear indication that they were

2069 generated automatically (e.g. in a Comments: header

2070 field) in an attempt to resolve a MIME

2079 "message/external-body" reference.

2080

2081 (2) MIME will sometimes be used in environments that

2082 provide some guarantee of message integrity and

2083 authenticity. If present, such guarantees may apply

2084 only to the actual direct content of messages -- they

2085 may or may not apply to data accessed through MIME's

2086 "message/external-body" mechanism. In particular, it

2087 may be possible to subvert certain access mechanisms

2088 even when the messaging system itself is secure.

2089

2090 It should be noted that this problem exists either with

2091 or without the availability of MIME mechanisms. A

2092 casual reference to an FTP site containing a document

2093 in the text of a secure message brings up similar

2094 issues -- the only difference is that MIME provides for

2095 automatic retrieval of such material, and users may

2096 place unwarranted trust in such automatic retrieval

2097 mechanisms.

2098

20995.2.3.7. Examples and Further Explanations

2100

2101 When the external-body mechanism is used in conjunction with the

2102 "multipart/alternative" media type it extends the functionality of

2103 "multipart/alternative" to include the case where the same entity is

2104 provided in the same format but via different access mechanisms.

2105 When this is done the originator of the message must order the parts

2106 first in terms of preferred formats and then by preferred access

2107 mechanisms. The recipient's viewer should then evaluate the list

2108 both in terms of format and access mechanisms.

2109

2110 With the emerging possibility of very wide-area file systems, it

2111 becomes very hard to know in advance the set of machines where a file

2112 will and will not be accessible directly from the file system.

2113 Therefore it may make sense to provide both a file name, to be tried

2114 directly, and the name of one or more sites from which the file is

2115 known to be accessible. An implementation can try to retrieve remote

2116 files using FTP or any other protocol, using anonymous file retrieval

2117 or prompting the user for the necessary name and password. If an

2118 external body is accessible via multiple mechanisms, the sender may

2119 include multiple entities of type "message/external-body" within the

2120 body parts of an enclosing "multipart/alternative" entity.

2121

2122 However, the external-body mechanism is not intended to be limited to

2123 file retrieval, as shown by the mail-server access-type. Beyond

2124 this, one can imagine, for example, using a video server for external

2125 references to video clips.

2134

2135 The embedded message header fields which appear in the body of the

2136 "message/external-body" data must be used to declare the media type

2137 of the external body if it is anything other than plain US-ASCII

2138 text, since the external body does not have a header section to

2139 declare its type. Similarly, any Content-transfer-encoding other

2140 than "7bit" must also be declared here. Thus a complete

2141 "message/external-body" message, referring to an object in PostScript

2142 format, might look like this:

2143

2144 From: Whomever

2145 To: Someone

2146 Date: Whenever

2147 Subject: whatever

2148 MIME-Version: 1.0

2149 Message-ID: <id1@host.com>

2150 Content-Type: multipart/alternative; boundary=42

2151 Content-ID: <id001@guppylake.bellcore.com>

2152

2153 --42

2154 Content-Type: message/external-body; name="BodyFormats.ps";

2155 site="thumper.bellcore.com"; mode="image";

2156 access-type=ANON-FTP; directory="pub";

2157 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

2158

2159 Content-type: application/postscript

2160 Content-ID: <id42@guppylake.bellcore.com>

2161

2162 --42

2163 Content-Type: message/external-body; access-type=local-file;

2164 name="/u/nsb/writing/rfcs/RFC-MIME.ps";

2165 site="thumper.bellcore.com";

2166 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

2167

2168 Content-type: application/postscript

2169 Content-ID: <id42@guppylake.bellcore.com>

2170

2171 --42

2172 Content-Type: message/external-body;

2173 access-type=mail-server;

2174 server="listserv@bogus.bitnet";

2175 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

2176

2177 Content-type: application/postscript

2178 Content-ID: <id42@guppylake.bellcore.com>

2179

2180 get RFC-MIME.DOC

2182 --42--

2190

2191 Note that in the above examples, the default

2192 Content-transfer-encoding of "7bit" is assumed for the external PostScript data.

2193

2194 Like the "message/partial" type, the "message/external-body" media

2195 type is intended to be transparent, that is, to convey the data type

2196 in the external body rather than to convey a message with a body of

2197 that type. Thus the headers on the outer and inner parts must be

2198 merged using the same rules as for "message/partial". In particular,

2199 this means that the Content-type and Subject fields are overridden,

2200 but the From field is preserved.
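
A minimal sketch of this merge, assuming the "message/partial" rules are read as follows: outer fields are kept unless they start with "Content-" or are Subject, Message-ID, Encrypted, or MIME-Version, and only fields of those kinds are taken from the inner header. The function name merge_headers and the list-of-pairs representation are illustrative assumptions.

```python
from typing import List, Tuple

Header = Tuple[str, str]  # (field name, field value)

# Fields that the inner header overrides, in addition to all "Content-" fields.
_OVERRIDDEN = {"subject", "message-id", "encrypted", "mime-version"}


def _is_overridden(name: str) -> bool:
    lowered = name.lower()
    return lowered.startswith("content-") or lowered in _OVERRIDDEN


def merge_headers(outer: List[Header], inner: List[Header]) -> List[Header]:
    """Merge outer and inner header fields, preserving order within each."""
    kept = [(n, v) for n, v in outer if not _is_overridden(n)]
    taken = [(n, v) for n, v in inner if _is_overridden(n)]
    # Inner fields that are not overriding fields are dropped.
    return kept + taken
```

Applied to the example above, the outer From field survives, while the outer Content-Type is replaced by the inner "application/postscript" declaration.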

2201

2202 Note that since the external bodies are not transported along with

2203 the external body reference, they need not conform to transport

2204 limitations that apply to the reference itself. In particular,

2205 Internet mail transports may impose 7bit and line length limits, but

2206 these do not automatically apply to binary external body references.

2207 Thus a Content-Transfer-Encoding is not generally necessary, though

2208 it is permitted.

2209

2210 Note that the body of a message of type "message/external-body" is

2211 governed by the basic syntax for an RFC 822 message. In particular,

2212 anything before the first consecutive pair of CRLFs is header

2213 information, while anything after it is body information, which is

2214 ignored for most access-types.
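
The split described here can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a complete RFC 822 parser: the function name split_external_body is hypothetical, repeated fields overwrite one another, and field names are lowercased for lookup.

```python
from typing import Dict, Tuple


def split_external_body(body: str) -> Tuple[Dict[str, str], str]:
    """Split a message/external-body body into header fields and phantom body."""
    normalized = body.replace("\r\n", "\n")
    # Everything before the first blank line is header information.
    header_text, _, phantom = normalized.partition("\n\n")
    headers: Dict[str, str] = {}
    last_name = None
    for line in header_text.split("\n"):
        if not line.strip():
            continue
        if line[0] in " \t" and last_name is not None:
            # Folded continuation line: append to the previous field.
            headers[last_name] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        last_name = name.strip().lower()
        headers[last_name] = value.strip()
    return headers, phantom
```

For the mail-server part of the example above, the returned header fields declare "application/postscript", and the phantom body carries the "get RFC-MIME.DOC" command.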

2215

22165.2.4. Other Message Subtypes

2217

2218 MIME implementations must in general treat unrecognized subtypes of

2219 "message" as being equivalent to "application/octet-stream".
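
One possible reading of this rule, sketched in Python. The set KNOWN_MESSAGE_SUBTYPES is an assumption standing in for whatever subtypes a given implementation actually recognizes, and the function expects a bare type/subtype string without parameters.

```python
# Hypothetical set of "message" subtypes this implementation understands.
KNOWN_MESSAGE_SUBTYPES = {"rfc822", "partial", "external-body"}


def effective_media_type(media_type: str) -> str:
    """Map unrecognized message/* subtypes to application/octet-stream."""
    main, _, sub = media_type.strip().lower().partition("/")
    if main == "message" and sub not in KNOWN_MESSAGE_SUBTYPES:
        return "application/octet-stream"
    return main + "/" + sub
```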

2220

2221 Future subtypes of "message" intended for use with email should be

2222 restricted to "7bit" encoding. A type other than "message" should be

2223 used if restriction to "7bit" is not possible.

2224

22256. Experimental Media Type Values

2226

2227 A media type value beginning with the characters "X-" is a private

2228 value, to be used by consenting systems by mutual agreement. Any

2229 format without a rigorous and public definition must be named with an

2230 "X-" prefix, and publicly specified values shall never begin with

2231 "X-". (Older versions of the widely used Andrew system use the

2232 "X-BE2" name, so new systems should probably choose a different name.)

2233

2234 In general, the use of "X-" top-level types is strongly discouraged.

2235 Implementors should invent subtypes of the existing types whenever

2236 possible. In many cases, a subtype of "application" will be more

2237 appropriate than a new top-level type.

2246

22477. Summary

2248

2249 The five discrete media types provide a standardized

2250 mechanism for tagging entities as "audio", "image", or several other

2251 kinds of data. The composite "multipart" and "message" media types

2252 allow mixing and hierarchical structuring of entities of different

2253 types in a single message. A distinguished parameter syntax allows

2254 further specification of data format details, particularly the

2255 specification of alternate character sets. Additional optional

2256 header fields provide mechanisms for certain extensions deemed

2257 desirable by many implementors. Finally, a number of useful media

2258 types are defined for general use by consenting user agents, notably

2259 "message/partial" and "message/external-body".

2260

22618. Security Considerations

2262

2263 Security issues are discussed in the context of the

2264 "application/postscript" type, the "message/external-body" type, and

2265 in RFC 2048. Implementors should pay special attention to the

2266 security implications of any media types that can cause the remote

2267 execution of any actions in the recipient's environment. In such

2268 cases, the discussion of the "application/postscript" type may serve

2269 as a model for considering other media types with remote execution

2270 capabilities.

2302

23039. Authors' Addresses

2304

2305 For more information, the authors of this document are best contacted

2306 via Internet mail:

2307

2308 Ned Freed

2309 Innosoft International, Inc.

2310 1050 East Garvey Avenue South

2311 West Covina, CA 91790

2312 USA

2313

2314 Phone: +1 818 919 3600

2315 Fax: +1 818 919 3614

2316 EMail: ned@innosoft.com

2317

2318

2319 Nathaniel S. Borenstein

2320 First Virtual Holdings

2321 25 Washington Avenue

2322 Morristown, NJ 07960

2323 USA

2324

2325 Phone: +1 201 540 8967

2326 Fax: +1 201 993 3032

2327 EMail: nsb@nsb.fv.com

2328

2329

2330 MIME is a result of the work of the Internet Engineering Task Force

2331 Working Group on RFC 822 Extensions. The chairman of that group,

2332 Greg Vaudreuil, may be reached at:

2333

2334 Gregory M. Vaudreuil

2335 Octel Network Services

2336 17080 Dallas Parkway

2337 Dallas, TX 75248-1905

2338 USA

2339

2340 EMail: Greg.Vaudreuil@Octel.Com

2358

2359Appendix A -- Collected Grammar

2360

2361 This appendix contains the complete BNF grammar for all the syntax

2362 specified by this document.

2363

2364 By itself, however, this grammar is incomplete. It refers by name to

2365 several syntax rules that are defined by RFC 822. Rather than

2366 reproduce those definitions here, and risk unintentional differences

2367 between the two, this document simply refers the reader to RFC 822

2368 for the remaining definitions. Wherever a term is undefined, it

2369 refers to the RFC 822 definition.

2370

2371 boundary := 0*69<bchars> bcharsnospace

2372

2373 bchars := bcharsnospace / " "

2374

2375 bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /

2376 "+" / "_" / "," / "-" / "." /

2377 "/" / ":" / "=" / "?"

2378

2379 body-part := <"message" as defined in RFC 822, with all

2380 header fields optional, not starting with the

2381 specified dash-boundary, and with the

2382 delimiter not occurring anywhere in the

2383 body part. Note that the semantics of a

2384 part differ from the semantics of a message,

2385 as described in the text.>

2386

2387 close-delimiter := delimiter "--"

2388

2389 dash-boundary := "--" boundary

2390 ; boundary taken from the value of

2391 ; boundary parameter of the

2392 ; Content-Type field.

2393

2394 delimiter := CRLF dash-boundary

2395

2396 discard-text := *(*text CRLF)

2397 ; May be ignored or discarded.

2398

2399 encapsulation := delimiter transport-padding

2400 CRLF body-part

2401

2402 epilogue := discard-text

2403

2404 multipart-body := [preamble CRLF]

2405 dash-boundary transport-padding CRLF

2406 body-part *encapsulation

2415 close-delimiter transport-padding

2416 [CRLF epilogue]

2417

2418 preamble := discard-text

2419

2420 transport-padding := *LWSP-char

2421 ; Composers MUST NOT generate

2422 ; non-zero length transport

2423 ; padding, but receivers MUST

2424 ; be able to handle padding

2425 ; added by message transports.
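
The grammar above can be exercised with a short Python sketch. It is a hedged illustration rather than a conforming parser: the names is_valid_boundary and split_multipart are hypothetical, the input is assumed to be well formed and to use CRLF line endings, and the preamble and epilogue are simply discarded.

```python
import re
from typing import List

# bcharsnospace, transcribed from the grammar; bchars adds the space.
_BCHARS_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"
_BOUNDARY_RE = re.compile(
    "[" + _BCHARS_NOSPACE + " ]{0,69}[" + _BCHARS_NOSPACE + "]"
)


def is_valid_boundary(boundary: str) -> bool:
    """True if the string matches boundary := 0*69<bchars> bcharsnospace."""
    return _BOUNDARY_RE.fullmatch(boundary) is not None


def split_multipart(body: str, boundary: str) -> List[str]:
    """Return the body parts between delimiters, up to the close-delimiter."""
    if not is_valid_boundary(boundary):
        raise ValueError("boundary does not match the grammar")
    # Delimiter line: CRLF (or start of body), dash-boundary, optional
    # closing "--", then transport padding up to the end of the line.
    delimiter_re = re.compile(
        r"(?:^|\r\n)" + re.escape("--" + boundary) + r"(--)?[ \t]*(?=\r\n|$)"
    )
    parts: List[str] = []
    start = None
    for match in delimiter_re.finditer(body):
        if start is not None:
            # The CRLF before the dash-boundary belongs to the delimiter.
            parts.append(body[start:match.start()])
        if match.group(1):  # close-delimiter: the rest is epilogue
            break
        start = match.end() + 2  # skip the CRLF that ends the delimiter line
    return parts
```

Note that the sketch tolerates transport padding after a delimiter, as receivers are required to, but never generates any.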
