Network Working Group                                          N. Freed
Request for Comments: 2046                                     Innosoft
Obsoletes: 1521, 1522, 1590                               N. Borenstein
Category: Standards Track                                 First Virtual

                 Multipurpose Internet Mail Extensions

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.
 
   STD 11, RFC 822 defines a message representation protocol that
   specifies considerable detail about US-ASCII message headers but
   leaves the message content, or message body, as flat US-ASCII text.
   This set of documents, collectively called the Multipurpose Internet
   Mail Extensions, or MIME, redefines the format of messages to allow
   for

    (1)   textual message bodies in character sets other than
          US-ASCII,

    (2)   an extensible set of different formats for non-textual
          message bodies,

    (3)   multi-part message bodies, and

    (4)   textual header information in character sets other than
          US-ASCII.

   These documents are based on earlier work documented in RFC 934, STD
   11, and RFC 1049, but extend and revise that work.  Because RFC 822
   said so little about message bodies, these documents are largely
   orthogonal to (rather than a revision of) RFC 822.
 
   The initial document in this set, RFC 2045, specifies the various
   headers used to describe the structure of MIME messages. This second
   document defines the general structure of the MIME media typing
   system and defines an initial set of media types. The third document,
   RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text

Freed & Borenstein          Standards Track                     [Page 1]

RFC 2046                      Media Types                  November 1996

   data in Internet mail header fields. The fourth document, RFC 2048,
   specifies various IANA registration procedures for MIME-related
   facilities.  The fifth and final document, RFC 2049, describes MIME
   conformance criteria and provides some illustrative examples of MIME
   message formats, acknowledgements, and the bibliography.
 
   These documents are revisions of RFCs 1521 and 1522, which themselves
   were revisions of RFCs 1341 and 1342.  An appendix in RFC 2049
   describes differences and changes from previous versions.
 
   1. Introduction .........................................    3
   2. Definition of a Top-Level Media Type .................    4
   3. Overview Of The Initial Top-Level Media Types ........    4
   4. Discrete Media Type Values ...........................    6
   4.1 Text Media Type .....................................    6
   4.1.1 Representation of Line Breaks .....................    7
   4.1.2 Charset Parameter .................................    7
   4.1.3 Plain Subtype .....................................   11
   4.1.4 Unrecognized Subtypes .............................   11
   4.2 Image Media Type ....................................   11
   4.3 Audio Media Type ....................................   11
   4.4 Video Media Type ....................................   12
   4.5 Application Media Type ..............................   12
   4.5.1 Octet-Stream Subtype ..............................   13
   4.5.2 PostScript Subtype ................................   14
   4.5.3 Other Application Subtypes ........................   17
   5. Composite Media Type Values ..........................   17
   5.1 Multipart Media Type ................................   17
   5.1.1 Common Syntax .....................................   19
   5.1.2 Handling Nested Messages and Multiparts ...........   24
   5.1.3 Mixed Subtype .....................................   24
   5.1.4 Alternative Subtype ...............................   24
   5.1.5 Digest Subtype ....................................   26
   5.1.6 Parallel Subtype ..................................   27
   5.1.7 Other Multipart Subtypes ..........................   28
   5.2 Message Media Type ..................................   28
   5.2.1 RFC822 Subtype ....................................   28
   5.2.2 Partial Subtype ...................................   29
   5.2.2.1 Message Fragmentation and Reassembly ............   30
   5.2.2.2 Fragmentation and Reassembly Example ............   31
   5.2.3 External-Body Subtype .............................   33
   5.2.4 Other Message Subtypes ............................   40
   6. Experimental Media Type Values .......................   40
   7. Summary ..............................................   41
   8. Security Considerations ..............................   41
   9. Authors' Addresses ...................................   42
   A. Collected Grammar ....................................   43
 
1.  Introduction

   The first document in this set, RFC 2045, defines a number of header
   fields, including Content-Type. The Content-Type field is used to
   specify the nature of the data in the body of a MIME entity, by
   giving media type and subtype identifiers, and by providing auxiliary
   information that may be required for certain media types.  After the
   type and subtype names, the remainder of the header field is simply a
   set of parameters, specified in an attribute/value notation.  The
   ordering of parameters is not significant.
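
   As an illustration only, the following Python sketch splits a
   Content-Type value into its type, subtype, and parameters using the
   standard library's email.message.Message class.  The helper name
   describe_content_type is hypothetical and not part of MIME; storing
   the parameters in a dictionary reflects the rule that their ordering
   is not significant.

```python
from email.message import Message


def describe_content_type(header_value):
    """Return (type, subtype, params) for a Content-Type header value.

    Hypothetical helper for illustration; it relies on the standard
    library parser rather than implementing the RFC 2045 grammar.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    maintype = msg.get_content_maintype()
    subtype = msg.get_content_subtype()
    # get_params() returns a list of (name, value) pairs whose first
    # entry is the type/subtype itself, so that entry is skipped.
    params = {name.lower(): value for name, value in msg.get_params()[1:]}
    return maintype, subtype, params


if __name__ == "__main__":
    print(describe_content_type("text/plain; charset=iso-8859-1"))
```

   Because the parameters end up in a dictionary, two header values
   that differ only in parameter order compare equal after parsing.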
 
   In general, the top-level media type is used to declare the general
   type of data, while the subtype specifies a specific format for that
   type of data.  Thus, a media type of "image/xyz" is enough to tell a
   user agent that the data is an image, even if the user agent has no
   knowledge of the specific image format "xyz".  Such information can
   be used, for example, to decide whether or not to show a user the raw
   data from an unrecognized subtype -- such an action might be
   reasonable for unrecognized subtypes of "text", but not for
   unrecognized subtypes of "image" or "audio".  For this reason,
   registered subtypes of "text", "image", "audio", and "video" should
   not contain embedded information that is really of a different type.
   Such compound formats should be represented using the "multipart" or
   "application" types.
 
   Parameters are modifiers of the media subtype, and as such do not
   fundamentally affect the nature of the content.  The set of
   meaningful parameters depends on the media type and subtype.  Most
   parameters are associated with a single specific subtype.  However, a
   given top-level media type may define parameters which are applicable
   to any subtype of that type.  Parameters may be required by their
   defining media type or subtype or they may be optional.  MIME
   implementations must also ignore any parameters whose names they do
   not recognize.
 
   MIME's Content-Type header field and media type mechanism have been
   carefully designed to be extensible, and it is expected that the set
   of media type/subtype pairs and their associated parameters will grow
   significantly over time.  Several other MIME facilities, such as
   transfer encodings and "message/external-body" access types, are
   likely to have new values defined over time.  In order to ensure that
   the set of such values is developed in an orderly, well-specified,
   and public manner, MIME sets up a registration process which uses the
   Internet Assigned Numbers Authority (IANA) as a central registry for
   MIME's various areas of extensibility.  The registration process for
   these areas is described in a companion document, RFC 2048.
 
 
   The initial seven standard top-level media types are defined and
   described in the remainder of this document.
 
2.  Definition of a Top-Level Media Type

   The definition of a top-level media type consists of:

    (1)   a name and a description of the type, including
          criteria for whether a particular type would qualify
          under that type,

    (2)   the names and definitions of parameters, if any, which
          are defined for all subtypes of that type (including
          whether such parameters are required or optional),

    (3)   how a user agent and/or gateway should handle unknown
          subtypes of this type,

    (4)   general considerations on gatewaying entities of this
          top-level type, if any, and

    (5)   any restrictions on content-transfer-encodings for
          entities of this top-level type.
 
3.  Overview Of The Initial Top-Level Media Types

   The five discrete top-level media types are:

    (1)   text -- textual information.  The subtype "plain" in
          particular indicates plain text containing no
          formatting commands or directives of any sort. Plain
          text is intended to be displayed "as-is". No special
          software is required to get the full meaning of the
          text, aside from support for the indicated character
          set. Other subtypes are to be used for enriched text in
          forms where application software may enhance the
          appearance of the text, but such software must not be
          required in order to get the general idea of the
          content.  Possible subtypes of "text" thus include any
          word processor format that can be read without
          resorting to software that understands the format.  In
          particular, formats that employ embedded binary
          formatting information are not considered directly
          readable. A very simple and portable subtype,
          "richtext", was defined in RFC 1341, with a further
          revision in RFC 1896 under the name "enriched".
 
 
    (2)   image -- image data.  "Image" requires a display device
          (such as a graphical display, a graphics printer, or a
          FAX machine) to view the information. An initial
          subtype is defined for the widely-used image format
          JPEG.

    (3)   audio -- audio data.  "Audio" requires an audio output
          device (such as a speaker or a telephone) to "display"
          the contents.  An initial subtype "basic" is defined in
          this document.

    (4)   video -- video data.  "Video" requires the capability
          to display moving images, typically including
          specialized hardware and software.  An initial subtype
          "mpeg" is defined in this document.

    (5)   application -- some other kind of data, typically
          either uninterpreted binary data or information to be
          processed by an application.  The subtype
          "octet-stream" is to be used in the case of
          uninterpreted binary data, in which case the simplest
          recommended action is to offer to write the information
          into a file for the user.  The "PostScript" subtype is
          also defined for the transport of PostScript material.
          Other expected uses for "application" include
          spreadsheets, data for mail-based scheduling systems,
          languages for "active" (computational) messaging, and
          word processing formats that are not directly readable.
          Note that security considerations may exist for some
          types of application data, most notably
          "application/PostScript" and any form of active
          messaging.  These issues are discussed later in this
          document.
 
   The two composite top-level media types are:

    (1)   multipart -- data consisting of multiple entities of
          independent data types.  Four subtypes are initially
          defined, including the basic "mixed" subtype specifying
          a generic mixed set of parts, "alternative" for
          representing the same data in multiple formats,
          "parallel" for parts intended to be viewed
          simultaneously, and "digest" for multipart entities in
          which each part has a default type of "message/rfc822".
 
 
    (2)   message -- an encapsulated message.  A body of media
          type "message" is itself all or a portion of some kind
          of message object.  Such objects may or may not in turn
          contain other entities.  The "rfc822" subtype is used
          when the encapsulated content is itself an RFC 822
          message.  The "partial" subtype is defined for partial
          RFC 822 messages, to permit the fragmented transmission
          of bodies that are thought to be too large to be passed
          through transport facilities in one piece.  Another
          subtype, "external-body", is defined for specifying
          large bodies by reference to an external data source.

   It should be noted that the list of media type values given here may
   be augmented in time, via the mechanisms described above, and that
   the set of subtypes is expected to grow substantially.
 
4.  Discrete Media Type Values

   Five of the seven initial media type values refer to discrete bodies.
   The content of these types must be handled by non-MIME mechanisms;
   they are opaque to MIME processors.
 
4.1.  Text Media Type

   The "text" media type is intended for sending material which is
   principally textual in form.  A "charset" parameter may be used to
   indicate the character set of the body text for "text" subtypes,
   notably including the subtype "text/plain", which is a generic
   subtype for plain text.  Plain text does not provide for or allow
   formatting commands, font attribute specifications, processing
   instructions, interpretation directives, or content markup.  Plain
   text is seen simply as a linear sequence of characters, possibly
   interrupted by line breaks or page breaks.  Plain text may allow the
   stacking of several characters in the same position in the text.
   Plain text in scripts like Arabic and Hebrew may also include
   facilities that allow the arbitrary mixing of text segments with
   opposite writing directions.
 
   Beyond plain text, there are many formats for representing what might
   be known as "rich text".  An interesting characteristic of many such
   representations is that they are to some extent readable even without
   the software that interprets them.  It is useful, then, to
   distinguish them, at the highest level, from such unreadable data as
   images, audio, or text represented in an unreadable form. In the
   absence of appropriate interpretation software, it is reasonable to
   show subtypes of "text" to the user, while it is not reasonable to do
   so with most nontextual data. Such formatted textual data should be
   represented using subtypes of "text".
 
 
4.1.1.  Representation of Line Breaks

   The canonical form of any MIME "text" subtype MUST always represent a
   line break as a CRLF sequence.  Similarly, any occurrence of CRLF in
   MIME "text" MUST represent a line break.  Use of CR and LF outside of
   line break sequences is also forbidden.

   This rule applies regardless of format or character set or sets
   involved.
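
   The canonical line break rule can be sketched in Python as follows.
   This is a minimal illustration, not a normative procedure: the
   function names are hypothetical, and the conversion assumes that a
   bare CR or bare LF in the local form was meant as a line break.

```python
import re

# Matches a CR not followed by LF, or an LF not preceded by CR.
_BARE_CR_OR_LF = re.compile(r"\r(?!\n)|(?<!\r)\n")


def to_canonical_text(text):
    """Rewrite every local line break (CRLF, CR, or LF) as CRLF."""
    # Collapse all three conventions to LF first, then expand to CRLF,
    # so that existing CRLF pairs are not doubled.
    return text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")


def is_canonical_text(text):
    """Return True if CR and LF occur only as CRLF pairs."""
    return _BARE_CR_OR_LF.search(text) is None


if __name__ == "__main__":
    local = "first line\nsecond line\n"
    print(is_canonical_text(local))                     # bare LF present
    print(is_canonical_text(to_canonical_text(local)))  # only CRLF pairs
```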
 
   NOTE: The proper interpretation of line breaks when a body is
   displayed depends on the media type. In particular, while it is
   appropriate to treat a line break as a transition to a new line when
   displaying a "text/plain" body, this treatment is actually incorrect
   for other subtypes of "text" like "text/enriched" [RFC-1896].
   Similarly, whether or not line breaks should be added during display
   operations is also a function of the media type. It should not be
   necessary to add any line breaks to display "text/plain" correctly,
   whereas proper display of "text/enriched" requires the appropriate
   addition of line breaks.
 
   NOTE: Some protocols define a maximum line length.  For example,
   SMTP [RFC-821] allows a maximum of 998 octets before the next CRLF
   sequence.  To be transported by such protocols, data which includes
   segments longer than this limit without CRLF sequences must be
   encoded with a suitable content-transfer-encoding.
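
   A sending agent could test for this condition along the following
   lines.  The sketch is illustrative only: the function name and the
   constant are hypothetical, the body is assumed to already be in
   canonical CRLF form, and the check covers line length alone, not
   other reasons a transfer encoding might be needed.

```python
MAX_LINE_OCTETS = 998  # limit quoted above for SMTP, excluding the CRLF


def has_overlong_line(body, limit=MAX_LINE_OCTETS):
    """Return True if any run of octets between CRLFs exceeds the limit.

    `body` is a bytes object holding canonical (CRLF-delimited) data.
    """
    return any(len(segment) > limit for segment in body.split(b"\r\n"))


if __name__ == "__main__":
    short = b"hello\r\nworld\r\n"
    long_line = b"x" * 2000 + b"\r\n"
    print(has_overlong_line(short))
    print(has_overlong_line(long_line))
```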
 
4.1.2.  Charset Parameter

   A critical parameter that may be specified in the Content-Type field
   for "text/plain" data is the character set.  This is specified with a
   "charset" parameter, as in:

     Content-type: text/plain; charset=iso-8859-1

   Unlike some other parameter values, the values of the charset
   parameter are NOT case sensitive.  The default character set, which
   must be assumed in the absence of a charset parameter, is US-ASCII.
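
   The two rules in the preceding paragraph (case-insensitive values
   and a US-ASCII default) can be sketched in Python as shown below.
   The helper name effective_charset is hypothetical, and the sketch
   assumes a plain, unencoded charset value in the header.

```python
from email.message import Message


def effective_charset(content_type):
    """Return the lowercased charset for a Content-Type header value.

    Falls back to "us-ascii" when no charset parameter is present.
    Hypothetical helper; assumes the parameter value is a plain string.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset", "us-ascii")
    # Charset values are not case sensitive, so normalize for comparison.
    return charset.lower()


if __name__ == "__main__":
    print(effective_charset("text/plain; charset=ISO-8859-1"))
    print(effective_charset("text/plain"))
```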
 
   The specification for any future subtypes of "text" must specify
   whether or not they will also utilize a "charset" parameter, and may
   possibly restrict its values as well.  For other subtypes of "text"
   than "text/plain", the semantics of the "charset" parameter should be
   defined to be identical to those specified here for "text/plain",
   i.e., the body consists entirely of characters in the given charset.
   In particular, definers of future "text" subtypes should pay close
   attention to the implications of multioctet character sets for their
   subtype definitions.
 
 
   The charset parameter for subtypes of "text" gives a name of a
   character set, as "character set" is defined in RFC 2045.  The rules
   regarding line breaks detailed in the previous section must also be
   observed -- a character set whose definition does not conform to
   these rules cannot be used in a MIME "text" subtype.

   An initial list of predefined character set names can be found at the
   end of this section.  Additional character sets may be registered
   with IANA.

   Other media types than subtypes of "text" might choose to employ the
   charset parameter as defined here, but with the CRLF/line break
   restriction removed.  Therefore, all character sets that conform to
   the general definition of "character set" in RFC 2045 can be
   registered for MIME use.

   Note that if the specified character set includes 8-bit characters
   and such characters are used in the body, a Content-Transfer-Encoding
   header field and a corresponding encoding on the data are required in
   order to transmit the body via some mail transfer protocols, such as
   SMTP [RFC-821].
 
   The default character set, US-ASCII, has been the subject of some
   confusion and ambiguity in the past.  Not only were there some
   ambiguities in the definition, there have been wide variations in
   practice.  In order to eliminate such ambiguity and variations in the
   future, it is strongly recommended that new user agents explicitly
   specify a character set as a media type parameter in the Content-Type
   header field. "US-ASCII" does not indicate an arbitrary 7-bit
   character set, but specifies that all octets in the body must be
   interpreted as characters according to the US-ASCII character set.
   National and application-oriented versions of ISO 646 [ISO-646] are
   usually NOT identical to US-ASCII, and in that case their use in
   Internet mail is explicitly discouraged.  The omission of the ISO 646
   character set from this document is deliberate in this regard.  The
   character set name of "US-ASCII" explicitly refers to the character
   set defined in ANSI X3.4-1986 [US-ASCII].  The new international
   reference version (IRV) of the 1991 edition of ISO 646 is identical
   to US-ASCII.  The character set name "ASCII" is reserved and must not
   be used for any purpose.
 
   NOTE: RFC 821 explicitly specifies "ASCII", and references an earlier
   version of the American Standard.  Insofar as one of the purposes of
   specifying a media type and character set is to permit the receiver
   to unambiguously determine how the sender intended the coded message
   to be interpreted, assuming anything other than "strict ASCII" as the
   default would risk unintentional and incompatible changes to the
   semantics of messages now being transmitted.  This also implies that
   messages containing characters coded according to other versions of
   ISO 646 than US-ASCII and the 1991 IRV, or using code-switching
   procedures (e.g., those of ISO 2022), as well as 8bit or multiple
   octet character encodings MUST use an appropriate character set
   specification to be consistent with MIME.
 
   The complete US-ASCII character set is listed in ANSI X3.4-1986.
   Note that the control characters including DEL (0-31, 127) have no
   defined meaning apart from the combination CRLF (US-ASCII values
   13 and 10) indicating a new line.  Two of the characters have de
   facto meanings in wide use: FF (12) often means "start subsequent
   text on the beginning of a new page"; and TAB or HT (9) often (though
   not always) means "move the cursor to the next available column after
   the current position where the column number is a multiple of 8
   (counting the first column as column 0)."  Aside from these
   conventions, any use of the control characters or DEL in a body must
   either occur

    (1)   because a subtype of text other than "plain"
          specifically assigns some additional meaning, or

    (2)   within the context of a private agreement between the
          sender and recipient. Such private agreements are
          discouraged and should be replaced by the other
          capabilities of this document.
 
   NOTE: An enormous proliferation of character sets exists beyond
   US-ASCII.  A large number of partially or totally overlapping
   character sets is NOT a good thing.  A SINGLE character set that can
   be used universally for representing all of the world's languages in
   Internet mail would be preferable.  Unfortunately, existing practice
   in several communities seems to point to the continued use of
   multiple character sets in the near future.  A small number of
   standard character sets are, therefore, defined for Internet use in
   this document.
 
   The defined charset values are:

    (1)   US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII].

    (2)   ISO-8859-X -- where "X" is to be replaced, as
          necessary, for the parts of ISO-8859 [ISO-8859].  Note
          that the ISO 646 character sets have deliberately been
          omitted in favor of their 8859 replacements, which are
          the designated character sets for Internet mail.  As of
          the publication of this document, the legitimate values
          for "X" are the digits 1 through 10.
 
 
   Characters in the range 128-159 have no assigned meaning in
   ISO-8859-X.  Characters with values below 128 in ISO-8859-X have the
   same assigned meaning as they do in US-ASCII.

   Part 6 of ISO 8859 (Latin/Arabic alphabet) and part 8 (Latin/Hebrew
   alphabet) include both characters for which the normal writing
   direction is right to left and characters for which it is left to
   right, but do not define a canonical ordering method for representing
   bi-directional text.  The charset values "ISO-8859-6" and
   "ISO-8859-8", however, specify that the visual method is used
   [RFC-1556].
 
   All of these character sets are used as pure 7bit or 8bit sets
   without any shift or escape functions.  The meaning of shift and
   escape sequences in these character sets is not defined.

   The character sets specified above are the ones that were relatively
   uncontroversial during the drafting of MIME.  This document does not
   endorse the use of any particular character set other than US-ASCII,
   and recognizes that the future evolution of world character sets
   remains unclear.

   Note that the character set used, if anything other than US-ASCII,
   must always be explicitly specified in the Content-Type field.
 
   No character set name other than those defined above may be used in
   Internet mail without the publication of a formal specification and
   its registration with IANA, or by private agreement, in which case
   the character set name must begin with "X-".

   Implementors are discouraged from defining new character sets unless
   absolutely necessary.

   The "charset" parameter has been defined primarily for the purpose of
   textual data, and is described in this section for that reason.
   However, it is conceivable that non-textual data might also wish to
   specify a charset value for some purpose, in which case the same
   syntax and values should be used.
 
   In general, composition software should always use the "lowest common
   denominator" character set possible.  For example, if a body contains
   only US-ASCII characters, it SHOULD be marked as being in the
   US-ASCII character set, not ISO-8859-1, which, like all the ISO-8859
   family of character sets, is a superset of US-ASCII.  More generally,
   if a widely-used character set is a subset of another character set,
   and a body contains only characters in the widely-used subset, it
   should be labelled as being in that subset.  This will increase the
   chances that the recipient will be able to view the resulting entity
   correctly.
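
   One way composition software might apply the "lowest common
   denominator" rule is sketched below in Python.  The function name
   and the default candidate list are assumptions made for the
   example; a real agent would order its candidates from the most
   widely supported character set to the least.

```python
def pick_charset(text, candidates=("us-ascii", "iso-8859-1")):
    """Return the first candidate charset able to represent `text`.

    Candidates are tried in order, so listing subsets before their
    supersets yields the "lowest common denominator" label.  Returns
    None if no candidate can encode the text.
    """
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this charset lacks some character; try the next
        return name
    return None


if __name__ == "__main__":
    print(pick_charset("plain old text"))
    print(pick_charset("caf\u00e9"))
```

   Trying US-ASCII first means a body that happens to contain only
   US-ASCII characters is never labelled with a larger character set.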
 
 
4.1.3.  Plain Subtype

   The simplest and most important subtype of "text" is "plain".  This
   indicates plain text that does not contain any formatting commands or
   directives. Plain text is intended to be displayed "as-is", that is,
   no interpretation of embedded formatting commands, font attribute
   specifications, processing instructions, interpretation directives,
   or content markup should be necessary for proper display.  The
   default media type of "text/plain; charset=us-ascii" for Internet
   mail describes existing Internet practice.  That is, it is the type
   of body defined by RFC 822.

   No other "text" subtype is defined by this document.
 
4.1.4.  Unrecognized Subtypes

   Unrecognized subtypes of "text" should be treated as subtype "plain"
   as long as the MIME implementation knows how to handle the charset.
   Unrecognized subtypes which also specify an unrecognized charset
   should be treated as "application/octet-stream".
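
   The fallback rule for unrecognized "text" subtypes might be
   sketched as follows.  The function name is hypothetical, and using
   Python's codecs.lookup() as the test for "knows how to handle the
   charset" is an assumption of this example, not something MIME
   prescribes.

```python
import codecs


def fallback_for_unrecognized_text(charset="us-ascii"):
    """Choose how to treat a "text" entity whose subtype is unknown.

    Returns "text/plain" when the charset can be decoded locally, and
    "application/octet-stream" otherwise.
    """
    try:
        codecs.lookup(charset)  # raises LookupError for unknown names
    except LookupError:
        return "application/octet-stream"
    return "text/plain"


if __name__ == "__main__":
    print(fallback_for_unrecognized_text("iso-8859-1"))
    print(fallback_for_unrecognized_text("x-no-such-charset"))
```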
 
4.2.  Image Media Type

   A media type of "image" indicates that the body contains an image.
   The subtype names the specific image format.  These names are not
   case sensitive. An initial subtype is "jpeg" for the JPEG format
   using JFIF encoding [JPEG].

   The list of "image" subtypes given here is neither exclusive nor
   exhaustive, and is expected to grow as more types are registered with
   IANA, as described in RFC 2048.

   Unrecognized subtypes of "image" should at a minimum be treated as
   "application/octet-stream".  Implementations may optionally elect to
   pass subtypes of "image" that they do not specifically recognize to a
   secure and robust general-purpose image viewing application, if such
   an application is available.

   NOTE: Using a general-purpose image viewing application this way
   inherits the security problems of the most dangerous type supported
   by the application.
 
4.3.  Audio Media Type

   A media type of "audio" indicates that the body contains audio data.
   Although there is not yet a consensus on an "ideal" audio format for
   use with computers, there is a pressing need for a format capable of
   providing interoperable behavior.

   The initial subtype of "basic" is specified to meet this requirement
   by providing an absolutely minimal lowest common denominator audio
   format.  It is expected that richer formats for higher quality and/or
   lower bandwidth audio will be defined by a later document.

   The content of the "audio/basic" subtype is single channel audio
   encoded using 8bit ISDN mu-law [PCM] at a sample rate of 8000 Hz.
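
   Since "audio/basic" carries one 8bit sample per octet on a single
   channel at 8000 samples per second, its data rate is 8000 octets
   per second.  The short Python sketch below works through that
   arithmetic; the function and constant names are hypothetical.

```python
AUDIO_BASIC_SAMPLE_RATE_HZ = 8000   # samples per second
AUDIO_BASIC_OCTETS_PER_SAMPLE = 1   # 8bit mu-law, single channel


def audio_basic_duration_seconds(n_octets):
    """Return the playing time of `n_octets` of decoded audio/basic data."""
    octets_per_second = AUDIO_BASIC_SAMPLE_RATE_HZ * AUDIO_BASIC_OCTETS_PER_SAMPLE
    return n_octets / octets_per_second


if __name__ == "__main__":
    # One minute of audio/basic occupies 60 * 8000 = 480000 octets.
    print(audio_basic_duration_seconds(480000))
```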
 
   Unrecognized subtypes of "audio" should at a minimum be treated as
   "application/octet-stream".  Implementations may optionally elect to
   pass subtypes of "audio" that they do not specifically recognize to a
   robust general-purpose audio playing application, if such an
   application is available.
 
4.4.  Video Media Type

   A media type of "video" indicates that the body contains a
   time-varying-picture image, possibly with color and coordinated
   sound.  The term 'video' is used in its most generic sense, rather
   than with reference to any particular technology or format, and is
   not meant to preclude subtypes such as animated drawings encoded
   compactly.  The subtype "mpeg" refers to video coded according to
   the MPEG standard [MPEG].

   Note that although in general this document strongly discourages the
   mixing of multiple media in a single body, it is recognized that many
   so-called video formats include a representation for synchronized
   audio, and this is explicitly permitted for subtypes of "video".

   Unrecognized subtypes of "video" should at a minimum be treated as
   "application/octet-stream".  Implementations may optionally elect to
   pass subtypes of "video" that they do not specifically recognize to a
   robust general-purpose video display application, if such an
   application is available.
 
6584.5.  Application Media Type
 
660   The "application" media type is to be used for discrete data which do
 
661   not fit in any of the other categories, and particularly for data to
 
662   be processed by some type of application program.  This is
 
663   information which must be processed by an application before it is
 
664   viewable or usable by a user.  Expected uses for the "application"
 
665   media type include file transfer, spreadsheets, data for mail-based
 
666   scheduling systems, and languages for "active" (computational)
 
667   material.  (The latter, in particular, can pose security problems
 
668   which must be understood by implementors, and are considered in
 
669   detail in the discussion of the "application/PostScript" media type.)
 
679   For example, a meeting scheduler might define a standard
 
680   representation for information about proposed meeting dates.  An
 
681   intelligent user agent would use this information to conduct a dialog
 
682   with the user, and might then send additional material based on that
 
683   dialog.  More generally, there have been several "active" messaging
 
684   languages developed in which programs in a suitably specialized
 
685   language are transported to a remote location and automatically run
 
686   in the recipient's environment.
 
688   Such applications may be defined as subtypes of the "application"
 
689   media type. This document defines two subtypes:
 
691   octet-stream, and PostScript.
 
693   The subtype of "application" will often be either the name or include
 
694   part of the name of the application for which the data are intended.
 
695   This does not mean, however, that any application program name may be
 
696   used freely as a subtype of "application".
 
6984.5.1.  Octet-Stream Subtype
 
700   The "octet-stream" subtype is used to indicate that a body contains
 
701   arbitrary binary data.  The set of currently defined parameters is:
 
703    (1)   TYPE -- the general type or category of binary data.
 
704          This is intended as information for the human recipient
 
705          rather than for any automatic processing.
 
707    (2)   PADDING -- the number of bits of padding that were
 
708          appended to the bit-stream comprising the actual
 
709          contents to produce the enclosed 8bit byte-oriented
 
710          data.  This is useful for enclosing a bit-stream in a
 
711          body when the total number of bits is not a multiple of 8.
 
714   Both of these parameters are optional.
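
   The arithmetic behind the PADDING parameter can be sketched as
   follows.  This is a minimal illustration, not a normative encoding:
   it assumes that bits are packed most-significant-bit first and that
   the padding bits are zero, neither of which is stated above, and the
   function names are hypothetical.

```python
def pack_bits(bits: str) -> tuple[bytes, int]:
    """Pack a string of '0'/'1' characters into octets.

    Returns the octets and the number of bits appended, which is the
    value a sender would place in the PADDING parameter.
    """
    padding = (-len(bits)) % 8            # bits needed to reach a multiple of 8
    padded = bits + "0" * padding         # assumption: pad with zero bits
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, padding


def unpack_bits(data: bytes, padding: int = 0) -> str:
    """Recover the original bit-stream by dropping the trailing padding."""
    bits = "".join(f"{octet:08b}" for octet in data)
    return bits[:len(bits) - padding] if padding else bits
```

   A five-bit stream, for example, needs three bits of padding to fill
   one octet, so the sender would label the body with "padding=3".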
 
716   An additional parameter, "CONVERSIONS", was defined in RFC 1341 but
 
717   has since been removed.  RFC 1341 also defined the use of a "NAME"
 
718   parameter which gave a suggested file name to be used if the data
 
719   were to be written to a file.  This has been deprecated in
 
720   anticipation of a separate Content-Disposition header field, to be
 
721   defined in a subsequent RFC.
 
723   The recommended action for an implementation that receives an
 
724   "application/octet-stream" entity is to simply offer to put the data
 
725   in a file, with any Content-Transfer-Encoding undone, or perhaps to
 
726   use it as input to a user-specified process.
 
735   To reduce the danger of transmitting rogue programs, it is strongly
 
736   recommended that implementations NOT implement a path-search
 
737   mechanism whereby an arbitrary program named in the Content-Type
 
738   parameter (e.g., an "interpreter=" parameter) is found and executed
 
739   using the message body as input.
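
   A receiving agent might implement the recommended action along the
   following lines.  This is only a sketch under stated assumptions: the
   function names are hypothetical, the destination path is presumed to
   have been chosen by the user, and only the encodings named in the
   code are handled.  Note that the data are written out and never
   executed.

```python
import base64
import quopri
from pathlib import Path


def undo_transfer_encoding(body: bytes, encoding: str = "7bit") -> bytes:
    """Reverse the Content-Transfer-Encoding applied to a body."""
    name = encoding.strip().lower()       # encoding names are case-insensitive
    if name == "base64":
        return base64.b64decode(body)
    if name == "quoted-printable":
        return quopri.decodestring(body)
    if name in ("7bit", "8bit", "binary"):
        return body                        # identity encodings
    raise ValueError(f"unsupported Content-Transfer-Encoding: {encoding!r}")


def save_octet_stream(body: bytes, encoding: str, destination: Path) -> int:
    """Write the decoded data to a user-chosen file and return its size."""
    data = undo_transfer_encoding(body, encoding)
    # Mode "xb" refuses to overwrite an existing file.
    with open(destination, "xb") as handle:
        handle.write(data)
    return len(data)
```

   Refusing to overwrite an existing file is a design choice of this
   sketch rather than a requirement of this document.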
 
7414.5.2.  PostScript Subtype
 
743   A media type of "application/postscript" indicates a PostScript
 
744   program.  Currently two variants of the PostScript language are
 
745   allowed; the original level 1 variant is described in [POSTSCRIPT]
 
746   and the more recent level 2 variant is described in [POSTSCRIPT2].
 
748   PostScript is a registered trademark of Adobe Systems, Inc.  Use of
 
749   the MIME media type "application/postscript" implies recognition of
 
750   that trademark and all the rights it entails.
 
752   The PostScript language definition provides facilities for internal
 
753   labelling of the specific language features a given program uses.
 
754   This labelling, called the PostScript document structuring
 
755   conventions, or DSC, is very general and provides substantially more
 
756   information than just the language level.  The use of document
 
757   structuring conventions, while not required, is strongly recommended
 
758   as an aid to interoperability.  Documents which lack proper
 
759   structuring conventions cannot be tested to see whether or not they
 
760   will work in a given environment.  As such, some systems may assume
 
761   the worst and refuse to process unstructured documents.
 
763   The execution of general-purpose PostScript interpreters entails
 
764   serious security risks, and implementors are discouraged from simply
 
765   sending PostScript bodies to "off-the-shelf" interpreters.  While it
 
766   is usually safe to send PostScript to a printer, where the potential
 
767   for harm is greatly constrained by typical printer environments,
 
768   implementors should consider all of the following before they add
 
769   interactive display of PostScript bodies to their MIME readers.
 
771   The remainder of this section outlines some, though probably not all,
 
772   of the possible problems with the transport of PostScript entities.
 
774    (1)   Dangerous operations in the PostScript language
 
775          include, but may not be limited to, the PostScript
 
776          operators "deletefile", "renamefile", "filenameforall",
 
777          and "file".  "File" is only dangerous when applied to
 
778          something other than standard input or output.
 
779          Implementations may also define additional nonstandard
 
780          file operators; these may also pose a threat to
 
781          security. "Filenameforall", the wildcard file search
 
782          operator, may appear at first glance to be harmless.
 
791          Note, however, that this operator has the potential to
 
792          reveal information about what files the recipient has
 
793          access to, and this information may itself be
 
794          sensitive.  Message senders should avoid the use of
 
795          potentially dangerous file operators, since these
 
796          operators are quite likely to be unavailable in secure
 
797          PostScript implementations.  Message receiving and
 
798          displaying software should either completely disable
 
799          all potentially dangerous file operators or take
 
800          special care not to delegate any special authority to
 
801          their operation.  These operators should be viewed as
 
802          being done by an outside agency when interpreting
 
803          PostScript documents.  Such disabling and/or checking
 
804          should be done completely outside of the reach of the
 
805          PostScript language itself; care should be taken to
 
806          ensure that no method exists for re-enabling full-
 
807          function versions of these operators.
 
809    (2)   The PostScript language provides facilities for exiting
 
810          the normal interpreter, or server, loop.  Changes made
 
811          in this "outer" environment are customarily retained
 
812          across documents, and may in some cases be retained
 
813          semipermanently in nonvolatile memory.  The operators
 
814          associated with exiting the interpreter loop have the
 
815          potential to interfere with subsequent document
 
816          processing.  As such, their unrestrained use
 
817          constitutes a threat of service denial.  PostScript
 
818          operators that exit the interpreter loop include, but
 
819          may not be limited to, the exitserver and startjob
 
820          operators.  Message sending software should not
 
821          generate PostScript that depends on exiting the
 
822          interpreter loop to operate, since the ability to exit
 
823          will probably be unavailable in secure PostScript
 
824          implementations.  Message receiving and displaying
 
825          software should completely disable the ability to make
 
826          retained changes to the PostScript environment by
 
827          eliminating or disabling the "startjob" and
 
828          "exitserver" operations.  If these operations cannot be
 
829          eliminated or completely disabled the password
 
830          associated with them should at least be set to a hard-to-guess value.
 
833    (3)   PostScript provides operators for setting system-wide
 
834          and device-specific parameters.  These parameter
 
835          settings may be retained across jobs and may
 
836          potentially pose a threat to the correct operation of
 
837          the interpreter.  The PostScript operators that set
 
838          system and device parameters include, but may not be
 
847          limited to, the "setsystemparams" and "setdevparams"
 
848          operators.  Message sending software should not
 
849          generate PostScript that depends on the setting of
 
850          system or device parameters to operate correctly.  The
 
851          ability to set these parameters will probably be
 
852          unavailable in secure PostScript implementations.
 
853          Message receiving and displaying software should
 
854          disable the ability to change system and device
 
855          parameters.  If these operators cannot be completely
 
856          disabled the password associated with them should at
 
857          least be set to a hard-to-guess value.
 
859    (4)   Some PostScript implementations provide nonstandard
 
860          facilities for the direct loading and execution of
 
861          machine code.  Such facilities are quite obviously open
 
862          to substantial abuse.  Message sending software should
 
863          not make use of such features.  Besides being totally
 
864          hardware-specific, they are also likely to be
 
865          unavailable in secure implementations of PostScript.
 
866          Message receiving and displaying software should not
 
867          allow such operators to be used if they exist.
 
869    (5)   PostScript is an extensible language, and many, if not
 
870          most, implementations of it provide a number of their
 
871          own extensions.  This document does not deal with such
 
872          extensions explicitly since they constitute an unknown
 
873          factor.  Message sending software should not make use
 
874          of nonstandard extensions; they are likely to be
 
875          missing from some implementations.  Message receiving
 
876          and displaying software should make sure that any
 
877          nonstandard PostScript operators are secure and don't
 
878          present any kind of threat.
 
880    (6)   It is possible to write PostScript that consumes huge
 
881          amounts of various system resources.  It is also
 
882          possible to write PostScript programs that loop
 
883          indefinitely.  Both types of programs have the
 
884          potential to cause damage if sent to unsuspecting
 
885          recipients.  Message-sending software should avoid the
 
886          construction and dissemination of such programs, which
 
887          is antisocial.  Message receiving and displaying
 
888          software should provide appropriate mechanisms to abort
 
889          processing after a reasonable amount of time has
 
890          elapsed. In addition, PostScript interpreters should be
 
891          limited to the consumption of only a reasonable amount
 
892          of any given system resource.
 
903    (7)   It is possible to include raw binary information inside
 
904          PostScript in various forms.  This is not recommended
 
905          for use in Internet mail, both because it is not
 
906          supported by all PostScript interpreters and because it
 
907          significantly complicates the use of a MIME Content-
 
908          Transfer-Encoding.  (Without such binary, PostScript
 
909          may typically be viewed as line-oriented data.  The
 
910          treatment of CRLF sequences becomes extremely
 
911          problematic if binary and line-oriented data are mixed
 
912          in a single PostScript data stream.)
 
914    (8)   Finally, bugs may exist in some PostScript interpreters
 
915          which could possibly be exploited to gain unauthorized
 
916          access to a recipient's system.  Apart from noting this
 
917          possibility, there is no specific action to take to
 
918          prevent this, apart from the timely correction of such
 
919          bugs if any are found.
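
   The operator names listed above lend themselves to a simple textual
   pre-screen, sketched below.  This is emphatically not a security
   mechanism: PostScript can construct operator names at run time, and
   the disabling described above must still be done inside the
   interpreter.  The sketch only flags documents for closer inspection.
   The function names are hypothetical, and the scan deliberately errs
   on the side of over-reporting (it ignores string literals and treats
   "/file" like "file").

```python
import re

# Operator names called out in the list above; interpreters may define more.
FLAGGED_OPERATORS = (
    "deletefile", "renamefile", "filenameforall", "file",
    "exitserver", "startjob", "setsystemparams", "setdevparams",
)

# Runs of characters that are neither whitespace nor PostScript delimiters.
_NAME = re.compile(rb"[^\s()<>\[\]{}/%]+")


def has_dsc_header(program: bytes) -> bool:
    """True if the program starts with a DSC version comment."""
    return program.startswith(b"%!PS-Adobe-")


def flagged_operators(program: bytes) -> set[str]:
    """Return the flagged operator names that appear as tokens."""
    found = set()
    for line in program.splitlines():
        code = line.split(b"%", 1)[0]     # crude: drop everything after '%'
        for token in _NAME.findall(code):
            name = token.decode("ascii", "replace")
            if name in FLAGGED_OPERATORS:
                found.add(name)
    return found
```

   A receiver could use the result to decide whether to warn the user or
   decline interactive display, while still relying on a restricted
   interpreter for actual protection.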
 
9214.5.3.  Other Application Subtypes
 
923   It is expected that many other subtypes of "application" will be
 
924   defined in the future.  MIME implementations must at a minimum treat
 
925   any unrecognized subtypes as being equivalent to "application/octet-stream".
 
9285.  Composite Media Type Values
 
930   The remaining two of the seven initial Content-Type values refer to
 
931   composite entities.  Composite entities are handled using MIME
 
932   mechanisms -- a MIME processor typically handles the body directly.
 
9345.1.  Multipart Media Type
 
936   In the case of multipart entities, in which one or more different
 
937   sets of data are combined in a single body, a "multipart" media type
 
938   field must appear in the entity's header.  The body must then contain
 
939   one or more body parts, each preceded by a boundary delimiter line,
 
940   and the last one followed by a closing boundary delimiter line.
 
941   After its boundary delimiter line, each body part then consists of a
 
942   header area, a blank line, and a body area.  Thus a body part is
 
943   similar to an RFC 822 message in syntax, but different in meaning.
 
945   A body part is an entity and hence is NOT to be interpreted as
 
946   actually being an RFC 822 message.  To begin with, NO header fields
 
947   are actually required in body parts.  A body part that starts with a
 
948   blank line, therefore, is allowed and is a body part for which all
 
949   default values are to be assumed.  In such a case, the absence of a
 
950   Content-Type header usually indicates that the corresponding body has
 
959   a content-type of "text/plain; charset=US-ASCII".
 
961   The only header fields that have defined meaning for body parts are
 
962   those the names of which begin with "Content-".  All other header
 
963   fields may be ignored in body parts.  Although they should generally
 
964   be retained if at all possible, they may be discarded by gateways if
 
965   necessary.  Such other fields are permitted to appear in body parts
 
966   but must not be depended on.  "X-" fields may be created for
 
967   experimental or private purposes, with the recognition that the
 
968   information they contain may be lost at some gateways.
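
   The treatment of body part headers described above might be sketched
   as follows.  This is a minimal illustration with hypothetical names:
   it assumes CRLF line endings, keeps only the "Content-" fields, and
   applies the text/plain default when no Content-Type is present.  The
   default is a parameter because an enclosing subtype may define a
   different one.

```python
def split_body_part(
    part: bytes, default_type: str = "text/plain; charset=US-ASCII"
) -> tuple[dict[str, str], bytes]:
    """Split a body part into its Content-* fields and its body."""
    if part.startswith(b"\r\n"):          # empty header area
        head, body = b"", part[2:]
    else:
        head, _, body = part.partition(b"\r\n\r\n")
    fields: dict[str, str] = {}
    name = None
    for raw in head.split(b"\r\n"):
        line = raw.decode("ascii", "replace")
        if line[:1] in (" ", "\t") and name:      # folded continuation line
            fields[name] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            name = name.strip().lower()           # field names are case-insensitive
            fields[name] = value.strip()
    # Only "Content-" fields have defined meaning for body parts.
    content = {k: v for k, v in fields.items() if k.startswith("content-")}
    content.setdefault("content-type", default_type)
    return content, body
```

   Other fields are dropped here for simplicity; an agent that forwards
   the part would normally retain them, as recommended above.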
 
970   NOTE:  The distinction between an RFC 822 message and a body part is
 
971   subtle, but important.  A gateway between Internet and X.400 mail,
 
972   for example, must be able to tell the difference between a body part
 
973   that contains an image and a body part that contains an encapsulated
 
974   message, the body of which is a JPEG image.  In order to represent
 
975   the latter, the body part must have "Content-Type: message/rfc822",
 
976   and its body (after the blank line) must be the encapsulated message,
 
977   with its own "Content-Type: image/jpeg" header field.  The use of
 
978   similar syntax facilitates the conversion of messages to body parts,
 
979   and vice versa, but the distinction between the two must be
 
980   understood by implementors.  (For the special case in which parts
 
981   actually are messages, a "digest" subtype is also defined.)
 
983   As stated previously, each body part is preceded by a boundary
 
984   delimiter line that contains the boundary delimiter.  The boundary
 
985   delimiter MUST NOT appear inside any of the encapsulated parts, on a
 
986   line by itself or as the prefix of any line.  This implies that it is
 
987   crucial that the composing agent be able to choose and specify a
 
988   unique boundary parameter value that does not contain the boundary
 
989   parameter value of an enclosing multipart as a prefix.
 
991   All present and future subtypes of the "multipart" type must use an
 
992   identical syntax.  Subtypes may differ in their semantics, and may
 
993   impose additional restrictions on syntax, but must conform to the
 
994   required syntax for the "multipart" type.  This requirement ensures
 
995   that all conformant user agents will at least be able to recognize
 
996   and separate the parts of any multipart entity, even those of an
 
997   unrecognized subtype.
 
999   As stated in the definition of the Content-Transfer-Encoding field
 
1000   [RFC 2045], no encoding other than "7bit", "8bit", or "binary" is
 
1001   permitted for entities of type "multipart".  The "multipart" boundary
 
1002   delimiters and header fields are always represented as 7bit US-ASCII
 
1003   in any case (though the header fields may encode non-US-ASCII header
 
1004   text as per RFC 2047) and data within the body parts can be encoded
 
1005   on a part-by-part basis, with Content-Transfer-Encoding fields for
 
1006   each appropriate body part.
 
10155.1.1.  Common Syntax

1017   This section defines a common syntax for subtypes of "multipart".
 
1018   All subtypes of "multipart" must use this syntax.  A simple example
 
1019   of a multipart message also appears in this section.  An example of a
 
1020   more complex multipart message is given in RFC 2049.
 
1022   The Content-Type field for multipart entities requires one parameter,
 
1023   "boundary". The boundary delimiter line is then defined as a line
 
1024   consisting entirely of two hyphen characters ("-", decimal value 45)
 
1025   followed by the boundary parameter value from the Content-Type header
 
1026   field, optional linear whitespace, and a terminating CRLF.
 
1028   NOTE:  The hyphens are for rough compatibility with the earlier RFC
 
1029   934 method of message encapsulation, and for ease of searching for
 
1030   the boundaries in some implementations.  However, it should be noted
 
1031   that multipart messages are NOT completely compatible with RFC 934
 
1032   encapsulations; in particular, they do not obey RFC 934 quoting
 
1033   conventions for embedded lines that begin with hyphens.  This
 
1034   mechanism was chosen over the RFC 934 mechanism because the latter
 
1035   causes lines to grow with each level of quoting.  The combination of
 
1036   this growth with the fact that SMTP implementations sometimes wrap
 
1037   long lines made the RFC 934 mechanism unsuitable for use in the event
 
1038   that deeply-nested multipart structuring is ever desired.
 
1040   WARNING TO IMPLEMENTORS:  The grammar for parameters on the Content-
 
1041   type field is such that it is often necessary to enclose the boundary
 
1042   parameter values in quotes on the Content-type line.  This is not
 
1043   always necessary, but never hurts. Implementors should be sure to
 
1044   study the grammar carefully in order to avoid producing invalid
 
1045   Content-type fields.  Thus, a typical "multipart" Content-Type header
 
1046   field might look like this:
 
1048     Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p
 
1050   But the following is not valid:
 
1052     Content-Type: multipart/mixed; boundary=gc0pJq0M:08jU534c0p
 
1054   (because of the colon) and must instead be represented as
 
1056     Content-Type: multipart/mixed; boundary="gc0pJq0M:08jU534c0p"
 
1058   This Content-Type value indicates that the content consists of one or
 
1059   more parts, each with a structure that is syntactically identical to
 
1060   an RFC 822 message, except that the header area is allowed to be
 
1061   completely empty, and that the parts are each preceded by the line
 
1071     --gc0pJq0M:08jU534c0p
 
1073   The boundary delimiter MUST occur at the beginning of a line, i.e.,
 
1074   following a CRLF, and the initial CRLF is considered to be attached
 
1075   to the boundary delimiter line rather than part of the preceding
 
1076   part.  The boundary may be followed by zero or more characters of
 
1077   linear whitespace. It is then terminated by either another CRLF and
 
1078   the header fields for the next part, or by two CRLFs, in which case
 
1079   there are no header fields for the next part.  If no Content-Type
 
1080   field is present it is assumed to be "message/rfc822" in a
 
1081   "multipart/digest" and "text/plain" otherwise.
 
1083   NOTE:  The CRLF preceding the boundary delimiter line is conceptually
 
1084   attached to the boundary so that it is possible to have a part that
 
1085   does not end with a CRLF (line break).  Body parts that must be
 
1086   considered to end with line breaks, therefore, must have two CRLFs
 
1087   preceding the boundary delimiter line, the first of which is part of
 
1088   the preceding body part, and the second of which is part of the
 
1089   encapsulation boundary.
 
1091   Boundary delimiters must not appear within the encapsulated material,
 
1092   and must be no longer than 70 characters, not counting the two leading hyphens.
 
1095   The boundary delimiter line following the last body part is a
 
1096   distinguished delimiter that indicates that no further body parts
 
1097   will follow.  Such a delimiter line is identical to the previous
 
1098   delimiter lines, with the addition of two more hyphens after the
 
1099   boundary parameter value.
 
1101     --gc0pJq0M:08jU534c0p--
 
1103   NOTE TO IMPLEMENTORS:  Boundary string comparisons must compare the

1104   boundary value with the beginning of each candidate line.  An exact
 
1105   match of the entire candidate line is not required; it is sufficient
 
1106   that the boundary appear in its entirety following the CRLF.
 
1108   There appears to be room for additional information prior to the
 
1109   first boundary delimiter line and following the final boundary
 
1110   delimiter line.  These areas should generally be left blank, and
 
1111   implementations must ignore anything that appears before the first
 
1112   boundary delimiter line or after the last one.
 
1114   NOTE:  These "preamble" and "epilogue" areas are generally not used
 
1115   because of the lack of proper typing of these parts and the lack of
 
1116   clear semantics for handling these areas at gateways, particularly
 
1117   X.400 gateways.  However, rather than leaving the preamble area
 
1118   blank, many MIME implementations have found this to be a convenient
 
1127   place to insert an explanatory note for recipients who read the
 
1128   message with pre-MIME software, since such notes will be ignored by
 
1129   MIME-compliant software.
 
1131   NOTE:  Because boundary delimiters must not appear in the body parts
 
1132   being encapsulated, a user agent must exercise care to choose a
 
1133   unique boundary parameter value.  The boundary parameter value in the
 
1134   example above could have been the result of an algorithm designed to
 
1135   produce boundary delimiters with a very low probability of already
 
1136   existing in the data to be encapsulated without having to prescan the
 
1137   data.  Alternate algorithms might result in more "readable" boundary
 
1138   delimiters for a recipient with an old user agent, but would require
 
1139   more attention to the possibility that the boundary delimiter might
 
1140   appear at the beginning of some line in the encapsulated part.  The
 
1141   simplest boundary delimiter line possible is something like "---",
 
1142   with a closing boundary delimiter line of "-----".
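
   One way to obtain a boundary with a very low probability of collision
   is sketched below.  It is an assumption-laden illustration, not a
   prescribed algorithm: the function name is hypothetical, the boundary
   is simply 32 random hexadecimal digits, and, unlike the approach
   described above, the candidate is also checked against the data to
   be encapsulated.

```python
import secrets


def choose_boundary(parts: list[bytes], attempts: int = 10) -> str:
    """Pick a random boundary that starts no line of any part."""
    for _ in range(attempts):
        boundary = secrets.token_hex(16)          # 32 hex digits, all legal
        marker = b"--" + boundary.encode("ascii")
        # Reject a candidate found at the beginning of any line of any part.
        clash = any(p.startswith(marker) or b"\n" + marker in p for p in parts)
        if not clash:
            return boundary
    raise RuntimeError("could not find an unused boundary")
```

   Scanning the parts costs one pass over the data; an agent that cannot
   afford the pass may rely on the randomness alone.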
 
1144   As a very simple example, the following multipart message has two
 
1145   parts, both of them plain text, one of them explicitly typed and one
 
1146   of them implicitly typed:
 
1149     To: Ned Freed <ned@innosoft.com>
 
1150     Date: Sun, 21 Mar 1993 23:56:48 -0800 (PST)
 
1151     Subject: Sample message
 
1153     Content-type: multipart/mixed; boundary="simple boundary"
 
1155     This is the preamble.  It is to be ignored, though it
 
1156     is a handy place for composition agents to include an
 
1157     explanatory note to non-MIME conformant readers.

1159     --simple boundary

1161     This is implicitly typed plain US-ASCII text.

1162     It does NOT end with a linebreak.

1163     --simple boundary

1164     Content-type: text/plain; charset=us-ascii

1166     This is explicitly typed plain US-ASCII text.

1167     It DOES end with a linebreak.

1169     --simple boundary--

1171     This is the epilogue.  It is also to be ignored.
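
   The rules given so far (prefix comparison of the boundary, the CRLF
   that belongs to the delimiter, and the ignored preamble and epilogue)
   can be combined into a small splitting routine.  The following is a
   sketch only, with a hypothetical function name; it assumes CRLF line
   endings and does not handle nested multiparts.

```python
def split_multipart(body: bytes, boundary: str) -> list[bytes]:
    """Return the raw body parts (header area plus body) of a multipart body."""
    dash_boundary = b"--" + boundary.encode("ascii")
    parts: list[bytes] = []
    current = None                        # None while still in the preamble
    for line in body.split(b"\r\n"):
        # Only the beginning of the line is compared, so transport padding
        # after the boundary is tolerated.
        if line.startswith(dash_boundary):
            if current is not None:
                # Joining with CRLF leaves out the CRLF that precedes the
                # delimiter, which belongs to the delimiter, not the part.
                parts.append(b"\r\n".join(current))
            if line[len(dash_boundary):].startswith(b"--"):
                return parts              # closing delimiter: ignore epilogue
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:               # truncated: no closing delimiter
        parts.append(b"\r\n".join(current))
    return parts
```

   Applied to a message like the example above, the routine yields two
   parts: the first begins with an empty header area and does not end
   with a line break, while the second ends with one.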
 
1183   The use of a media type of "multipart" in a body part within another
 
1184   "multipart" entity is explicitly allowed.  In such cases, for obvious
 
1185   reasons, care must be taken to ensure that each nested "multipart"
 
1186   entity uses a different boundary delimiter.  See RFC 2049 for an
 
1187   example of nested "multipart" entities.
 
1189   The use of the "multipart" media type with only a single body part
 
1190   may be useful in certain contexts, and is explicitly permitted.
 
1192   NOTE: Experience has shown that a "multipart" media type with a
 
1193   single body part is useful for sending non-text media types.  It has
 
1194   the advantage of providing the preamble as a place to include
 
1195   decoding instructions.  In addition, a number of SMTP gateways move
 
1196   or remove the MIME headers, and a clever MIME decoder can take a good
 
1197   guess at multipart boundaries even in the absence of the Content-Type
 
1198   header and thereby successfully decode the message.
 
1200   The only mandatory global parameter for the "multipart" media type is
 
1201   the boundary parameter, which consists of 1 to 70 characters from a
 
1202   set of characters known to be very robust through mail gateways, and
 
1203   NOT ending with white space. (If a boundary delimiter line appears to
 
1204   end with white space, the white space must be presumed to have been
 
1205   added by a gateway, and must be deleted.)  It is formally specified
 
1206   by the following BNF:
 
1208     boundary := 0*69<bchars> bcharsnospace
 
1210     bchars := bcharsnospace / " "
 
1212     bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
 
1213                      "+" / "_" / "," / "-" / "." /
 
1214                      "/" / ":" / "=" / "?"
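
   The BNF above, together with the earlier warning about quoting, can
   be checked mechanically.  The sketch below uses hypothetical function
   names and assumes the RFC 2045 definition of a token (any printable
   US-ASCII character other than space and the "tspecials") to decide
   when the value must be quoted.

```python
import re

# boundary := 0*69<bchars> bcharsnospace, transcribed from the BNF above.
_BOUNDARY = re.compile(r"[0-9A-Za-z'()+_,\-./:=? ]{0,69}[0-9A-Za-z'()+_,\-./:=?]")

# Assumption: characters allowed in an unquoted parameter value (a token).
_TOKEN = re.compile(r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+")


def is_valid_boundary(value: str) -> bool:
    """True if the value is 1 to 70 bchars and does not end with a space."""
    return _BOUNDARY.fullmatch(value) is not None


def boundary_parameter(value: str) -> str:
    """Format the boundary parameter, quoting it only when required."""
    if not is_valid_boundary(value):
        raise ValueError(f"not a legal boundary: {value!r}")
    if _TOKEN.fullmatch(value):
        return f"boundary={value}"
    # Legal boundary characters never include '"' or '\', so no escaping
    # is needed inside the quoted string.
    return f'boundary="{value}"'
```

   An agent that prefers simplicity could quote unconditionally, since
   quoting is never harmful.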
 
1216   Overall, the body of a "multipart" entity may be specified as follows:
 
1219     dash-boundary := "--" boundary
 
1220                      ; boundary taken from the value of
 
1221                      ; boundary parameter of the
 
1222                      ; Content-Type field.
 
1224     multipart-body := [preamble CRLF]
 
1225                       dash-boundary transport-padding CRLF
 
1226                       body-part *encapsulation
 
1227                       close-delimiter transport-padding

1228                       [CRLF epilogue]
 
1239     transport-padding := *LWSP-char
 
1240                          ; Composers MUST NOT generate
 
1241                          ; non-zero length transport
 
1242                          ; padding, but receivers MUST
 
1243                          ; be able to handle padding
 
1244                          ; added by message transports.
 
1246     encapsulation := delimiter transport-padding

1247                      CRLF body-part
 
1249     delimiter := CRLF dash-boundary
 
1251     close-delimiter := delimiter "--"
 
1253     preamble := discard-text
 
1255     epilogue := discard-text
 
1257     discard-text := *(*text CRLF) *text
 
1258                     ; May be ignored or discarded.
 
1260     body-part := MIME-part-headers [CRLF *OCTET]
 
1261                  ; Lines in a body-part must not start
 
1262                  ; with the specified dash-boundary and
 
1263                  ; the delimiter must not appear anywhere
 
1264                  ; in the body part.  Note that the
 
1265                  ; semantics of a body-part differ from
 
1266                  ; the semantics of a message, as
 
1267                  ; described in the text.
 
1269     OCTET := <any 0-255 octet value>
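
   Read from the composer's side, the grammar amounts to the assembly
   steps sketched below.  The function name is hypothetical.  Each
   element of "parts" is assumed to be a complete body part already (its
   header area, a CRLF, and its body, or just a CRLF and body when there
   are no headers), and the sketch emits no epilogue and no transport
   padding.

```python
def build_multipart_body(
    parts: list[bytes], boundary: str, preamble: bytes = b""
) -> bytes:
    """Assemble complete body parts into a multipart body."""
    if not parts:
        raise ValueError("the grammar requires at least one body part")
    dash_boundary = b"--" + boundary.encode("ascii")
    delimiter = b"\r\n" + dash_boundary
    for part in parts:
        # No line of a body part may start with the dash-boundary.
        if part.startswith(dash_boundary) or delimiter in part:
            raise ValueError("boundary occurs inside a body part")
    out = bytearray()
    if preamble:
        out += preamble + b"\r\n"
    out += dash_boundary + b"\r\n"                 # first delimiter line
    out += (delimiter + b"\r\n").join(parts)       # body-part *encapsulation
    out += delimiter + b"--"                       # close-delimiter
    return bytes(out)
```

   A part that must end with a line break therefore carries its own
   trailing CRLF, which is followed by the CRLF of the next delimiter.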
 
1271   IMPORTANT:  The free insertion of linear-white-space and RFC 822
 
1272   comments between the elements shown in this BNF is NOT allowed since
 
1273   this BNF does not specify a structured header field.
 
1275   NOTE:  In certain transport enclaves, RFC 822 restrictions such as
 
1276   the one that limits bodies to printable US-ASCII characters may not
 
1277   be in force. (That is, transport domains may exist that resemble
 
1278   standard Internet mail transport as specified in RFC 821 and assumed
 
1279   by RFC 822, but without certain restrictions.) The relaxation of
 
1280   these restrictions should be construed as locally extending the
 
1281   definition of bodies, for example to include octets outside of the
 
1282   US-ASCII range, as long as these extensions are supported by the
 
1283   transport and adequately documented in the Content-Transfer-Encoding
 
1284   header field.  However, in no event are headers (either message
 
1285   headers or body part headers) allowed to contain anything other than
 
1286   US-ASCII characters.
 
1295   NOTE:  Conspicuously missing from the "multipart" type is a notion of
 
1296   structured, related body parts. It is recommended that those wishing
 
1297   to provide more structured or integrated multipart messaging
 
1298   facilities should define subtypes of multipart that are syntactically
 
1299   identical but define relationships between the various parts. For
 
1300   example, subtypes of multipart could be defined that include a
 
1301   distinguished part which in turn is used to specify the relationships
 
1302   between the other parts, probably referring to them by their
 
1303   Content-ID field.  Old implementations will not recognize the new
 
1304   subtype if this approach is used, but will treat it as
 
1305   multipart/mixed and will thus be able to show the user the parts that
 
5.1.2.  Handling Nested Messages and Multiparts
 
1310   The "message/rfc822" subtype defined in a subsequent section of this
 
1311   document has no terminating condition other than running out of data.
 
1312   Similarly, an improperly truncated "multipart" entity may not have
 
1313   any terminating boundary marker, and can turn up operationally due to
 
1314   mail system malfunctions.
 
1316   It is essential that such entities be handled correctly when they are
 
1317   themselves imbedded inside of another "multipart" structure.  MIME
 
1318   implementations are therefore required to recognize outer level
 
1319   boundary markers at ANY level of inner nesting.  It is not sufficient
 
1320   to only check for the next expected marker or other terminating
 
1325   The "mixed" subtype of "multipart" is intended for use when the body
 
1326   parts are independent and need to be bundled in a particular order.
 
1327   Any "multipart" subtypes that an implementation does not recognize
 
1328   must be treated as being of subtype "mixed".
 
5.1.4.  Alternative Subtype
 
1332   The "multipart/alternative" type is syntactically identical to
 
1333   "multipart/mixed", but the semantics are different.  In particular,
 
1334   each of the body parts is an "alternative" version of the same
 
   Systems should recognize that the contents of the various parts are
   interchangeable.  Systems should choose the "best" type based on the
   local environment and preferences, in some cases even through user
   interaction.  As with "multipart/mixed", the order of body parts is
   significant.  In this case, the alternatives appear in an order of
   increasing faithfulness to the original content.  In general, the
   best choice is the LAST part of a type supported by the recipient
   system's local environment.
 
1354   "Multipart/alternative" may be used, for example, to send a message
 
1355   in a fancy text format in such a way that it can easily be displayed
 
1358     From: Nathaniel Borenstein <nsb@bellcore.com>
 
1359     To: Ned Freed <ned@innosoft.com>
 
1360     Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST)
 
1361     Subject: Formatted text mail
 
1363     Content-Type: multipart/alternative; boundary=boundary42
 
1366     Content-Type: text/plain; charset=us-ascii
 
1368       ... plain text version of message goes here ...
 
1371     Content-Type: text/enriched
 
1373       ... RFC 1896 text/enriched version of same message
 
1377     Content-Type: application/x-whatever
 
1379       ... fanciest version of same message goes here ...
 
1383   In this example, users whose mail systems understood the
 
1384   "application/x-whatever" format would see only the fancy version,
 
1385   while other users would see only the enriched or plain text version,
 
1386   depending on the capabilities of their system.
 
1388   In general, user agents that compose "multipart/alternative" entities
 
1389   must place the body parts in increasing order of preference, that is,
 
1390   with the preferred format last.  For fancy text, the sending user
 
1391   agent should put the plainest format first and the richest format
 
1392   last.  Receiving user agents should pick and display the last format
 
1393   they are capable of displaying.  In the case where one of the
 
1394   alternatives is itself of type "multipart" and contains unrecognized
 
1395   sub-parts, the user agent may choose either to show that alternative,
 
1396   an earlier alternative, or both.
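
   The selection rule for receiving user agents can be sketched as
   follows.  This is a minimal illustration only: the function name
   and the idea of passing in a set of displayable media types are
   assumptions made for the example, not part of this specification.

```python
def choose_alternative(part_types, supported):
    """Return the index of the last part whose media type is supported.

    Parts are ordered plainest first, richest last, so the scan runs
    backwards and stops at the first supported type it meets.
    """
    for index in range(len(part_types) - 1, -1, -1):
        # Media type names are case-insensitive; ignore any parameters.
        media_type = part_types[index].split(";", 1)[0].strip().lower()
        if media_type in supported:
            return index
    return None  # nothing displayable; the caller decides what to do


parts = ["text/plain; charset=us-ascii", "text/enriched",
         "application/x-whatever"]
print(choose_alternative(parts, {"text/plain", "text/enriched"}))
```

   A user agent that prefers to offer the user a choice among several
   recognized formats would collect every supported index instead of
   stopping at the last one.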
 
 
1407   NOTE: From an implementor's perspective, it might seem more sensible
 
1408   to reverse this ordering, and have the plainest alternative last.
 
1409   However, placing the plainest alternative first is the friendliest
 
1410   possible option when "multipart/alternative" entities are viewed
 
1411   using a non-MIME-conformant viewer.  While this approach does impose
 
1412   some burden on conformant MIME viewers, interoperability with older
 
1413   mail readers was deemed to be more important in this case.
 
1415   It may be the case that some user agents, if they can recognize more
 
1416   than one of the formats, will prefer to offer the user the choice of
 
1417   which format to view.  This makes sense, for example, if a message
 
   includes both a nicely-formatted image version and an easily-edited
 
1419   text version.  What is most critical, however, is that the user not
 
1420   automatically be shown multiple versions of the same data.  Either
 
1421   the user should be shown the last recognized version or should be
 
1424   THE SEMANTICS OF CONTENT-ID IN MULTIPART/ALTERNATIVE:  Each part of a
 
1425   "multipart/alternative" entity represents the same data, but the
 
1426   mappings between the two are not necessarily without information
 
1427   loss.  For example, information is lost when translating ODA to
 
1428   PostScript or plain text.  It is recommended that each part should
 
1429   have a different Content-ID value in the case where the information
 
1430   content of the two parts is not identical.  And when the information
 
1431   content is identical -- for example, where several parts of type
 
1432   "message/external-body" specify alternate ways to access the
 
1433   identical data -- the same Content-ID field value should be used, to
 
1434   optimize any caching mechanisms that might be present on the
 
1435   recipient's end.  However, the Content-ID values used by the parts
 
1436   should NOT be the same Content-ID value that describes the
 
1437   "multipart/alternative" as a whole, if there is any such Content-ID
 
1438   field.  That is, one Content-ID value will refer to the
 
1439   "multipart/alternative" entity, while one or more other Content-ID
 
1440   values will refer to the parts inside it.
 
5.1.5.  Digest Subtype
 
1444   This document defines a "digest" subtype of the "multipart" Content-
 
1445   Type.  This type is syntactically identical to "multipart/mixed", but
 
1446   the semantics are different.  In particular, in a digest, the default
 
1447   Content-Type value for a body part is changed from "text/plain" to
 
1448   "message/rfc822".  This is done to allow a more readable digest
 
1449   format that is largely compatible (except for the quoting convention)
 
   Note: Though it is possible to specify a Content-Type value for a
   body part in a digest which is other than "message/rfc822", such as a
   "text/plain" part containing a description of the material in the
   digest, actually doing so is undesirable. The "multipart/digest"
   Content-Type is intended to be used to send collections of messages.
   If a "text/plain" part is needed, it should be included as a separate
   part of a "multipart/mixed" message.
 
1468   A digest in this format might, then, look something like this:
 
1470     From: Moderator-Address
 
1472     Date: Mon, 22 Mar 1994 13:34:51 +0000
 
1473     Subject: Internet Digest, volume 42
 
1475     Content-Type: multipart/mixed;
 
1476                   boundary="---- main boundary ----"
 
1478     ------ main boundary ----
 
1480       ...Introductory text or table of contents...
 
1482     ------ main boundary ----
 
1483     Content-Type: multipart/digest;
 
1484                   boundary="---- next message ----"
 
1486     ------ next message ----
 
1489     Date: Fri, 26 Mar 1993 11:13:32 +0200
 
1492       ...body goes here ...
 
1494     ------ next message ----
 
1496     From: someone-else-again
 
1497     Date: Fri, 26 Mar 1993 10:07:13 -0500
 
1498     Subject: my different opinion
 
1500       ... another body goes here ...
 
1502     ------ next message ------
 
1504     ------ main boundary ------
 
5.1.6.  Parallel Subtype
 
1508   This document defines a "parallel" subtype of the "multipart"
 
1509   Content-Type.  This type is syntactically identical to
 
1510   "multipart/mixed", but the semantics are different.  In particular,
 
 
1519   in a parallel entity, the order of body parts is not significant.
 
1521   A common presentation of this type is to display all of the parts
 
1522   simultaneously on hardware and software that are capable of doing so.
 
1523   However, composing agents should be aware that many mail readers will
 
1524   lack this capability and will show the parts serially in any event.
 
5.1.7.  Other Multipart Subtypes
 
1528   Other "multipart" subtypes are expected in the future.  MIME
 
1529   implementations must in general treat unrecognized subtypes of
 
1530   "multipart" as being equivalent to "multipart/mixed".
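
   The fallback rule for unrecognized subtypes might be applied as in
   the sketch below.  The set of known subtypes and the function name
   are assumptions for illustration; a real implementation would list
   whichever subtypes it actually handles.

```python
# Subtypes this hypothetical implementation knows how to present.
KNOWN_MULTIPART_SUBTYPES = {"mixed", "alternative", "digest", "parallel"}


def effective_multipart_subtype(content_type, known=KNOWN_MULTIPART_SUBTYPES):
    """Map an unrecognized multipart subtype to "mixed"."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    maintype, _, subtype = media_type.partition("/")
    if maintype != "multipart":
        raise ValueError("not a multipart media type: " + content_type)
    return subtype if subtype in known else "mixed"


print(effective_multipart_subtype("Multipart/X-Unknown; boundary=b1"))
```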
 
5.2.  Message Media Type
 
1534   It is frequently desirable, in sending mail, to encapsulate another
 
1535   mail message.  A special media type, "message", is defined to
 
1536   facilitate this.  In particular, the "rfc822" subtype of "message" is
 
1537   used to encapsulate RFC 822 messages.
 
1539   NOTE:  It has been suggested that subtypes of "message" might be
 
1540   defined for forwarded or rejected messages.  However, forwarded and
 
1541   rejected messages can be handled as multipart messages in which the
 
1542   first part contains any control or descriptive information, and a
 
1543   second part, of type "message/rfc822", is the forwarded or rejected
 
1544   message.  Composing rejection and forwarding messages in this manner
 
1545   will preserve the type information on the original message and allow
 
1546   it to be correctly presented to the recipient, and hence is strongly
 
1549   Subtypes of "message" often impose restrictions on what encodings are
 
1550   allowed.  These restrictions are described in conjunction with each
 
1553   Mail gateways, relays, and other mail handling agents are commonly
 
1554   known to alter the top-level header of an RFC 822 message.  In
 
1555   particular, they frequently add, remove, or reorder header fields.
 
1556   These operations are explicitly forbidden for the encapsulated
 
1557   headers embedded in the bodies of messages of type "message."
 
5.2.1.  RFC822 Subtype
 
1561   A media type of "message/rfc822" indicates that the body contains an
 
1562   encapsulated message, with the syntax of an RFC 822 message.
 
1563   However, unlike top-level RFC 822 messages, the restriction that each
 
1564   "message/rfc822" body must include a "From", "Date", and at least one
 
1565   destination header is removed and replaced with the requirement that
 
1566   at least one of "From", "Subject", or "Date" must be present.
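
   The relaxed header requirement can be expressed as a simple check.
   The sketch below assumes the header field names of the encapsulated
   message have already been extracted into a list; the function name
   is hypothetical.

```python
def has_minimum_rfc822_headers(header_names):
    """True if at least one of From, Subject, or Date is present."""
    required_any = {"from", "subject", "date"}
    # Header field names are matched without regard to case.
    return any(name.strip().lower() in required_any for name in header_names)


print(has_minimum_rfc822_headers(["Subject", "X-Mailer"]))
```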
 
 
1575   It should be noted that, despite the use of the numbers "822", a
 
1576   "message/rfc822" entity isn't restricted to material in strict
 
1577   conformance to RFC822, nor are the semantics of "message/rfc822"
 
1578   objects restricted to the semantics defined in RFC822. More
 
1579   specifically, a "message/rfc822" message could well be a News article
 
1582   No encoding other than "7bit", "8bit", or "binary" is permitted for
 
1583   the body of a "message/rfc822" entity.  The message header fields are
 
1584   always US-ASCII in any case, and data within the body can still be
 
1585   encoded, in which case the Content-Transfer-Encoding header field in
 
1586   the encapsulated message will reflect this.  Non-US-ASCII text in the
 
1587   headers of an encapsulated message can be specified using the
 
1588   mechanisms described in RFC 2047.
 
5.2.2.  Partial Subtype
 
1592   The "partial" subtype is defined to allow large entities to be
 
1593   delivered as several separate pieces of mail and automatically
 
1594   reassembled by a receiving user agent.  (The concept is similar to IP
 
1595   fragmentation and reassembly in the basic Internet Protocols.)  This
 
1596   mechanism can be used when intermediate transport agents limit the
 
1597   size of individual messages that can be sent.  The media type
 
1598   "message/partial" thus indicates that the body contains a fragment of
 
1601   Because data of type "message" may never be encoded in base64 or
 
1602   quoted-printable, a problem might arise if "message/partial" entities
 
1603   are constructed in an environment that supports binary or 8bit
 
1604   transport.  The problem is that the binary data would be split into
 
1605   multiple "message/partial" messages, each of them requiring binary
 
1606   transport.  If such messages were encountered at a gateway into a
 
1607   7bit transport environment, there would be no way to properly encode
 
1608   them for the 7bit world, aside from waiting for all of the fragments,
 
1609   reassembling the inner message, and then encoding the reassembled
 
1610   data in base64 or quoted-printable.  Since it is possible that
 
1611   different fragments might go through different gateways, even this is
 
1612   not an acceptable solution.  For this reason, it is specified that
 
1613   entities of type "message/partial" must always have a content-
 
1614   transfer-encoding of 7bit (the default).  In particular, even in
 
1615   environments that support binary or 8bit transport, the use of a
 
   content-transfer-encoding of "8bit" or "binary" is explicitly
 
1617   prohibited for MIME entities of type "message/partial". This in turn
 
1618   implies that the inner message must not use "8bit" or "binary"
 
 
1631   Because some message transfer agents may choose to automatically
 
1632   fragment large messages, and because such agents may use very
 
1633   different fragmentation thresholds, it is possible that the pieces of
 
1634   a partial message, upon reassembly, may prove themselves to comprise
 
1635   a partial message.  This is explicitly permitted.
 
1637   Three parameters must be specified in the Content-Type field of type
 
1638   "message/partial":  The first, "id", is a unique identifier, as close
 
1639   to a world-unique identifier as possible, to be used to match the
 
1640   fragments together. (In general, the identifier is essentially a
 
1641   message-id; if placed in double quotes, it can be ANY message-id, in
 
1642   accordance with the BNF for "parameter" given in RFC 2045.)  The
 
1643   second, "number", an integer, is the fragment number, which indicates
 
1644   where this fragment fits into the sequence of fragments.  The third,
 
1645   "total", another integer, is the total number of fragments.  This
 
1646   third subfield is required on the final fragment, and is optional
 
1647   (though encouraged) on the earlier fragments.  Note also that these
 
1648   parameters may be given in any order.
 
1650   Thus, the second piece of a 3-piece message may have either of the
 
1651   following header fields:
 
1653     Content-Type: Message/Partial; number=2; total=3;
 
1654                   id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
 
1656     Content-Type: Message/Partial;
 
1657                   id="oc=jpbe0M2Yt4s@thumper.bellcore.com";
 
1660   But the third piece MUST specify the total number of fragments:
 
1662     Content-Type: Message/Partial; number=3; total=3;
 
1663                   id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
 
1665   Note that fragment numbering begins with 1, not 0.
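
   The parameter rules above lend themselves to a small consistency
   check.  The following is a sketch under stated assumptions: the
   Content-Type parameters are taken to be already parsed into a
   dictionary, and both function names are invented for the example.

```python
def parse_fragment(params):
    """Return (id, number, total_or_None); raise ValueError if invalid."""
    lowered = {name.lower(): value for name, value in params.items()}
    if "id" not in lowered or "number" not in lowered:
        raise ValueError("message/partial needs both 'id' and 'number'")
    number = int(lowered["number"])
    total = int(lowered["total"]) if "total" in lowered else None
    if number < 1:
        raise ValueError("fragment numbering begins with 1")
    if total is not None and number > total:
        raise ValueError("fragment number exceeds total")
    return lowered["id"], number, total


def is_complete(fragments):
    """True once every fragment 1..total of a single id has been seen."""
    totals = {total for _, _, total in fragments if total is not None}
    if len(totals) != 1:
        return False  # total not yet known, or fragments disagree
    numbers = {number for _, number, _ in fragments}
    return numbers == set(range(1, totals.pop() + 1))
```

   Because "total" is optional on all but the final fragment, a
   receiver cannot know the set is complete until some fragment
   carrying "total" has arrived, which is why is_complete() returns
   False while the total is still unknown.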
 
1667   When the fragments of an entity broken up in this manner are put
 
1668   together, the result is always a complete MIME entity, which may have
 
1669   its own Content-Type header field, and thus may contain any other
 
5.2.2.1.  Message Fragmentation and Reassembly
 
1674   The semantics of a reassembled partial message must be those of the
 
1675   "inner" message, rather than of a message containing the inner
 
1676   message.  This makes it possible, for example, to send a large audio
 
1677   message as several partial messages, and still have it appear to the
 
1678   recipient as a simple audio message rather than as an encapsulated
 
 
1687   message containing an audio message.  That is, the encapsulation of
 
1688   the message is considered to be "transparent".
 
1690   When generating and reassembling the pieces of a "message/partial"
 
1691   message, the headers of the encapsulated message must be merged with
 
1692   the headers of the enclosing entities.  In this process the following
 
1693   rules must be observed:
 
1695    (1)   Fragmentation agents must split messages at line
 
1696          boundaries only. This restriction is imposed because
 
1697          splits at points other than the ends of lines in turn
 
          depend on message transports being able to preserve
 
1699          the semantics of messages that don't end with a CRLF
 
1700          sequence. Many transports are incapable of preserving
 
1703    (2)   All of the header fields from the initial enclosing
 
1704          message, except those that start with "Content-" and
 
1705          the specific header fields "Subject", "Message-ID",
 
1706          "Encrypted", and "MIME-Version", must be copied, in
 
1707          order, to the new message.
 
1709    (3)   The header fields in the enclosed message which start
 
1710          with "Content-", plus the "Subject", "Message-ID",
 
1711          "Encrypted", and "MIME-Version" fields, must be
 
1712          appended, in order, to the header fields of the new
 
1713          message.  Any header fields in the enclosed message
 
1714          which do not start with "Content-" (except for the
 
1715          "Subject", "Message-ID", "Encrypted", and "MIME-
 
1716          Version" fields) will be ignored and dropped.
 
1718    (4)   All of the header fields from the second and any
 
1719          subsequent enclosing messages are discarded by the
 
5.2.2.2.  Fragmentation and Reassembly Example
 
1724   If an audio message is broken into two pieces, the first piece might
 
1725   look something like this:
 
1727     X-Weird-Header-1: Foo
 
1729     To: joe@otherhost.com
 
1730     Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
 
1731     Subject: Audio mail (part 1 of 2)
 
1732     Message-ID: <id1@host.com>
 
1734     Content-type: message/partial; id="ABC@host.com";
 
 
1745     X-Weird-Header-1: Bar
 
1746     X-Weird-Header-2: Hello
 
1747     Message-ID: <anotherid@foo.com>
 
1750     Content-type: audio/basic
 
1751     Content-transfer-encoding: base64
 
1753       ... first half of encoded audio data goes here ...
 
1755   and the second half might look something like this:
 
1758     To: joe@otherhost.com
 
1759     Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
 
1760     Subject: Audio mail (part 2 of 2)
 
1762     Message-ID: <id2@host.com>
 
1763     Content-type: message/partial;
 
1764                   id="ABC@host.com"; number=2; total=2
 
1766       ... second half of encoded audio data goes here ...
 
1768   Then, when the fragmented message is reassembled, the resulting
 
1769   message to be displayed to the user should look something like this:
 
1771     X-Weird-Header-1: Foo
 
1773     To: joe@otherhost.com
 
1774     Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
 
1776     Message-ID: <anotherid@foo.com>
 
1778     Content-type: audio/basic
 
1779     Content-transfer-encoding: base64
 
1781       ... first half of encoded audio data goes here ...
 
1782       ... second half of encoded audio data goes here ...
 
1784   The inclusion of a "References" field in the headers of the second
 
1785   and subsequent pieces of a fragmented message that references the
 
1786   Message-Id on the previous piece may be of benefit to mail readers
 
1787   that understand and track references.  However, the generation of
 
1788   such "References" fields is entirely optional.
 
 
1799   Finally, it should be noted that the "Encrypted" header field has
 
1800   been made obsolete by Privacy Enhanced Messaging (PEM) [RFC-1421,
 
1801   RFC-1422, RFC-1423, RFC-1424], but the rules above are nevertheless
 
1802   believed to describe the correct way to treat it if it is encountered
 
1803   in the context of conversion to and from "message/partial" fragments.
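
   The header-merging rules (2) and (3) of the preceding section can be
   sketched as shown below.  The representation of headers as lists of
   (name, value) pairs and the function names are assumptions made for
   the illustration.  Per rule (4), only the headers of the first
   enclosing message are passed in; those of later fragments are not
   consulted at all.

```python
# Header fields that the reassembled message takes from the enclosed
# (inner) message rather than from the first enclosing fragment.
INNER_FIELDS = {"subject", "message-id", "encrypted", "mime-version"}


def taken_from_inner(name):
    lowered = name.lower()
    return lowered.startswith("content-") or lowered in INNER_FIELDS


def merge_headers(first_enclosing, enclosed):
    """Apply rules (2) and (3) to two lists of (name, value) pairs."""
    # Rule (2): copy the outer fields, in order, except the inner-owned ones.
    outer_kept = [(n, v) for n, v in first_enclosing if not taken_from_inner(n)]
    # Rule (3): append the inner-owned fields, in order; drop the rest.
    inner_kept = [(n, v) for n, v in enclosed if taken_from_inner(n)]
    return outer_kept + inner_kept
```

   Applied to the header fields of the example above, the
   "X-Weird-Header-1: Foo" field of the first fragment survives, while
   the "X-Weird-Header-1: Bar" and "X-Weird-Header-2: Hello" fields of
   the enclosed message are dropped.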
 
5.2.3.  External-Body Subtype
 
1807   The external-body subtype indicates that the actual body data are not
 
1808   included, but merely referenced.  In this case, the parameters
 
1809   describe a mechanism for accessing the external data.
 
1811   When a MIME entity is of type "message/external-body", it consists of
 
1812   a header, two consecutive CRLFs, and the message header for the
 
1813   encapsulated message.  If another pair of consecutive CRLFs appears,
 
1814   this of course ends the message header for the encapsulated message.
 
1815   However, since the encapsulated message's body is itself external, it
 
1816   does NOT appear in the area that follows.  For example, consider the
 
1819     Content-type: message/external-body;
 
1820                   access-type=local-file;
 
1821                   name="/u/nsb/Me.jpeg"
 
1823     Content-type: image/jpeg
 
1824     Content-ID: <id42@guppylake.bellcore.com>
 
1825     Content-Transfer-Encoding: binary
 
1827     THIS IS NOT REALLY THE BODY!
 
1829   The area at the end, which might be called the "phantom body", is
 
1830   ignored for most external-body messages.  However, it may be used to
 
1831   contain auxiliary information for some such messages, as indeed it is
 
   when the access-type is "mail-server".  The only access-type defined
 
1833   in this document that uses the phantom body is "mail-server", but
 
1834   other access-types may be defined in the future in other
 
1835   specifications that use this area.
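
   Separating the encapsulated header from the phantom body might look
   like the sketch below.  It assumes the body of the
   "message/external-body" entity is available as a string; the
   function names are hypothetical, and header unfolding is left out
   for brevity.

```python
def split_external_body(body):
    """Split the entity body into (encapsulated header lines, phantom body)."""
    normalized = body.replace("\r\n", "\n")
    # The first empty line ends the encapsulated message header.
    head, _, phantom = normalized.partition("\n\n")
    header_lines = head.split("\n") if head else []
    return header_lines, phantom


def uses_phantom_body(access_type):
    """Of the access-types defined here, only mail-server uses it."""
    return access_type.strip().lower() == "mail-server"
```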
 
1837   The encapsulated headers in ALL "message/external-body" entities MUST
 
1838   include a Content-ID header field to give a unique identifier by
 
1839   which to reference the data.  This identifier may be used for caching
 
1840   mechanisms, and for recognizing the receipt of the data when the
 
1841   access-type is "mail-server".
 
1843   Note that, as specified here, the tokens that describe external-body
 
1844   data, such as file names and mail server commands, are required to be
 
1845   in the US-ASCII character set.
 
 
1855   If this proves problematic in practice, a new mechanism may be
 
1856   required as a future extension to MIME, either as newly defined
 
1857   access-types for "message/external-body" or by some other mechanism.
 
1859   As with "message/partial", MIME entities of type "message/external-
 
1860   body" MUST have a content-transfer-encoding of 7bit (the default).
 
1861   In particular, even in environments that support binary or 8bit
 
   transport, the use of a content-transfer-encoding of "8bit" or
 
1863   "binary" is explicitly prohibited for entities of type
 
1864   "message/external-body".
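
   The transfer-encoding restrictions stated in this document for the
   three "message" subtypes can be collected into one table, as in the
   sketch below.  The table and function names are illustrative
   assumptions; media types outside the table are simply not judged by
   this example.

```python
# Content-Transfer-Encoding values permitted on the entity itself.
ALLOWED_ENCODINGS = {
    "message/rfc822": {"7bit", "8bit", "binary"},
    "message/partial": {"7bit"},
    "message/external-body": {"7bit"},
}


def encoding_allowed(media_type, encoding="7bit"):
    """Check an encoding against the table; 7bit is the default."""
    allowed = ALLOWED_ENCODINGS.get(media_type.strip().lower())
    if allowed is None:
        return True  # not covered by this sketch
    return encoding.strip().lower() in allowed
```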
 
5.2.3.1.  General External-Body Parameters
 
   The parameters that may be used with any "message/external-body"
 
1871    (1)   ACCESS-TYPE -- A word indicating the supported access
 
1872          mechanism by which the file or data may be obtained.
 
1873          This word is not case sensitive.  Values include, but
 
1874          are not limited to, "FTP", "ANON-FTP", "TFTP", "LOCAL-
 
1875          FILE", and "MAIL-SERVER".  Future values, except for
 
1876          experimental values beginning with "X-", must be
 
1877          registered with IANA, as described in RFC 2048.
 
1878          This parameter is unconditionally mandatory and MUST be
 
1879          present on EVERY "message/external-body".
 
1881    (2)   EXPIRATION -- The date (in the RFC 822 "date-time"
 
1882          syntax, as extended by RFC 1123 to permit 4 digits in
 
1883          the year field) after which the existence of the
 
1884          external data is not guaranteed.  This parameter may be
 
1885          used with ANY access-type and is ALWAYS optional.
 
1887    (3)   SIZE -- The size (in octets) of the data.  The intent
 
1888          of this parameter is to help the recipient decide
 
1889          whether or not to expend the necessary resources to
 
1890          retrieve the external data.  Note that this describes
 
1891          the size of the data in its canonical form, that is,
 
1892          before any Content-Transfer-Encoding has been applied
 
1893          or after the data have been decoded.  This parameter
 
1894          may be used with ANY access-type and is ALWAYS
 
1897    (4)   PERMISSION -- A case-insensitive field that indicates
 
1898          whether or not it is expected that clients might also
 
1899          attempt to overwrite the data.  By default, or if
 
1900          permission is "read", the assumption is that they are
 
1901          not, and that if the data is retrieved once, it is
 
1902          never needed again.  If PERMISSION is "read-write",
 
 
1911          this assumption is invalid, and any local copy must be
 
1912          considered no more than a cache.  "Read" and "Read-
 
1913          write" are the only defined values of permission.  This
 
1914          parameter may be used with ANY access-type and is
 
1917   The precise semantics of the access-types defined here are described
 
1918   in the sections that follow.
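
   A receiver might check the general parameters along the following
   lines.  This is a sketch only: it assumes the parameters are already
   parsed into a dictionary, the function name is invented, and the
   EXPIRATION date syntax is not validated.

```python
def check_external_body_params(params):
    """Return a list of problems found in the general parameters."""
    lowered = {name.lower(): value for name, value in params.items()}
    problems = []
    if "access-type" not in lowered:
        problems.append("access-type is mandatory")
    size = lowered.get("size")
    if size is not None and not (size.isascii() and size.isdigit()):
        problems.append("size must be a count of octets")
    # PERMISSION is case-insensitive and defaults to "read".
    permission = lowered.get("permission", "read").lower()
    if permission not in {"read", "read-write"}:
        problems.append("permission must be 'read' or 'read-write'")
    return problems
```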
 
5.2.3.2.  The 'ftp' and 'tftp' Access-Types
 
1922   An access-type of FTP or TFTP indicates that the message body is
 
   accessible as a file using the FTP [RFC-959] or TFTP [RFC-783]
 
1924   protocols, respectively.  For these access-types, the following
 
1925   additional parameters are mandatory:
 
1927    (1)   NAME -- The name of the file that contains the actual
 
1930    (2)   SITE -- A machine from which the file may be obtained,
 
1931          using the given protocol.  This must be a fully
 
1932          qualified domain name, not a nickname.
 
1934    (3)   Before any data are retrieved, using FTP, the user will
 
1935          generally need to be asked to provide a login id and a
 
1936          password for the machine named by the site parameter.
 
1937          For security reasons, such an id and password are not
 
1938          specified as content-type parameters, but must be
 
1939          obtained from the user.
 
1941   In addition, the following parameters are optional:
 
1943    (1)   DIRECTORY -- A directory from which the data named by
 
1944          NAME should be retrieved.
 
1946    (2)   MODE -- A case-insensitive string indicating the mode
 
1947          to be used when retrieving the information.  The valid
 
1948          values for access-type "TFTP" are "NETASCII", "OCTET",
 
1949          and "MAIL", as specified by the TFTP protocol [RFC-
 
1950          783].  The valid values for access-type "FTP" are
 
1951          "ASCII", "EBCDIC", "IMAGE", and "LOCALn" where "n" is a
 
1952          decimal integer, typically 8.  These correspond to the
 
1953          representation types "A" "E" "I" and "L n" as specified
 
1954          by the FTP protocol [RFC-959].  Note that "BINARY" and
 
1955          "TENEX" are not valid values for MODE and that "OCTET"
 
          or "IMAGE" or "LOCAL8" should be used instead.  If MODE
          is not specified, the default value is "NETASCII" for
 
1958          TFTP and "ASCII" otherwise.
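
   The MODE defaults and permitted values can be summarized in code.
   The sketch below is illustrative; the function name is an
   assumption, and any access-type other than "TFTP" is treated with
   the FTP rules, following the "otherwise" in the text above.

```python
import re

TFTP_MODES = {"NETASCII", "OCTET", "MAIL"}
# "LOCALn", where n is a decimal integer (typically 8).
FTP_MODE_PATTERN = re.compile(r"ASCII|EBCDIC|IMAGE|LOCAL[0-9]+")


def resolve_mode(access_type, mode=None):
    """Return the upper-cased MODE to use, applying the defaults."""
    is_tftp = access_type.strip().upper() == "TFTP"
    if mode is None:
        return "NETASCII" if is_tftp else "ASCII"
    value = mode.strip().upper()  # MODE is case-insensitive
    if is_tftp:
        valid = value in TFTP_MODES
    else:
        valid = FTP_MODE_PATTERN.fullmatch(value) is not None
    if not valid:
        raise ValueError("invalid MODE for " + access_type + ": " + mode)
    return value
```

   Note that "BINARY" is rejected for both access-types, in keeping
   with the text above.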
 
 
5.2.3.3.  The 'anon-ftp' Access-Type
 
1969   The "anon-ftp" access-type is identical to the "ftp" access type,
 
1970   except that the user need not be asked to provide a name and password
 
1971   for the specified site.  Instead, the ftp protocol will be used with
 
1972   login "anonymous" and a password that corresponds to the user's mail
 
5.2.3.4.  The 'local-file' Access-Type
 
1977   An access-type of "local-file" indicates that the actual body is
 
1978   accessible as a file on the local machine.  Two additional parameters
 
1979   are defined for this access type:
 
1981    (1)   NAME -- The name of the file that contains the actual
 
1982          body data.  This parameter is mandatory for the
 
1983          "local-file" access-type.
 
1985    (2)   SITE -- A domain specifier for a machine or set of
 
1986          machines that are known to have access to the data
 
1987          file.  This optional parameter is used to describe the
 
1988          locality of reference for the data, that is, the site
 
1989          or sites at which the file is expected to be visible.
 
1990          Asterisks may be used for wildcard matching to a part
 
1991          of a domain name, such as "*.bellcore.com", to indicate
 
1992          a set of machines on which the data should be directly
 
1993          visible, while a single asterisk may be used to
 
1994          indicate a file that is expected to be universally
 
1995          available, e.g., via a global file system.
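
   One possible reading of the SITE wildcard rules is sketched below.
   This document does not define the matching algorithm precisely, so
   the sketch is an assumption: it supports only a lone asterisk and a
   leading "*." label, and the function name is hypothetical.

```python
def site_matches(pattern, host):
    """True if host falls within the SITE domain specifier."""
    pattern = pattern.strip().lower()  # domain names are case-insensitive
    host = host.strip().lower()
    if pattern == "*":
        return True  # expected to be universally available
    if pattern.startswith("*."):
        # "*.bellcore.com" covers any host below bellcore.com.
        return host.endswith(pattern[1:])
    return host == pattern
```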
 
5.2.3.5.  The 'mail-server' Access-Type
 
1999   The "mail-server" access-type indicates that the actual body is
 
2000   available from a mail server.  Two additional parameters are defined
 
2001   for this access-type:
 
2003    (1)   SERVER -- The addr-spec of the mail server from which
 
2004          the actual body data can be obtained.  This parameter
 
2005          is mandatory for the "mail-server" access-type.
 
2007    (2)   SUBJECT -- The subject that is to be used in the mail
 
2008          that is sent to obtain the data.  Note that keying mail
 
2009          servers on Subject lines is NOT recommended, but such
 
2010          mail servers are known to exist.  This is an optional
 
 
   Because mail servers accept a variety of syntaxes, some of which are
 
2024   multiline, the full command to be sent to a mail server is not
 
2025   included as a parameter in the content-type header field.  Instead,
 
2026   it is provided as the "phantom body" when the media type is
 
2027   "message/external-body" and the access-type is mail-server.
 
2029   Note that MIME does not define a mail server syntax.  Rather, it
 
2030   allows the inclusion of arbitrary mail server commands in the phantom
 
2031   body.  Implementations must include the phantom body in the body of
 
2032   the message it sends to the mail server address to retrieve the
 
2035   Unlike other access-types, mail-server access is asynchronous and
 
2036   will happen at an unpredictable time in the future.  For this reason,
 
2037   it is important that there be a mechanism by which the returned data
 
2038   can be matched up with the original "message/external-body" entity.
 
2039   MIME mail servers must use the same Content-ID field on the returned
 
2040   message that was used in the original "message/external-body"
 
2041   entities, to facilitate such matching.
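
   The request and matching steps described above might be sketched in
   Python as follows.  This is an illustration under stated
   assumptions, not a prescribed implementation: the function names
   build_request and match_reply, and the "pending" dictionary keyed by
   Content-ID, are hypothetical.  Only the standard library
   email.message module is used.

```python
from email.message import EmailMessage


def build_request(server, phantom_body, subject=None):
    """Build the retrieval request for a mail-server external body."""
    request = EmailMessage()
    request["To"] = server                # the SERVER parameter
    if subject is not None:
        request["Subject"] = subject      # the optional SUBJECT parameter
    # The phantom body is passed through as the request body; MIME
    # defines no syntax for mail server commands.
    request.set_content(phantom_body)
    return request


def match_reply(reply, pending):
    """Return the pending reference whose Content-ID the reply repeats.

    pending is a hypothetical dict mapping Content-ID strings to
    whatever the agent stored when it sent the request.
    """
    content_id = reply["Content-ID"]
    if content_id is None:
        return None
    return pending.get(str(content_id).strip())
```

   Because the reply arrives at an unpredictable time, an agent built
   along these lines would have to keep the "pending" table in
   persistent storage rather than in memory.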
 
5.2.3.6.  External-Body Security Issues

   "Message/external-body" entities give rise to two important security
   issues:

    (1)   Accessing data via a "message/external-body" reference
          effectively results in the message recipient performing
          an operation that was specified by the message
          originator.  It is therefore possible for the message
          originator to trick a recipient into doing something
          they would not have done otherwise.  For example, an
          originator could specify an action that attempts
          retrieval of material that the recipient is not
          authorized to obtain, causing the recipient to
          unwittingly violate some security policy.  For this
          reason, user agents capable of resolving external
          references must always take steps to describe the
          action they are to take to the recipient and ask for
          explicit permission prior to performing it.
 
          The 'mail-server' access-type is particularly
          vulnerable, in that it causes the recipient to send a
          new message whose contents are specified by the
          original message's originator.  Given the potential for
          abuse, any such request messages that are constructed
          should contain a clear indication that they were
          generated automatically (e.g., in a Comments: header
          field) in an attempt to resolve a MIME
          "message/external-body" reference.
 
    (2)   MIME will sometimes be used in environments that
          provide some guarantee of message integrity and
          authenticity.  If present, such guarantees may apply
          only to the actual direct content of messages -- they
          may or may not apply to data accessed through MIME's
          "message/external-body" mechanism.  In particular, it
          may be possible to subvert certain access mechanisms
          even when the messaging system itself is secure.

          It should be noted that this problem exists either with
          or without the availability of MIME mechanisms.  A
          casual reference to an FTP site containing a document
          in the text of a secure message brings up similar
          issues -- the only difference is that MIME provides for
          automatic retrieval of such material, and users may
          place unwarranted trust in such automatic retrieval
          mechanisms.
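
   The first issue suggests a "describe, then ask" gate in front of
   every retrieval.  A minimal Python sketch follows.  All names in it
   (describe, resolve_with_consent, mark_automatic, and the confirm and
   fetch callables) are hypothetical, and the wording of the Comments
   text is only an example of a "clear indication".

```python
from email.message import EmailMessage


def describe(access_type, params):
    """Render a one-line description of the action about to be taken."""
    details = ", ".join(f"{key}={value}" for key, value in sorted(params.items()))
    return f"Retrieve external body via {access_type} ({details})"


def resolve_with_consent(access_type, params, confirm, fetch):
    """Call fetch only if confirm approves the described action.

    confirm takes the description and returns True or False; fetch
    performs the actual retrieval.  Both are supplied by the caller.
    """
    if not confirm(describe(access_type, params)):
        return None
    return fetch(access_type, params)


def mark_automatic(request):
    """Flag a mail-server request as machine-generated."""
    request["Comments"] = (
        "Generated automatically to resolve a MIME message/external-body reference"
    )
    return request
```

   Passing confirm and fetch in as parameters keeps the user-interface
   and network code outside the gate, so the permission check cannot be
   bypassed by a particular access-type handler.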
 
5.2.3.7.  Examples and Further Explanations

   When the external-body mechanism is used in conjunction with the
   "multipart/alternative" media type, it extends the functionality of
   "multipart/alternative" to include the case where the same entity is
   provided in the same format but via different access mechanisms.
   When this is done, the originator of the message must order the parts
   first in terms of preferred formats and then by preferred access
   mechanisms.  The recipient's viewer should then evaluate the list
   both in terms of format and access mechanisms.

   With the emerging possibility of very wide-area file systems, it
   becomes very hard to know in advance the set of machines where a file
   will and will not be accessible directly from the file system.
   Therefore it may make sense to provide both a file name, to be tried
   directly, and the name of one or more sites from which the file is
   known to be accessible.  An implementation can try to retrieve remote
   files using FTP or any other protocol, using anonymous file retrieval
   or prompting the user for the necessary name and password.  If an
   external body is accessible via multiple mechanisms, the sender may
   include multiple entities of type "message/external-body" within the
   body parts of an enclosing "multipart/alternative" entity.
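
   The recipient-side evaluation might look like the following Python
   sketch.  It assumes the usual "multipart/alternative" convention
   that later parts are preferred over earlier ones, and it represents
   each part as a plain dictionary with hypothetical "format" and
   "access" keys; a real viewer would derive these from the parsed
   entity.

```python
def choose_alternative(parts, supported_formats, supported_access):
    """Return the last part whose format and access-type are both usable.

    parts is assumed to be in the sender's order, least preferred first.
    Returns None when no part can be handled.
    """
    chosen = None
    for part in parts:
        format_ok = part["format"] in supported_formats
        access_ok = part["access"] in supported_access
        if format_ok and access_ok:
            chosen = part  # a later usable part overrides an earlier one
    return chosen
```

   For instance, a viewer that can display PostScript but has no FTP
   client would skip an "anon-ftp" part and settle on a "local-file"
   part offering the same format.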
 
   However, the external-body mechanism is not intended to be limited to
   file retrieval, as shown by the mail-server access-type.  Beyond
   this, one can imagine, for example, using a video server for external
   references to video clips.
 
   The embedded message header fields which appear in the body of the
   "message/external-body" data must be used to declare the media type
   of the external body if it is anything other than plain US-ASCII
   text, since the external body does not have a header section to
   declare its type.  Similarly, any Content-transfer-encoding other
   than "7bit" must also be declared here.  Thus a complete
   "message/external-body" message, referring to an object in PostScript
   format, might look like this:
 
     Message-ID: <id1@host.com>
     Content-Type: multipart/alternative; boundary=42
     Content-ID: <id001@guppylake.bellcore.com>

     --42
     Content-Type: message/external-body; name="BodyFormats.ps";
                   site="thumper.bellcore.com"; mode="image";
                   access-type=ANON-FTP; directory="pub";
                   expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

     Content-type: application/postscript
     Content-ID: <id42@guppylake.bellcore.com>

     --42
     Content-Type: message/external-body; access-type=local-file;
                   name="/u/nsb/writing/rfcs/RFC-MIME.ps";
                   site="thumper.bellcore.com";
                   expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

     Content-type: application/postscript
     Content-ID: <id42@guppylake.bellcore.com>

     --42
     Content-Type: message/external-body;
                   access-type=mail-server;
                   server="listserv@bogus.bitnet";
                   expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

     Content-type: application/postscript
     Content-ID: <id42@guppylake.bellcore.com>

     --42--
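
   To show how such a message decomposes, the following Python sketch
   parses a shortened, well-formed variant of it with the standard
   library email package.  The function name external_bodies is
   hypothetical, and the mail server command "get RFC-MIME.DOC" in the
   sample is only an illustrative phantom body.  The sketch relies on
   the parser treating the body of any "message" type as an embedded
   header section followed by a body.

```python
import email

RAW = """\
Content-Type: multipart/alternative; boundary=42

--42
Content-Type: message/external-body; access-type=local-file;
              name="/u/nsb/writing/rfcs/RFC-MIME.ps";
              site="thumper.bellcore.com"

Content-type: application/postscript
Content-ID: <id42@guppylake.bellcore.com>

--42
Content-Type: message/external-body;
              access-type=mail-server;
              server="listserv@bogus.bitnet"

Content-type: application/postscript
Content-ID: <id42@guppylake.bellcore.com>

get RFC-MIME.DOC

--42--
"""


def external_bodies(raw):
    """Yield (parameters, inner headers, phantom body) per reference."""
    message = email.message_from_string(raw)
    for part in message.walk():
        if part.get_content_type() != "message/external-body":
            continue
        # get_params() lists the media type first; the rest are the
        # access-type and its parameters, with names lowercased.
        params = dict(part.get_params()[1:])
        inner = part.get_payload(0)  # embedded header section
        yield params, inner, inner.get_payload()


for params, inner, phantom in external_bodies(RAW):
    print(params["access-type"], inner.get_content_type(), repr(phantom.strip()))
```

   The inner Content-type gives the media type of the external data,
   while the inner Content-ID is the value a mail server would repeat
   on the returned message.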
 
   Note that in the above examples, the default
   Content-transfer-encoding of "7bit" is assumed for the external
   PostScript data.

   Like the "message/partial" type, the "message/external-body" media
   type is intended to be transparent, that is, to convey the data type
   in the external body rather than to convey a message with a body of
   that type.  Thus the headers on the outer and inner parts must be
   merged using the same rules as for "message/partial".  In particular,
   this means that the Content-type and Subject fields are overridden,
   but the From field is preserved.

   Note that since the external bodies are not transported along with
   the external body reference, they need not conform to transport
   limitations that apply to the reference itself.  In particular,
   Internet mail transports may impose 7bit and line length limits, but
   these do not automatically apply to binary external body references.
   Thus a Content-Transfer-Encoding is not generally necessary, though
   it is permitted.

   Note that the body of a message of type "message/external-body" is
   governed by the basic syntax for an RFC 822 message.  In particular,
   anything before the first consecutive pair of CRLFs is header
   information, while anything after it is body information, which is
   ignored for most access-types.
 
5.2.4.  Other Message Subtypes

   MIME implementations must in general treat unrecognized subtypes of
   "message" as being equivalent to "application/octet-stream".
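
   As a small illustration, the fallback could be expressed in Python
   as below.  The set of recognized subtypes is an assumption standing
   in for whatever a particular implementation supports; here it lists
   the "message" subtypes defined in this document.

```python
# Subtypes this hypothetical implementation knows how to handle.
KNOWN_MESSAGE_SUBTYPES = {"rfc822", "partial", "external-body"}


def effective_type(media_type):
    """Map an unrecognized message/* subtype to application/octet-stream."""
    normalized = media_type.lower()  # media type names are case-insensitive
    maintype, _, subtype = normalized.partition("/")
    if maintype == "message" and subtype not in KNOWN_MESSAGE_SUBTYPES:
        return "application/octet-stream"
    return normalized
```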
 
   Future subtypes of "message" intended for use with email should be
   restricted to "7bit" encoding.  A type other than "message" should be
   used if restriction to "7bit" is not possible.
 
6.  Experimental Media Type Values

   A media type value beginning with the characters "X-" is a private
   value, to be used by consenting systems by mutual agreement.  Any
   format without a rigorous and public definition must be named with an
   "X-" prefix, and publicly specified values shall never begin with
   "X-".  (Older versions of the widely used Andrew system use the
   "X-BE2" name, so new systems should probably choose a different
   name.)

   In general, the use of "X-" top-level types is strongly discouraged.
   Implementors should invent subtypes of the existing types whenever
   possible.  In many cases, a subtype of "application" will be more
   appropriate than a new top-level type.
 
7.  Summary

   The five discrete media types provide a standardized mechanism for
   tagging entities as "audio", "image", or several other kinds of
   data.  The composite "multipart" and "message" media types allow
   mixing and hierarchical structuring of entities of different types
   in a single message.  A distinguished parameter syntax allows
   further specification of data format details, particularly the
   specification of alternate character sets.  Additional optional
   header fields provide mechanisms for certain extensions deemed
   desirable by many implementors.  Finally, a number of useful media
   types are defined for general use by consenting user agents, notably
   "message/partial" and "message/external-body".
 
8.  Security Considerations

   Security issues are discussed in the context of the
   "application/postscript" type, the "message/external-body" type, and
   in RFC 2048.  Implementors should pay special attention to the
   security implications of any media types that can cause the remote
   execution of any actions in the recipient's environment.  In such
   cases, the discussion of the "application/postscript" type may serve
   as a model for considering other media types with remote execution
   capabilities.
 
9.  Authors' Addresses

   For more information, the authors of this document are best contacted
   via Internet mail:

   Ned Freed
   Innosoft International, Inc.
   1050 East Garvey Avenue South
   West Covina, CA 91790

   Phone: +1 818 919 3600
   Fax:   +1 818 919 3614
   EMail: ned@innosoft.com

   Nathaniel S. Borenstein
   First Virtual Holdings
   25 Washington Avenue
   Morristown, NJ 07960

   Phone: +1 201 540 8967
   Fax:   +1 201 993 3032
   EMail: nsb@nsb.fv.com

   MIME is a result of the work of the Internet Engineering Task Force
   Working Group on RFC 822 Extensions.  The chairman of that group,
   Greg Vaudreuil, may be reached at:

   Gregory M. Vaudreuil
   Octel Network Services
   17080 Dallas Parkway
   Dallas, TX 75248-1905

   EMail: Greg.Vaudreuil@Octel.Com
 
Appendix A -- Collected Grammar

   This appendix contains the complete BNF grammar for all the syntax
   specified by this document.

   By itself, however, this grammar is incomplete.  It refers by name to
   several syntax rules that are defined by RFC 822.  Rather than
   reproduce those definitions here, and risk unintentional differences
   between the two, this document simply refers the reader to RFC 822
   for the remaining definitions.  Wherever a term is undefined, it
   refers to the RFC 822 definition.
 
     boundary := 0*69<bchars> bcharsnospace

     bchars := bcharsnospace / " "

     bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
                      "+" / "_" / "," / "-" / "." /
                      "/" / ":" / "=" / "?"

     body-part := <"message" as defined in RFC 822, with all
                   header fields optional, not starting with the
                   specified dash-boundary, and with the
                   delimiter not occurring anywhere in the
                   body part.  Note that the semantics of a
                   part differ from the semantics of a message,
                   as described in the text.>

     close-delimiter := delimiter "--"

     dash-boundary := "--" boundary
                      ; boundary taken from the value of
                      ; boundary parameter of the
                      ; Content-Type field.

     delimiter := CRLF dash-boundary

     discard-text := *(*text CRLF)
                     ; May be ignored or discarded.

     encapsulation := delimiter transport-padding
                      CRLF body-part

     epilogue := discard-text

     multipart-body := [preamble CRLF]
                       dash-boundary transport-padding CRLF
                       body-part *encapsulation
                       close-delimiter transport-padding
                       [CRLF epilogue]

     preamble := discard-text

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.
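
   The boundary and multipart-body productions can be exercised with a
   short Python sketch.  The names BOUNDARY_RE, is_valid_boundary and
   compose_multipart are hypothetical.  The composer omits the optional
   preamble and epilogue, emits no transport padding, and does not
   check that the delimiter is absent from the body parts, which a real
   composer would have to do.

```python
import re

CRLF = "\r\n"

# bcharsnospace, transcribed from the grammar as a regex character class.
BCHARS_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"

# boundary := 0*69<bchars> bcharsnospace
BOUNDARY_RE = re.compile(rf"[{BCHARS_NOSPACE} ]{{0,69}}[{BCHARS_NOSPACE}]")


def is_valid_boundary(value):
    """Check value against the boundary production (1 to 70 characters)."""
    return BOUNDARY_RE.fullmatch(value) is not None


def compose_multipart(boundary, body_parts):
    """Assemble a multipart-body from already-formatted body parts."""
    if not is_valid_boundary(boundary):
        raise ValueError("boundary does not match the grammar")
    if not body_parts:
        raise ValueError("the grammar requires at least one body-part")
    dash_boundary = "--" + boundary
    delimiter = CRLF + dash_boundary
    first, *rest = body_parts
    # dash-boundary transport-padding CRLF body-part (padding is empty)
    out = dash_boundary + CRLF + first
    for part in rest:
        # encapsulation := delimiter transport-padding CRLF body-part
        out += delimiter + CRLF + part
    # close-delimiter := delimiter "--"
    return out + delimiter + "--"
```

   Each element of body_parts is expected to carry its own header
   section, blank line and content, so a part with no header fields
   begins with a bare CRLF.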
 