                                tr(P)                                                     tr(P)

NAME

       tr - translate characters

SYNOPSIS

       tr [-c | -C][-s] string1 string2

       tr -s [-c | -C] string1

       tr -d [-c | -C] string1

       tr -ds [-c | -C] string1 string2

DESCRIPTION

       The tr utility shall copy the standard input to the standard
       output with substitution or deletion of selected characters.
       The options specified and the string1 and string2 operands
       shall control translations that occur while copying characters
       and single-character collating elements.

OPTIONS

       The tr utility shall conform to the Base Definitions volume of
       IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines.

       The following options shall be supported:

       -c     Complement the set of values specified by string1. See
              the EXTENDED DESCRIPTION section.

       -C     Complement the set of characters specified by string1.
              See the EXTENDED DESCRIPTION section.

       -d     Delete all occurrences of input characters that are
              specified by string1.

       -s     Replace instances of repeated characters with a single
              character, as described in the EXTENDED DESCRIPTION
              section.
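
       The options above can be illustrated with a minimal sketch. It
       assumes a POSIX shell and a tr that behaves as this page
       specifies in the POSIX locale (selected here with LC_ALL=C).
       The helper function names are hypothetical; only the tr
       invocations inside them matter.

```shell
# Hypothetical helper names; LC_ALL=C selects the POSIX locale.

# -d: delete every character listed in string1.
strip_vowels() { LC_ALL=C tr -d 'aeiou'; }

# -s with a single operand: squeeze runs of the listed character.
squeeze_spaces() { LC_ALL=C tr -s ' '; }

# -c with -d: delete everything EXCEPT digits and newline.
keep_digits() { LC_ALL=C tr -cd '0123456789\n'; }

printf 'education\n' | strip_vowels     # dctn
printf 'a   b    c\n' | squeeze_spaces  # a b c
printf 'tel: 555-0199\n' | keep_digits  # 5550199
```

       The newline is listed in the -cd operand so that it survives
       the deletion of the complement.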

OPERANDS

       The following operands shall be supported:

       string1, string2

              Translation control strings. Each string shall represent
              a set of characters to be converted into an array of
              characters used for the translation. For a detailed
              description of how the strings are interpreted, see the
              EXTENDED DESCRIPTION section.

STDIN

       The standard input can be any type of file.

INPUT FILES

       None.

ENVIRONMENT VARIABLES

       The following environment variables shall affect the execution
       of tr:

       LANG   Provide a default value for the internationalization
              variables that are unset or null. (See the Base
              Definitions volume of IEEE Std 1003.1-2001, Section 8.2,
              Internationalization Variables for the precedence of
              internationalization variables used to determine the
              values of locale categories.)

       LC_ALL If set to a non-empty string value, override the values
              of all the other internationalization variables.

       LC_COLLATE
              Determine the locale for the behavior of range
              expressions and equivalence classes.

       LC_CTYPE
              Determine the locale for the interpretation of sequences
              of bytes of text data as characters (for example,
              single-byte as opposed to multi-byte characters in
              arguments) and the behavior of character classes.

       LC_MESSAGES
              Determine the locale that should be used to affect the
              format and contents of diagnostic messages written to
              standard error.

       NLSPATH
              Determine the location of message catalogs for the
              processing of LC_MESSAGES.

ASYNCHRONOUS EVENTS

       Default.

STDOUT

       The tr output shall be identical to the input, with the
       exception of the specified transformations.

STDERR

       The standard error shall be used only for diagnostic messages.

OUTPUT FILES

       None.

EXTENDED DESCRIPTION

       The operands string1 and string2 (if specified) define two
       arrays of characters. The constructs in the following list can
       be used to specify characters or single-character collating
       elements. If any of the constructs result in multi-character
       collating elements, tr shall exclude, without a diagnostic,
       those multi-character elements from the resulting array.

       character
              Any character not described by one of the conventions
              below shall represent itself.

       \octal Octal sequences can be used to represent characters with
              specific coded values. An octal sequence shall consist of
              a backslash followed by the longest sequence of one, two,
              or three octal-digit characters (01234567). The sequence
              shall cause the value whose encoding is represented by
              the one, two, or three-digit octal integer to be placed
              into the array. If the size of a byte on the system is
              greater than nine bits, the valid escape sequence used to
              represent a byte is implementation-defined. Multi-byte
              characters require multiple, concatenated escape
              sequences of this type, including the leading '\' for
              each byte.

       \character
              The backslash-escape sequences in the Base Definitions
              volume of IEEE Std 1003.1-2001, Table 5-1, Escape
              Sequences and Associated Actions ('\\', '\a', '\b', '\f',
              '\n', '\r', '\t', '\v') shall be supported. The results
              of using any other character, other than an octal digit,
              following the backslash are unspecified.

       c-c    In the POSIX locale, this construct shall represent the
              range of collating elements between the range endpoints
              (as long as neither endpoint is an octal sequence of the
              form \octal), inclusive, as defined by the collation
              sequence. The characters or collating elements in the
              range shall be placed in the array in ascending collation
              sequence. If the second endpoint precedes the starting
              endpoint in the collation sequence, it is unspecified
              whether the range of collating elements is empty, or this
              construct is treated as invalid. In locales other than
              the POSIX locale, this construct has unspecified
              behavior.

              If either or both of the range endpoints are octal
              sequences of the form \octal, this shall represent the
              range of specific coded values between the two range
              endpoints, inclusive.

       [:class:]
              Represents all characters belonging to the defined
              character class, as defined by the current setting of
              the LC_CTYPE locale category. The following character
              class names shall be accepted when specified in string1:

              alnum   blank   digit   lower   punct   upper
              alpha   cntrl   graph   print   space   xdigit

              In addition, character class expressions of the form
              [:name:] shall be recognized in those locales where the
              name keyword has been given a charclass definition in
              the LC_CTYPE category.

              When both the -d and -s options are specified, any of
              the character class names shall be accepted in string2.
              Otherwise, only the character class names lower and
              upper are valid in string2, and then only if the
              corresponding character class (upper and lower,
              respectively) is specified in the same relative position
              in string1. Such a specification shall be interpreted as
              a request for case conversion. When [:lower:] appears in
              string1 and [:upper:] appears in string2, the arrays
              shall contain the characters from the toupper mapping in
              the LC_CTYPE category of the current locale. When
              [:upper:] appears in string1 and [:lower:] appears in
              string2, the arrays shall contain the characters from
              the tolower mapping in the LC_CTYPE category of the
              current locale. The first character from each mapping
              pair shall be in the array for string1 and the second
              character from each mapping pair shall be in the array
              for string2 in the same relative position.

              Except for case conversion, the characters specified by
              a character class expression shall be placed in the
              array in an unspecified order.

              If the name specified for class does not define a valid
              character class in the current locale, the behavior is
              undefined.

       [=equiv=]
              Represents all characters or collating elements belonging
              to the same equivalence class as equiv, as defined by the
              current setting of the LC_COLLATE locale category. An
              equivalence class expression shall be allowed only in
              string1, or in string2 when it is being used by the
              combined -d and -s options. The characters belonging to
              the equivalence class shall be placed in the array in an
              unspecified order.

       [x*n]  Represents n repeated occurrences of the character x.
              Because this expression is used to map multiple
              characters to one, it is only valid when it occurs in
              string2. If n is omitted or is zero, it shall be
              interpreted as large enough to extend the string2-based
              sequence to the length of the string1-based sequence. If
              n has a leading zero, it shall be interpreted as an octal
              value. Otherwise, it shall be interpreted as a decimal
              value.
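
       The constructs in the list above can be exercised with the
       following sketch. It assumes a POSIX shell, an ASCII-based POSIX
       locale (LC_ALL=C), and a tr that follows this page. The helper
       names are hypothetical. In the POSIX locale the equivalence
       class [=e=] contains only 'e' itself, so that line demonstrates
       the syntax rather than accent folding.

```shell
# Hypothetical helper names; each wraps one construct from the list.
# LC_ALL=C: ranges are only specified in the POSIX locale (ASCII assumed).

tab_to_space_octal()  { LC_ALL=C tr '\011' ' '; }       # \octal (011 = TAB)
tab_to_space_escape() { LC_ALL=C tr '\t' ' '; }         # \character
upcase_range()        { LC_ALL=C tr 'a-z' 'A-Z'; }      # c-c
drop_a_to_c_octal()   { LC_ALL=C tr -d '\141-\143'; }   # octal endpoints a..c
drop_digits()         { LC_ALL=C tr -d '[:digit:]'; }   # [:class:]
star_e()              { LC_ALL=C tr '[=e=]' '*'; }      # [=equiv=]
map_with_repeat()     { LC_ALL=C tr 'abcd' '[x*2]yz'; } # [x*n]: a,b->x c->y d->z

printf 'a\tb\n'   | tab_to_space_octal   # a b
printf 'hello\n'  | upcase_range         # HELLO
printf 'abcdef\n' | drop_a_to_c_octal    # def
printf 'r2d2\n'   | drop_digits          # rd
printf 'eerie\n'  | star_e               # **ri*
printf 'dcba\n'   | map_with_repeat      # zyxx
```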

       When the -d option is not specified:
              Each input character found in the array specified by
              string1 shall be replaced by the character in the same
              relative position in the array specified by string2.
              When the array specified by string2 is shorter than the
              one specified by string1, the results are unspecified.

              If the -C option is specified, the complements of the
              characters specified by string1 (the set of all
              characters in the current character set, as defined by
              the current setting of LC_CTYPE, except for those
              actually specified in the string1 operand) shall be
              placed in the array in ascending collation sequence, as
              defined by the current setting of LC_COLLATE.

              If the -c option is specified, the complement of the
              values specified by string1 shall be placed in the array
              in ascending order by binary value.

              Because the order in which characters specified by
              character class expressions or equivalence class
              expressions are placed in the array is unspecified, such
              expressions should only be used if the intent is to map
              several characters into one. An exception is case
              conversion, as described previously.

       When the -d option is specified:
              Input characters found in the array specified by string1
              shall be deleted.

              When the -C option is specified with -d, all characters
              except those specified by string1 shall be deleted. The
              contents of string2 are ignored, unless the -s option is
              also specified.

              When the -c option is specified with -d, all values
              except those specified by string1 shall be deleted. The
              contents of string2 shall be ignored, unless the -s
              option is also specified.

              The same string cannot be used for both the -d and the -s
              option; when both options are specified, both string1
              (used for deletion) and string2 (used for squeezing)
              shall be required.
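
       The two cases above can be sketched as follows, assuming a POSIX
       shell and the POSIX locale (LC_ALL=C); the helper names are
       hypothetical. The first helper translates a complement, and
       '[_*]' is used because a string2 shorter than the complemented
       array would give unspecified results. The second combines -d
       and -s with separate operands.

```shell
# Hypothetical helper names; LC_ALL=C selects the POSIX locale.

# -c without -d: the complement of 'a-z\n' is translated. '[_*]' stretches
# string2 to the complement's length, so every such value becomes '_'.
underscore_non_lower() { LC_ALL=C tr -c 'a-z\n' '[_*]'; }

# -d with -s: delete 'x' first, then squeeze runs of 'a' or 'b'.
delete_x_squeeze_ab() { LC_ALL=C tr -ds 'x' 'ab'; }

printf 'hi there!\n' | underscore_non_lower  # hi_there_
printf 'axaxbxb\n' | delete_x_squeeze_ab     # ab
```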

       When the -s option is specified, after any deletions or
       translations have taken place, repeated sequences of the same
       character shall be replaced by one occurrence of the same
       character, if the character is found in the array specified by
       the last operand. If the last operand contains a character
       class, such as the following example:

              tr -s '[:space:]'

       the last operand's array shall contain all of the characters in
       that character class. However, in a case conversion, as
       described previously, such as:

              tr -s '[:upper:]' '[:lower:]'

       the last operand's array shall contain only those characters
       defined as the second characters in each of the toupper or
       tolower character pairs, as appropriate.
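
       Both forms shown above can be tried with this sketch, assuming a
       POSIX shell and the POSIX locale (LC_ALL=C); the helper names are
       hypothetical. Squeezing only merges repeats of the same
       character, and in the case-conversion form the squeeze happens
       after translation, using the lowercase array.

```shell
# Hypothetical helper names; LC_ALL=C selects the POSIX locale.

# One operand: squeeze runs of any single whitespace character.
# A run of mixed characters (space then TAB) is not merged.
squeeze_space_class() { LC_ALL=C tr -s '[:space:]'; }

# Case conversion: translate first, then squeeze using string2's array
# (the lowercase letters), so 'AA' and 'aa' both collapse to 'a'.
lower_and_squeeze() { LC_ALL=C tr -s '[:upper:]' '[:lower:]'; }

printf 'AAbbCc\n' | lower_and_squeeze   # abc
```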

       An empty string used for string1 or string2 produces undefined
       results.

EXIT STATUS

       The following exit values shall be returned:

        0     All input was processed successfully.

       >0     An error occurred.

CONSEQUENCES OF ERRORS

       Default.

       The following sections are informative.

APPLICATION USAGE

       If necessary, string1 and string2 can be quoted to avoid pattern
       matching by the shell.

       If an ordinary digit (representing itself) is to follow an octal
       sequence, the octal sequence must use the full three digits to
       avoid ambiguity.
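
       The ambiguity can be seen in a small sketch, assuming a POSIX
       shell, an ASCII-based POSIX locale, and a tr that reads at most
       three octal digits per escape as described above. The goal of
       mapping x to TAB and y to the digit '1' is hypothetical, as are
       the helper names.

```shell
# Goal (hypothetical): map x to TAB and y to the digit '1'.
# '\0111' is read as the three-digit escape \011 (TAB) followed by '1'.
tab_then_one() { LC_ALL=C tr 'xy' '\0111'; }

# Dropping the leading zero changes the meaning: '\111' is one escape,
# octal 111, which is 'I' in ASCII.
one_escape() { LC_ALL=C tr 'x' '\111'; }

printf 'x\n' | one_escape   # I
```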

       When string2 is shorter than string1, a difference results
       between historical System V and BSD systems. A BSD system pads
       string2 with the last character found in string2. Thus, it is
       possible to do the following:

              tr 0123456789 d

       which would translate all digits to the letter 'd'. Since this
       area is specifically unspecified in this volume of
       IEEE Std 1003.1-2001, both the BSD and System V behaviors are
       allowed, but a conforming application cannot rely on the BSD
       behavior. It would have to code the example in the following
       way:

              tr 0123456789 '[d*]'
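
       The portable form can be checked with a short sketch, assuming a
       POSIX shell and the POSIX locale; the helper name is
       hypothetical.

```shell
# Hypothetical helper name. '[d*]' extends string2 with 'd' to the length
# of string1, instead of relying on BSD-style padding of a short string2.
digits_to_d() { LC_ALL=C tr '0123456789' '[d*]'; }

printf 'r2d2 c3po\n' | digits_to_d   # rddd cdpo
```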

       It should be noted that, despite similarities in appearance, the
       string operands used by tr are not regular expressions.

       Unlike some historical implementations, this definition of the
       tr utility correctly processes NUL characters in its input
       stream. NUL characters can be stripped by using:

              tr -d '\000'
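
       A quick way to confirm the stripping is to count bytes, since
       shell command substitution cannot be relied on to keep NUL
       bytes. This sketch assumes a POSIX shell whose printf emits a
       NUL for '\000'; the helper name is hypothetical.

```shell
# Hypothetical helper name. printf emits 'a', NUL, 'b', newline (4 bytes).
strip_nul() { tr -d '\000'; }

# Count the bytes left after stripping: 'a', 'b' and the newline.
# The final tr -d ' ' removes padding that some wc implementations print.
printf 'a\000b\n' | strip_nul | wc -c | tr -d ' '
```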

EXAMPLES

       The following example creates a list of all words in file1, one
       per line, in file2, where a word is taken to be a maximal string
       of letters.

              tr -cs "[:alpha:]" "[\n*]" <file1 >file2

       The next example translates all lowercase characters in file1 to
       uppercase and writes the results to standard output.

              tr "[:lower:]" "[:upper:]" <file1

       This example uses an equivalence class to identify accented
       variants of the base character 'e' in file1, which are stripped
       of diacritical marks and written to file2.

              tr "[=e=]" e <file1 >file2

RATIONALE

       In some early proposals, an explicit option -n was added to
       disable the historical behavior of stripping NUL characters from
       the input. It was considered that automatically stripping NUL
       characters from the input was not correct functionality.
       However, the removal of -n in a later proposal does not remove
       the requirement that tr correctly process NUL characters in its
       input stream. NUL characters can be stripped by using
       tr -d '\000'.

       Historical implementations of tr differ widely in syntax and
       behavior. For example, the BSD version has not needed the
       bracket characters for the repetition sequence. The tr utility
       syntax is based more closely on the System V and XPG3 model
       while attempting to accommodate historical BSD implementations.
       In the case of the short string2 padding, the decision was to
       unspecify the behavior and preserve System V and XPG3 scripts,
       which might find difficulty with the BSD method. The assumption
       was made that BSD users of tr have to make accommodations to
       meet the syntax defined here. Since it is possible to use the
       repetition sequence to duplicate the desired behavior, whereas
       there is no simple way to achieve the System V method, this was
       the correct, if not desirable, approach.

       The use of octal values to specify control characters, while
       having historical precedents, is not portable. The introduction
       of escape sequences for control characters should provide the
       necessary portability. It is recognized that this may cause some
       historical scripts to break.

       An early proposal included support for multi-character collating
       elements. It was pointed out that, while tr does employ some
       syntactical elements from REs, the aim of tr is quite different;
       ranges, for example, do not have a similar meaning ("any of the
       chars in the range matches" versus "translate each character in
       the range to the output counterpart"). As a result, the
       previously included support for multi-character collating
       elements has been removed. What remains are ranges in current
       collation order (to support, for example, accented characters),
       character classes, and equivalence classes.

       In XPG3 the [:class:] and [=equiv=] conventions are shown with
       double brackets, as in RE syntax. However, tr does not implement
       RE principles; it just borrows part of the syntax. Consequently,
       [:class:] and [=equiv=] should be regarded as syntactical
       elements on a par with [x*n], which is not an RE bracket
       expression.

       The standard developers will consider changes to tr that allow
       it to translate characters between different character
       encodings, or they will consider providing a new utility to
       accomplish this.

       On historical System V systems, a range expression requires
       enclosing square brackets, such as:

              tr '[a-z]' '[A-Z]'

       However, BSD-based systems did not require the brackets, and
       this convention is used here to avoid breaking large numbers of
       BSD scripts:

              tr a-z A-Z

       The preceding System V script will continue to work because the
       brackets, treated as regular characters, are translated to
       themselves. However, any System V script that relied on "a-z"
       representing the three characters 'a', '-', and 'z' has to be
       rewritten as "az-".
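
       Both points can be checked with this sketch, assuming a POSIX
       shell and the POSIX locale (LC_ALL=C); the helper names are
       hypothetical. With '-' placed last no range is formed, so
       "az-" names the three literal characters.

```shell
# Hypothetical helper names; LC_ALL=C selects the POSIX locale.

# System V spelling: '[' and ']' map to themselves, a-z maps to A-Z.
sysv_upcase() { LC_ALL=C tr '[a-z]' '[A-Z]'; }

# '-' last: a, z and - are taken literally and mapped to A, Z and _.
literal_az_dash() { LC_ALL=C tr 'az-' 'AZ_'; }

printf '[ok]\n' | sysv_upcase      # [OK]
printf 'a-z\n' | literal_az_dash   # A_Z
```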

       The ISO POSIX-2:1993 standard had a -c option that behaved
       similarly to the -C option, but did not supply functionality
       equivalent to the -c option specified in IEEE Std 1003.1-2001.
       This meant that the historical practice of being able to specify
       tr -d '\200-\377' (which would delete all bytes with the top bit
       set) would have no effect because, in the C locale, bytes with
       the values octal 200 to octal 377 are not characters.

       The earlier version also said that octal sequences referred to
       collating elements and could be placed adjacent to each other to
       specify multi-byte characters. However, it was noted that this
       caused ambiguities because tr would not be able to tell whether
       adjacent octal sequences were intending to specify multi-byte
       characters or multiple single-byte characters.
       IEEE Std 1003.1-2001 specifies that octal sequences always refer
       to single-byte binary values.

FUTURE DIRECTIONS

       None.

SEE ALSO

       sed

COPYRIGHT

       Portions of this text are reprinted and reproduced in electronic
       form from IEEE Std 1003.1, 2003 Edition, Standard for
       Information Technology -- Portable Operating System Interface
       (POSIX), The Open Group Base Specifications Issue 6, Copyright
       (C) 2001-2003 by the Institute of Electrical and Electronics
       Engineers, Inc and The Open Group. In the event of any
       discrepancy between this version and the original IEEE and The
       Open Group Standard, the original IEEE and The Open Group
       Standard is the referee document. The original Standard can be
       obtained online at http://www.opengroup.org/unix/online.html.

POSIX                         2003                        tr(P)