String Methods
You have already encountered methods in lists. Strings have a much richer set of methods, in part because strings have “inherited” many of their methods from the string module where they resided as functions in earlier versions of Python (and where you may still find them, if you feel the need).
Because there are so many string methods, only some of the most useful ones are described here. For a full reference, see Appendix B. In the description of the string methods, you will find references to other, related string methods in this chapter (marked “See also”) or in Appendix B.
2. For a more thorough description of the module, check out Section 4.1 of the Python Library Reference
(http://python.org/doc/lib/module-string.html).
3. In Python 3.0, string.letters and friends will be removed. You will need to use constants like
string.ascii_letters instead.
BUT STRING ISN’T DEAD
Even though string methods have completely upstaged the string module, the module still includes a few
constants and functions that aren’t available as string methods. The maketrans function is one example and
will be discussed together with the translate method in the material that follows. The following are some
useful constants available from string.2
• string.digits: A string containing the digits 0–9
• string.letters: A string containing all letters (uppercase and lowercase)
• string.lowercase: A string containing all lowercase letters
• string.printable: A string containing all printable characters
• string.punctuation: A string containing all punctuation characters
• string.uppercase: A string containing all uppercase letters
Note that the letter constants (such as string.letters) are locale-dependent (that is, their
exact values depend on the language for which Python is configured).3 If you want to make sure you’re using
ASCII, you can use the variants with ascii_ in their names, such as string.ascii_letters.
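As a rough illustration of how these constants can be used, the sketch below classifies each character of a string by testing membership in the constants. The function name classify_chars and its category labels are made up for this example; it sticks to the ascii_ variant (plus digits, punctuation, and whitespace) so that the result does not depend on the locale.

```python
import string

def classify_chars(text):
    """Return a list of (char, kind) pairs; the kind labels are illustrative only."""
    result = []
    for ch in text:
        if ch in string.ascii_letters:
            kind = 'letter'
        elif ch in string.digits:
            kind = 'digit'
        elif ch in string.punctuation:
            kind = 'punctuation'
        elif ch in string.whitespace:
            kind = 'whitespace'
        else:
            kind = 'other'
        result.append((ch, kind))
    return result

pairs = classify_chars('R2-D2!')
# [('R', 'letter'), ('2', 'digit'), ('-', 'punctuation'),
#  ('D', 'letter'), ('2', 'digit'), ('!', 'punctuation')]
```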
find
The find method finds a substring within a larger string. It returns the leftmost index where the
substring is found. If it is not found, –1 is returned:
>>> 'With a moo-moo here, and a moo-moo there'.find('moo')
7
>>> title = "Monty Python's Flying Circus"
>>> title.find('Monty')
0
>>> title.find('Python')
6
>>> title.find('Flying')
15
>>> title.find('Zirquss')
-1
In our first encounter with membership in Chapter 2, we created part of a spam filter by
using the expression '$$$' in subject. We could also have used find (which would also have
worked prior to Python 2.3, when in could be used only for checking single-character
membership in strings):
>>> subject = '$$$ Get rich now!!! $$$'
>>> subject.find('$$$')
0
Note: The string method find does not return a Boolean value. If find returns 0, as it did here, it means
that it has found the substring, at index zero.
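Because 0 counts as false in a Boolean context (and -1 counts as true), testing the result of find directly is a common mistake: the match at index zero above would be treated as "not found." A minimal sketch of the safer comparison follows; the helper name contains is hypothetical.

```python
def contains(text, sub):
    """Return True if sub occurs anywhere in text.

    find returns -1 on failure and an index (possibly 0) on success,
    so compare against -1 instead of relying on truthiness.
    """
    return text.find(sub) != -1

subject = '$$$ Get rich now!!! $$$'
naive = bool(subject.find('$$$'))   # False, even though '$$$' is present at index 0
correct = contains(subject, '$$$')  # True
```

When the position itself is not needed, the membership test '$$$' in subject expresses the same check more directly.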
You may also supply a starting point for your search and, optionally, an ending point:
>>> subject = '$$$ Get rich now!!! $$$'
>>> subject.find('$$$')
0
>>> subject.find('$$$', 1) # Only supplying the start
20
>>> subject.find('!!!')
16
>>> subject.find('!!!', 0, 16) # Supplying start and end
-1
Note that the range specified by the start and stop values (second and third parameters)
includes the first index but not the second. This is common practice in Python.
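The start parameter makes it straightforward to collect every position of a substring: after each match, resume the search just past it. The helper below is a sketch (find_all is not a string method); it reports non-overlapping matches and rejects an empty substring, since searching for '' would match at every position. The last lines show the exclusive end index at work.

```python
def find_all(text, sub):
    """Return the start indices of all non-overlapping occurrences of sub."""
    if not sub:
        raise ValueError('sub must be a non-empty string')
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search right after the current match
        start = text.find(sub, start + len(sub))
    return positions

moos = find_all('With a moo-moo here, and a moo-moo there', 'moo')  # [7, 11, 27, 31]
dollars = find_all('$$$ Get rich now!!! $$$', '$$$')                 # [0, 20]

subject = '$$$ Get rich now!!! $$$'
# '!!!' occupies indices 16, 17 and 18, so it fits in [0, 19) but not in [0, 18)
inside = subject.find('!!!', 0, 19)   # 16
outside = subject.find('!!!', 0, 18)  # -1
```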
In Appendix B: rfind, index, rindex, count, startswith, endswith.
join
A very important string method, join is the inverse of split. It is used to join the elements of a
sequence:
>>> seq = [1, 2, 3, 4, 5]
>>> sep = '+'
>>> sep.join(seq) # Trying to join a list of numbers
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: sequence item 0: expected string, int found
>>> seq = ['1', '2', '3', '4', '5']
>>> sep.join(seq) # Joining a list of strings
'1+2+3+4+5'
>>> dirs = '', 'usr', 'bin', 'env'
>>> '/'.join(dirs)
'/usr/bin/env'
>>> print 'C:' + '\\'.join(dirs)
C:\usr\bin\env
As you can see, the sequence elements that are to be joined must all be strings. Note how
in the last two examples I use a sequence of directories (here a tuple) and format them according to the conventions
of UNIX and DOS/Windows simply by using a different separator (and adding a drive name in
the DOS version).
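Since join accepts only strings, a sequence of numbers has to be converted first. One way to do that, sketched here with the built-in functions str and map, is shown below; splitting the result gives back strings, so int is needed to recover the numbers.

```python
seq = [1, 2, 3, 4, 5]
sep = '+'
# Convert each number to a string before joining; join itself does no conversion
joined = sep.join(map(str, seq))    # '1+2+3+4+5'

# Splitting on the same separator gives back the string pieces, not the ints
pieces = joined.split(sep)          # ['1', '2', '3', '4', '5']
numbers = [int(p) for p in pieces]  # [1, 2, 3, 4, 5]
```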
See also: split.
lower
The lower method returns a lowercase version of the string:
>>> 'Trondheim Hammer Dance'.lower()
'trondheim hammer dance'
This can be useful if you want to write code that is case insensitive—that is, code that
ignores the difference between uppercase and lowercase letters. For instance, suppose you
want to check whether a user name is found in a list. If your list contains the string 'gumby' and
the user enters his name as 'Gumby', you won’t find it:
>>> if 'Gumby' in ['gumby', 'smith', 'jones']: print 'Found it!'
...
>>>
Of course, the same thing will happen if you have stored 'Gumby' and the user writes
'gumby', or even 'GUMBY'. A solution to this is to convert all names to lowercase both when storing
and searching. The code would look something like this:
>>> name = 'Gumby'
>>> names = ['gumby', 'smith', 'jones']
>>> if name.lower() in names: print 'Found it!'
...
Found it!
>>>
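The same idea can be wrapped in a small helper. This is a sketch, and name_in_list is a made-up name; it lowercases both the query and each stored name, so it works even if the list was not normalized when it was built.

```python
def name_in_list(name, names):
    """Case-insensitive membership test for a list of user names."""
    wanted = name.lower()
    for stored in names:
        if stored.lower() == wanted:
            return True
    return False

names = ['gumby', 'Smith', 'JONES']
```

Lowercasing once at storage time, as suggested above, avoids redoing the conversion for every stored name on each lookup; the helper gives up that efficiency in exchange for not depending on how the list was built.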
See also: translate.
In Appendix B: islower, capitalize, swapcase, title, istitle, upper, isupper.
TITLE CASING
One relative of lower is the title method (see Appendix B), which title-cases a string: all words
start with uppercase characters, and all other characters are lowercased. However, the word boundaries are
defined in a way that may give some unnatural results:
>>> "that's all, folks".title()
"That'S All, Folks"
An alternative is the capwords function from the string module:
>>> import string
>>> string.capwords("that's all, folks")
"That's All, Folks"
Of course, if you want a truly correctly capitalized title (which depends on the style you’re using—possibly
lowercasing articles, coordinating conjunctions, prepositions with fewer than five letters, and so forth),
you’re basically on your own.
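To make "on your own" slightly more concrete, here is one possible sketch. The MINOR_WORDS set is an arbitrary assumption, not any particular style guide; the hypothetical headline function capitalizes every word except those minor words, which stay lowercase unless they come first or last. Note that split() without arguments collapses runs of whitespace, so the result always uses single spaces.

```python
# Assumed list of minor words to keep lowercase; real style guides differ.
MINOR_WORDS = set(['a', 'an', 'the', 'and', 'but', 'or', 'nor', 'for',
                   'of', 'on', 'in', 'to', 'at', 'by', 'with'])

def headline(text):
    """Title-case text, keeping minor words lowercase except first and last."""
    words = text.lower().split()
    result = []
    for i, word in enumerate(words):
        if 0 < i < len(words) - 1 and word in MINOR_WORDS:
            result.append(word)
        else:
            # capitalize uppercases only the first character, so "that's" stays intact
            result.append(word.capitalize())
    return ' '.join(result)
```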
replace
The replace method returns a string where all the occurrences of one string have been
replaced by another:
>>> 'This is a test'.replace('is', 'eez')
'Theez eez a test'
If you have ever used the “search and replace” feature of a word processing program, you
will no doubt see the usefulness of this method.
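Two details are easier to see in code: replace matches substrings, not whole words (which is why "This" became "Theez" above), and it accepts an optional third argument that limits how many replacements are made. Because strings are immutable, the original string is never changed.

```python
text = 'This is a test'
everywhere = text.replace('is', 'eez')     # 'Theez eez a test'
first_only = text.replace('is', 'eez', 1)  # 'Theez is a test'
missing = text.replace('xyz', 'abc')       # no match: 'This is a test'
# text itself is still 'This is a test'
```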
See also: translate.
In Appendix B: expandtabs.
split
A very important string method, split is the inverse of join, and is used to split a string into a
sequence:
>>> '1+2+3+4+5'.split('+')
['1', '2', '3', '4', '5']
>>> '/usr/bin/env'.split('/')
['', 'usr', 'bin', 'env']
>>> 'Using the default'.split()
['Using', 'the', 'default']
Note that if no separator is supplied, the default is to split on all runs of consecutive
whitespace characters (spaces, tabs, newlines, and so on).
See also: join.
In Appendix B: rsplit, splitlines.
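A few more split behaviors, sketched below: splitting on a single space keeps an empty string between consecutive spaces, whereas the default splits on runs of whitespace; an optional second argument (maxsplit) limits the number of splits; and joining with the same separator undoes the split.

```python
line = 'Using  the default'        # note the two spaces after 'Using'
on_space = line.split(' ')         # ['Using', '', 'the', 'default']
on_runs = line.split()             # ['Using', 'the', 'default']

# maxsplit=1: split only at the first '=' and keep the rest intact
key, value = 'formula=a=b+c'.split('=', 1)  # 'formula', 'a=b+c'

path = '/usr/bin/env'
parts = path.split('/')            # ['', 'usr', 'bin', 'env']
rebuilt = '/'.join(parts)          # '/usr/bin/env'
```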
strip
The strip method returns a string where whitespace on the left and right (but not internally)
has been stripped (removed):
>>> ' internal whitespace is kept '.strip()
'internal whitespace is kept'
As with lower, strip can be useful when comparing input to stored values. Let’s return to
the user name example from the section on lower, and let’s say that the user inadvertently
types a space after his name:
>>> names = ['gumby', 'smith', 'jones']
>>> name = 'gumby '
>>> if name in names: print 'Found it!'
...
>>> if name.strip() in names: print 'Found it!'
...
Found it!
>>>
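Since a stray space and a stray capital letter are equally likely in typed input, it can make sense to apply strip and lower together before comparing. A sketch, with normalize as a hypothetical helper name:

```python
def normalize(name):
    """Remove surrounding whitespace, then lowercase, for comparison."""
    return name.strip().lower()

names = ['gumby', 'smith', 'jones']
found = normalize('  Gumby \n') in names  # True: spaces, newline and capital G are all handled
```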
You can also specify which characters are to be stripped, by listing them all in a string
parameter:
>>> '*** SPAM * for * everyone!!! ***'.strip(' *!')
'SPAM * for * everyone'
Stripping is performed only at the ends, so the internal asterisks are not removed.
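The argument to strip is treated as a set of characters, not as a substring: any combination of them is removed from both ends, in any order. The sketch below shows this, along with lstrip and rstrip, which are standard string methods (not described in this section) that strip only the left or only the right end.

```python
url = 'www.example.com'
# 'w', '.', 'm', 'o' and 'c' are each stripped individually from both ends
core = url.strip('w.moc')           # 'example'

banner = '*** SPAM * for * everyone!!! ***'
left = banner.lstrip(' *!')         # 'SPAM * for * everyone!!! ***'
right = banner.rstrip(' *!')        # '*** SPAM * for * everyone'
```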