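# I. Introduction

Regular expressions come up constantly when processing string data, and Python provides them through the re module. This article walks through re.sub(), the function that replaces the matches of a pattern in a string, using practical examples.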

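# II. Function prototype

Start with the function prototype as it appears in the source of the re module, including its parameters and what they mean: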

	def sub(pattern, repl, string, count=0, flags=0):
	    """Return the string obtained by replacing the leftmost
	    non-overlapping occurrences of the pattern in string by the
	    replacement repl.  repl can be either a string or a callable;
	    if a string, backslash escapes in it are processed.  If it is
	    a callable, it's passed the match object and must return
	    a replacement string to be used."""
	    return _compile(pattern, flags).sub(repl, string, count)

As the code shows, re.sub() takes five parameters, described one by one below (required parameters are in bold):

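- **pattern**: the regular expression pattern to search for;
- **repl**: the replacement for each match of pattern; either a string or a function that receives the match object and returns a string;
- **string**: the original string to search and replace in;
- count: optional; the maximum number of matches to replace. It must be a non-negative integer and defaults to 0, which means every match is replaced;
- flags: optional; the matching modes used when the pattern is compiled (such as ignoring case or multi-line mode), given as an integer and defaulting to 0.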
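
The parameters above can be illustrated with a short sketch. The sample strings and the helper name shout are made up for this illustration and do not come from the examples in this article.

```python
import re

text = "Cat cat CAT dog"  # hypothetical sample string

# count=1 replaces only the leftmost match.
first_only = re.sub(r"cat", "*", text, count=1)

# flags=re.IGNORECASE makes the pattern case-insensitive.
all_cases = re.sub(r"cat", "*", text, flags=re.IGNORECASE)

# repl may be a callable: it receives the match object and returns the replacement.
def shout(match):
    return match.group(0).upper()

upper_dog = re.sub(r"dog", shout, text)

# In a string repl, backslash escapes such as \1 refer to captured groups.
dashed = re.sub(r"(\d{4})/(\d{2})/(\d{2})", r"\1-\2-\3", "2020/01/01")

print(first_only)  # Cat * CAT dog
print(all_cases)   # * * * dog
print(upper_dog)   # Cat cat CAT DOG
print(dashed)      # 2020-01-01
```

Passing count and flags by keyword, as here, keeps the call readable and avoids mixing the two up.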

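# III. Usage examples

The examples below all operate on one string that contains upper- and lower-case English letters, digits, Chinese and English punctuation, and special symbols: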

	>>> s = "大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"

# 1. Matching a single digit or letter

(1) Matching single digits only

	>>> import re
	>>> s
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[0-9]', '*', s)
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m ** years old.   Today is ****/**/**. It is a wonderful DAY! @HHHHello,,,#***ComeHere***...**？AA？zz？——http://welcome.cn"

Here re.sub(r'[0-9]', '*', s) matches single digits only and replaces each digit with one asterisk.

(2) Matching single letters only

	>>> s
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"

	>>> re.sub(r'[a-z]', '*', s)
	"大家好，我是一个程序员小白。I '* ** **** ** ********* ******, *** I’* 18 ***** ***.   T**** ** 2020/01/01. I* ** * ********* DAY! @HHHH****,,,#111C***H***222...66？AA？**？——****://*******.**"

	>>> re.sub(r'[A-Z]', '*', s)
	"大家好，我是一个程序员小白。* 'm so glad to introduce myself, and *’m 18 years old.   *oday is 2020/01/01. *t is a wonderful ***! @****ello,,,#111*ome*ere222...66？**？zz？——http://welcome.cn"

	>>> re.sub(r'[A-Za-z]', '*', s)
	"大家好，我是一个程序员小白。* '* ** **** ** ********* ******, *** *’* 18 ***** ***.   ***** ** 2020/01/01. ** ** * ********* ***! @********,,,#111********222...66？**？**？——****://*******.**"

Here re.sub(r'[a-z]', '*', s) matches single lowercase letters and replaces each with one asterisk. re.sub(r'[A-Z]', '*', s) does the same for single uppercase letters, and re.sub(r'[A-Za-z]', '*', s) does it for single letters of either case.

(3) Matching single digits and letters

	>>> s
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"

	>>> re.sub(r'[0-9A-Z]', '*', s)
	"大家好，我是一个程序员小白。* 'm so glad to introduce myself, and *’m ** years old.   *oday is ****/**/**. *t is a wonderful ***! @****ello,,,#****ome*ere***...**？**？zz？——http://welcome.cn"

	>>> re.sub(r'[0-9a-z]', '*', s)
	"大家好，我是一个程序员小白。I '* ** **** ** ********* ******, *** I’* ** ***** ***.   T**** ** ****/**/**. I* ** * ********* DAY! @HHHH****,,,#***C***H******...**？AA？**？——****://*******.**"

	>>> re.sub(r'[0-9A-Za-z]', '*', s)
	"大家好，我是一个程序员小白。* '* ** **** ** ********* ******, *** *’* ** ***** ***.   ***** ** ****/**/**. ** ** * ********* ***! @********,,,#**************...**？**？**？——****://*******.**"

Here re.sub(r'[0-9A-Z]', '*', s) matches single digits and uppercase letters and replaces each with one asterisk. re.sub(r'[0-9a-z]', '*', s) does the same for single digits and lowercase letters, and re.sub(r'[0-9A-Za-z]', '*', s) does it for single digits and letters of either case.
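
One caveat may be worth a quick sketch: the class [0-9A-Za-z] covers only ASCII digits and letters, whereas the shorthand \w (not used in this article) also matches the underscore and, for str patterns in Python 3, Unicode word characters such as Chinese. The sample string below is hypothetical.

```python
import re

sample = "小白 is 18_years"  # hypothetical sample string

# Only ASCII digits and letters are replaced; Chinese and "_" are kept.
ascii_only = re.sub(r"[0-9A-Za-z]", "*", sample)

# \w also matches the Chinese characters and the underscore.
word_chars = re.sub(r"\w", "*", sample)

print(ascii_only)  # 小白 ** **_*****
print(word_chars)  # ** ** ********
```

This is why the explicit ranges used throughout this article leave the Chinese part of s untouched.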

# 2. Matching multiple digits or letters

Note: "multiple" here means one or more.

(1) Matching multiple digits

	>>> s
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[0-9]+', '*', s)
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m * years old.   Today is */*/*. It is a wonderful DAY! @HHHHello,,,#*ComeHere*...*？AA？zz？——http://welcome.cn"

Here re.sub(r'[0-9]+', '*', s) matches each run of consecutive digits and replaces the whole run with a single asterisk.
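
The difference between matching single digits and matching runs of digits can be made concrete with re.subn(), which returns the new string together with the number of replacements. This is only a small sketch on a shortened, hypothetical sample rather than on the full string s.

```python
import re

sample = "Today is 2020/01/01"  # shortened sample for illustration

# Without "+": every digit is its own match.
per_digit, digit_count = re.subn(r"[0-9]", "*", sample)

# With "+": every run of consecutive digits is one match.
per_run, run_count = re.subn(r"[0-9]+", "*", sample)

print(per_digit, digit_count)  # Today is ****/**/** 8
print(per_run, run_count)      # Today is */*/* 3
```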

(2) Matching multiple letters

	>>> s
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[a-z]+', '*', s)
	"大家好，我是一个程序员小白。I '* * * * * *, * I’* 18 * *.   T* * 2020/01/01. I* * * * DAY! @HHHH*,,,#111C*H*222...66？AA？*？——*://*.*"
	>>> re.sub(r'[A-Z]+', '*', s)
	"大家好，我是一个程序员小白。* 'm so glad to introduce myself, and *’m 18 years old.   *oday is 2020/01/01. *t is a wonderful *! @*ello,,,#111*ome*ere222...66？*？zz？——http://welcome.cn"
	>>> re.sub(r'[a-zA-Z]+', '*', s)
	"大家好，我是一个程序员小白。* '* * * * * *, * *’* 18 * *.   * * 2020/01/01. * * * * *! @*,,,#111*222...66？*？*？——*://*.*"

Here re.sub(r'[a-z]+', '*', s) matches each run of consecutive lowercase letters and replaces the run with a single asterisk. re.sub(r'[A-Z]+', '*', s) does the same for runs of uppercase letters, and re.sub(r'[a-zA-Z]+', '*', s) does it for runs of letters of either case.

(3) Matching multiple digits and letters

	>>> s
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[0-9a-zA-Z]+', '*', s)
	"大家好，我是一个程序员小白。* '* * * * * *, * *’* * * *.   * * */*/*. * * * * *! @*,,,#*...*？*？*？——*://*.*"

Here re.sub(r'[0-9a-zA-Z]+', '*', s) matches each run of consecutive digits and letters, so a run of digits, a run of letters, or a mixed run of both is replaced with a single asterisk.
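
If each run should be masked but its length preserved, one option is to pass a function as repl, as the parameter description allows. The following is a minimal sketch on a hypothetical sample string, not something the examples in this article do.

```python
import re

sample = "Hello 2020"  # hypothetical sample string

# The callable receives each match (one whole run) and returns
# as many asterisks as the run has characters.
masked = re.sub(r"[0-9A-Za-z]+", lambda m: "*" * len(m.group(0)), sample)

print(masked)  # ***** ****
```

The result is the same as replacing single characters with [0-9A-Za-z], but the function form leaves room for other per-run logic.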

# 3. Other matches

(1) Matching non-digits

	>>> s
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[^0-9]', '*', s)
	'********************************************************18***********************2020*01*01**************************************111********222***66**************************'
	>>> re.sub(r'[^0-9]+', '*', s)
	'*18*2020*01*01*111*222*66*'

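Here re.sub(r'[^0-9]', '*', s) matches each single non-digit character and replaces it with an asterisk, while re.sub(r'[^0-9]+', '*', s) matches each run of consecutive non-digit characters and replaces the whole run with a single asterisk.

(2) Matching non-letters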
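
A negated class is also a handy way to keep only the digits. The sketch below uses a shortened, hypothetical sample; the choice of "-" as separator and the final strip() call are assumptions made for the illustration.

```python
import re

sample = "Today is 2020/01/01."  # shortened sample for illustration

# Delete every non-digit character, keeping the digits only.
digits_only = re.sub(r"[^0-9]", "", sample)

# Replace each run of non-digits with "-"; runs at both ends of the
# string are replaced too, so strip the leading and trailing "-".
separated = re.sub(r"[^0-9]+", "-", sample).strip("-")

print(digits_only)  # 20200101
print(separated)    # 2020-01-01
```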

	>>> s
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[^a-z]', '*', s)
	'*****************m*so*glad*to*introduce*myself**and***m****years*old*****oday*is**************t*is*a*wonderful***********ello********ome*ere************zz***http***welcome*cn'
	>>> re.sub(r'[^A-Z]', '*', s)
	'**************I*************************************I*******************T********************I*****************DAY***HHHH***********C***H************AA***********************'
	>>> re.sub(r'[^A-Za-z]', '*', s)
	'**************I**m*so*glad*to*introduce*myself**and*I*m****years*old****Today*is*************It*is*a*wonderful*DAY***HHHHello*******ComeHere*********AA*zz***http***welcome*cn'
	>>> re.sub(r'[^a-z]+', '*', s)
	'*m*so*glad*to*introduce*myself*and*m*years*old*oday*is*t*is*a*wonderful*ello*ome*ere*zz*http*welcome*cn'
	>>> re.sub(r'[^A-Z]+', '*', s)
	'*I*I*T*I*DAY*HHHH*C*H*AA*'
	>>> re.sub(r'[^A-Za-z]+', '*', s)
	'*I*m*so*glad*to*introduce*myself*and*I*m*years*old*Today*is*It*is*a*wonderful*DAY*HHHHello*ComeHere*AA*zz*http*welcome*cn'

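The first three calls work on single characters: re.sub(r'[^a-z]', '*', s) replaces every character that is not a lowercase letter with an asterisk, re.sub(r'[^A-Z]', '*', s) every character that is not an uppercase letter, and re.sub(r'[^A-Za-z]', '*', s) every character that is not a letter. The last three add + and therefore work on runs: re.sub(r'[^a-z]+', '*', s), re.sub(r'[^A-Z]+', '*', s), and re.sub(r'[^A-Za-z]+', '*', s) replace each run of consecutive non-lowercase, non-uppercase, and non-letter characters, respectively, with a single asterisk.

(3) Matching characters that are neither digits nor letters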

	>>> s
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[^0-9A-Za-z]', '*', s)
	'**************I**m*so*glad*to*introduce*myself**and*I*m*18*years*old****Today*is*2020*01*01**It*is*a*wonderful*DAY***HHHHello****111ComeHere222***66*AA*zz***http***welcome*cn'
	>>> re.sub(r'[^0-9A-Za-z]+', '*', s)
	'*I*m*so*glad*to*introduce*myself*and*I*m*18*years*old*Today*is*2020*01*01*It*is*a*wonderful*DAY*HHHHello*111ComeHere222*66*AA*zz*http*welcome*cn'

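Here re.sub(r'[^0-9A-Za-z]', '*', s) matches each single character that is neither a digit nor a letter and replaces it with an asterisk, while re.sub(r'[^0-9A-Za-z]+', '*', s) matches each run of such characters and replaces the whole run with a single asterisk.

(4) Matching fixed forms

a. Keep only letters and spaces by setting repl to the empty string.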

	>>> s
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[^a-z ]', '', s)
	' m so glad to introduce myself and m  years old   oday is  t is a wonderful  elloomeerezzhttpwelcomecn'
	>>> re.sub(r'[^a-z ]+', '', s)
	' m so glad to introduce myself and m  years old   oday is  t is a wonderful  elloomeerezzhttpwelcomecn'
	>>> re.sub(r'[^A-Za-z ]', '', s)
	'I m so glad to introduce myself and Im  years old   Today is  It is a wonderful DAY HHHHelloComeHereAAzzhttpwelcomecn'
	>>> re.sub(r'[^A-Za-z ]+', '', s)
	'I m so glad to introduce myself and Im  years old   Today is  It is a wonderful DAY HHHHelloComeHereAAzzhttpwelcomecn'

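Deleting the other characters glues neighboring words together (for example HHHHelloComeHere). To keep the words separated and the sentence readable, first replace the other characters with a space (set repl to a space), then collapse the repeated spaces: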

	>>> s1 = re.sub(r'[^A-Za-z ]+', ' ', s)
	>>> s1
	' I  m so glad to introduce myself  and I m   years old    Today is   It is a wonderful DAY   HHHHello ComeHere AA zz http welcome cn'
	>>> re.sub(r'[ ]+', ' ', s1)
	' I m so glad to introduce myself and I m years old Today is It is a wonderful DAY HHHHello ComeHere AA zz http welcome cn'
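
The two steps above can also be folded into one pass. The sketch below is one possible variant, not the method used in this article: it leaves the space out of the negated class so that spaces join the runs being replaced, and it trims the ends with strip(). The function name keep_letters is hypothetical.

```python
import re

def keep_letters(text):
    """Replace every run of non-letter characters with one space, then trim."""
    # Spaces are non-letters too, so each run (spaces included)
    # collapses into a single space in one pass.
    spaced = re.sub(r"[^A-Za-z]+", " ", text)
    # Drop the space left at either end of the string.
    return spaced.strip()

cleaned = keep_letters("I’m 18 years old.   Today is 2020/01/01.")
print(cleaned)  # I m years old Today is
```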

b. Remove English words that start with @

	>>> s
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'@[A-Za-z]+', '', s)
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! ,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"

c. Remove English words and digits that end with ？ (note that this is the full-width Chinese question mark)

	>>> s
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[A-Za-z]+？', '', s)
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？——http://welcome.cn"
	>>> re.sub(r'[0-9A-Za-z]+？', '', s)
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...——http://welcome.cn"

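The pattern in item c matches only the full-width question mark. If the text may mix full-width and ASCII question marks, one possible variant is to put both in a character class, where the ASCII ? is a literal character rather than a quantifier. The sample string is hypothetical.

```python
import re

sample = "66？AA？zz?ok"  # hypothetical sample mixing "？" and "?"

# [?？] accepts either question mark after a run of digits/letters.
removed = re.sub(r"[0-9A-Za-z]+[?？]", "", sample)

print(removed)  # ok
```
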
d. Remove the URL from the original string

	>>> s
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'http[:.]+\S+', '', s)
	"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——"

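The pattern in item d requires a colon or dot directly after http, so it would not match an https URL. A slightly stricter sketch, offered only as one possible alternative, anchors on the scheme separator instead. The sample string and the example.com address are hypothetical.

```python
import re

sample = "see http://welcome.cn and https://example.com/a?b=1 now"

# "s?" makes the trailing "s" optional, "://" is matched literally,
# and \S+ consumes the rest of the URL up to the next whitespace.
without_urls = re.sub(r"https?://\S+", "", sample)

print(repr(without_urls))  # 'see  and  now'
```

Because \S+ stops only at whitespace, punctuation that directly follows a URL is removed along with it.
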