learn-NLP-luhuibo: TypeError: expected string or bytes-like object when using Python regular expressions

Open Valuebai opened this issue 5 years ago • 0 comments

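While parsing web page data with BeautifulSoup and processing it with regular expressions, the following error appeared:

Python error message: TypeError: expected string or bytes-like object (the function expected a str or a bytes-like object as its argument)

This is usually caused by a type mismatch: the value passed to the regular expression function is neither a string nor a bytes-like object.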
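The mismatch can be reproduced without any web scraping. The sketch below is only an illustration: the helper name try_search and the sample values are assumptions, not part of the original post. It passes a str and then an int to re.search and captures the resulting message.

```python
import re

def try_search(pattern, value):
    """Return the matched text, or the TypeError message if value is not str/bytes."""
    try:
        match = re.search(pattern, value)
        return match.group(0) if match else None
    except TypeError as exc:
        return f"TypeError: {exc}"

ok = try_search(r"\d+", "order 123")  # a str is accepted
bad = try_search(r"\d+", 123)         # an int is rejected by the re module

print(ok)
print(bad)
```

The exact wording of the message varies slightly between Python versions, but it begins with "expected string or bytes-like object".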

Python 3 has six standard data types:

Number
String
List
Tuple
Set
Dictionary

You can check the type of any object with print(type(object)), where object is the object you want to inspect.
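As a quick illustration of that check, the following sketch inspects one value of each of the six types and also reports whether the re module would accept it. The sample values and the names samples and type_names are made up for this example.

```python
# One sample value for each of the six standard types (illustrative values only)
samples = [42, "text", [1, 2], (1, 2), {1, 2}, {"k": "v"}]

# type(v).__name__ gives the short class name, e.g. 'int' or 'str'
type_names = [type(v).__name__ for v in samples]

for value, name in zip(samples, type_names):
    # re functions only accept str or bytes-like objects
    usable = isinstance(value, (str, bytes))
    print(f"{value!r}: {name}, usable with re: {usable}")
```

Of these samples, only the string would be accepted by a regular expression function.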

Start with code like this:

import re
import requests
from bs4 import BeautifulSoup
import lxml

# Fetch the web page
urlSave = "https://www.douban.com/people/yekingyan/statuses"
req = requests.get(urlSave)
soup = BeautifulSoup(req.text,'lxml')

# After BeautifulSoup has parsed the page, select the data we need
times = soup.select('div.actions > span')
says = soup.select('div.status-saying > blockquote')

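Then check what type the returned data has:

print('says:',type(says))

Result: says: <class 'list'>

This shows that soup.select() in BeautifulSoup returns its results as a list. (Newer Beautiful Soup releases may report bs4.element.ResultSet instead, which is a subclass of list.)

Next, take the items out of the list one by one and check their type:

# Iterate and print the type of each item

for say in says:
    print(type(say))

Result: <class 'bs4.element.Tag'>, which is none of the six types listed above.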

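It turns out that Beautiful Soup converts a complex HTML document into a complex tree structure in which every node is a Python object. All of these objects fall into four kinds:

Tag
NavigableString
BeautifulSoup
Comment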

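Applying a regular expression to this data directly:

for say in says:
    # Use a regular expression to extract the required data
    say = re.search('<p>(.*?)</p>',say)

This raises TypeError: expected string or bytes-like object, because say is a Tag rather than a string. Converting the data to a string before applying the regular expression solves the problem, as follows: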

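for say in says:
    # Convert the Tag to a string first, otherwise re.search raises TypeError
    say = str(say)
    # Use a regular expression to extract the required data
    # (say becomes a match object, or None if nothing matched)
    say = re.search('<p>(.*?)</p>',say)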
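The fix above can be tried without network access or Beautiful Soup installed. In the sketch below, FakeTag is a hypothetical stand-in for bs4.element.Tag (it is not a str, but str() returns its HTML), and extract_paragraph is an illustrative helper name. Neither appears in the original post.

```python
import re

class FakeTag:
    """Hypothetical stand-in for bs4.element.Tag: not a str, but str() yields its HTML."""
    def __init__(self, html):
        self._html = html

    def __str__(self):
        return self._html

def extract_paragraph(node):
    """Convert node to str, then return the text of the first <p>...</p>, or None."""
    match = re.search(r"<p>(.*?)</p>", str(node))
    return match.group(1) if match else None

says = [
    FakeTag("<blockquote><p>hello</p></blockquote>"),
    FakeTag("<blockquote></blockquote>"),  # no <p>, so no match
]
results = [extract_paragraph(say) for say in says]
print(results)
```

Because re.search returns a match object rather than the text itself, the helper calls group(1) to get the captured content and returns None when there is no match.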

[Original article] https://blog.csdn.net/weixin_42105977/article/details/80390957

Dec 25 '19 11:12 Valuebai
