JsoupXpath

Extracted text does not match the expected result

Open · RainGinx opened this issue 2 years ago · 1 comment

To help us identify your problem quickly, please answer these questions before submitting your issue. Thanks!

  1. What did you do? Give the expression and the use case, and if possible provide a recipe for reproducing the error.

  2. What did you expect to see?

  3. What did you see instead (what result did JsoupXpath return)?

  4. What version of JsoupXpath are you using?

RainGinx · Jul 29 '22 08:07

HTML being processed

<div id="p-author" >
[德] 
<a>格林兄弟</a>
,[丹] 
<a >安徒生</a>
,
<a>叶圣陶</a> 
著                
</div>

1.1 The XPath used

//div[@id='p-author']/descendant-or-self::text()

1.2 Use case: I expect to get the text content of the div tag and of its child tags

2 Expected result

[
    "[德] ",
    "格林兄弟",
    ",[丹] ",
    "安徒生",
    ",",
    "叶圣陶",
    "著"
]

3 What JsoupXpath returned

[
    "叶圣陶",
    "[德]",
    ",[丹]",
    ",",
    "著",
    "安徒生",
    "格林兄弟"
]
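
For comparison, here is a minimal sketch that evaluates the same expression with the JDK's built-in `javax.xml.xpath` engine instead of JsoupXpath. It assumes the fragment can be parsed as well-formed XML (true for this sample, not for HTML in general), and it relies on the JDK engine returning node sets in document order, which it does in practice for a location path like this one. The class name `DocumentOrderDemo` and the helper `selectTrimmedText` are made up for illustration. Whitespace-only text nodes are dropped and the rest are trimmed, so results are comparable with the list above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JsoupXpath.
class DocumentOrderDemo {

    // Evaluates an XPath that selects text nodes and returns their trimmed,
    // non-empty values in the order the JDK engine reports them.
    static List<String> selectTrimmedText(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getNodeValue().trim();
                if (!text.isEmpty()) {
                    result.add(text);
                }
            }
            return result;
        } catch (Exception e) {
            // Parsing and XPath errors are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same markup as in the report, treated as XML.
        String xml = "<div id=\"p-author\">\n"
                + "[德] \n"
                + "<a>格林兄弟</a>\n"
                + ",[丹] \n"
                + "<a>安徒生</a>\n"
                + ",\n"
                + "<a>叶圣陶</a> \n"
                + "著                \n"
                + "</div>";
        List<String> texts =
                selectTrimmedText(xml, "//div[@id='p-author']/descendant-or-self::text()");
        for (String text : texts) {
            System.out.println(text);
        }
    }
}
```

Run against the markup above, this should print the seven fragments in the order they appear in the source, which is the order expected in section 2. That suggests the expression itself is fine and that the difference lies in how the result list is ordered.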

4 Version in use

JsoupXpath 2.5.1

JDK 11

Test code

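// Imports assumed here: JUnit 5 and the JsoupXpath 2.x package names.
import java.util.List;
import org.junit.jupiter.api.Test;
import org.seimicrawler.xpath.JXDocument;
import org.seimicrawler.xpath.JXNode;

@Test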
void test(){
    String html = "<div id=\"p-author\" >\n" +
                "[德] \n" +
                "<a>格林兄弟</a>\n" +
                ",[丹] \n" +
                "<a >安徒生</a>\n" +
                ",\n" +
                "<a>叶圣陶</a> \n" +
                "著                \n" +
                "</div>";
    JXDocument jxDocument = JXDocument.create(html);
    List<JXNode> nodes = jxDocument.selN("//div[@id='p-author']/descendant-or-self::text()");
    for (JXNode node : nodes) {
        System.out.println(node.asString());
    }
}

RainGinx · Jul 29 '22 09:07