spark-nlp MultiDateMatcher only returning 1 element


Open • TommyDong1998 opened this issue 1 year ago • 1 comment

Is there an existing issue for this?

[X] I have searched the existing issues and did not find a match.

Who can help?

No response

What are you working on?

Finding dates in a string.

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

date = MultiDateMatcher() \
    .setInputCols("document") \
    .setOutputCol("date") \
    .setAnchorDateYear(2020) \
    .setAnchorDateMonth(1) \
    .setAnchorDateDay(11) \
    .setOutputFormat("yyyy/MM/dd")

pipeline = Pipeline().setStages([documentAssembler, date])

data = spark.createDataFrame([["Nov 29 2023, Dec 1 2024"]]).toDF("text")
result = pipeline.fit(data).transform(data)

# One output row per date annotation found
result.selectExpr("explode(date) as dates").show(truncate=False)
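For context, `setOutputFormat("yyyy/MM/dd")` takes a Java-style date pattern, and the anchor date set above (2020-01-11) is what the Spark NLP documentation describes as the reference point for relative expressions such as "tomorrow". The plain-Python sketch below only illustrates those two settings; it does not call Spark NLP, and the `resolve_relative` helper and its three-word vocabulary are hypothetical.

```python
from datetime import date, timedelta

# Anchor mirroring setAnchorDateYear(2020), setAnchorDateMonth(1), setAnchorDateDay(11)
ANCHOR = date(2020, 1, 11)

# Python strftime equivalent of the Java pattern "yyyy/MM/dd"
OUTPUT_FORMAT = "%Y/%m/%d"

# Hypothetical, deliberately tiny vocabulary of relative expressions (illustration only)
RELATIVE_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def resolve_relative(expression: str, anchor: date = ANCHOR) -> str:
    """Resolve a relative day expression against the anchor and format it."""
    offset = RELATIVE_OFFSETS[expression.lower()]
    return (anchor + timedelta(days=offset)).strftime(OUTPUT_FORMAT)


print(resolve_relative("tomorrow"))  # 2020/01/12
```

The absolute dates in this issue ("Nov 29 2023", "Dec 1 2024") should not depend on the anchor at all; only their formatting is governed by the output pattern.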

Current Behavior

Currently, when I pass ["Nov 29 2023, Dec 1 2024"] to MultiDateMatcher, it returns only one date (2023/11/29) instead of both:

+-----------------------------------------------+
|dates                                          |
+-----------------------------------------------+
|{date, 10, 20, 2023/11/29, {sentence -> 0}, []}|
+-----------------------------------------------+

Expected Behavior

MultiDateMatcher should return both dates (2023/11/29 and 2024/12/01).
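Both dates in the input are absolute, so the expected result can be pinned down independently of Spark NLP. The plain-Python sketch below assumes the input can simply be split on commas and that each chunk reads "<Mon> <day> <year>" (true only for this sample string); it computes the two values MultiDateMatcher would be expected to emit with the `yyyy/MM/dd` output format.

```python
from datetime import datetime

text = "Nov 29 2023, Dec 1 2024"

# Assumption: dates are comma-separated and written as "<Mon> <day> <year>".
expected = [
    datetime.strptime(chunk.strip(), "%b %d %Y").strftime("%Y/%m/%d")
    for chunk in text.split(",")
]

print(expected)  # ['2023/11/29', '2024/12/01']
```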

Steps To Reproduce

https://colab.research.google.com/drive/1xGE1MqqcsjOL9kyOoOwkiqnMa4LabETK?usp=sharing

I copied the example code from the documentation and added the dates (Nov 29 2023, Dec 1 2024) to the input.

Spark NLP version and Apache Spark

Spark NLP 5.1.4, Apache Spark 3.5.0

Type of Spark Application

Python Application

Java Version

openjdk version "11.0.21" 2023-10-17
OpenJDK Runtime Environment (build 11.0.21+9-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.21+9-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)

Java Home Directory

N/A

Setup and installation

Google Colab

Operating System and Version

Google Colab (Ubuntu Linux)

Link to your project (if available)

https://colab.research.google.com/drive/1xGE1MqqcsjOL9kyOoOwkiqnMa4LabETK?usp=sharing

Additional Information

https://sparknlp.org/api/com/johnsnowlabs/nlp/annotators/MultiDateMatcher$.html

Dec 08 '23 08:12 TommyDong1998

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 5 days

Jun 17 '24 00:06 github-actions[bot]
