Immersive Translate Feedback Zone

Open TaoLoading opened this issue 1 year ago • 12 comments

Dear Mathpix Support Team,

I hope this message finds you well. I am a member of the Immersive Translate team, and we have been using Mathpix for our translation projects with great enthusiasm. We deeply appreciate the innovative solutions your product offers, which have significantly enhanced our workflow. We have also encountered some problems when using Mathpix. I will describe each one separately in this issue, and we would appreciate your help. Thanks!

TaoLoading avatar Jun 27 '24 03:06 TaoLoading

Description

There is a problem in recognizing vertically arranged Japanese documents. Here are the details:

  1. Text Alignment: Japanese text is arranged vertically from top to bottom and right to left.
  2. Issue Observed: The recognition results show missing characters, incorrect characters, and some characters that are not recognized at all.

Attachments

https://drive.google.com/file/d/1Z1LcEuuuqGOyTgyjckdvzN_DSd0JFsmo/view?usp=sharing

TaoLoading avatar Jun 27 '24 03:06 TaoLoading

Description

There are also problems with the following academic papers:

  1. Authors Section: The recognition of author names is often mixed up or incorrect.
  2. Abstract Section: The recognition of the abstract text is not very accurate, with some parts missing or incorrect.

Attachments

https://drive.google.com/file/d/1UdWnnq7lWf1nfOzxnzNaYTOTBI5c94pH/view

TaoLoading avatar Jun 27 '24 03:06 TaoLoading

Hi @TaoLoading. Thank you for your feedback. Please send me your email at [email protected]. We want to create a dedicated Slack channel with you for more efficient communication.

The text-to-page ratio of the PDF with vertical Japanese text is roughly 20-30% text and 70-80% white space. For better OCR accuracy, it's important to have text cover most of the page, ideally around 80%, similar to a standard PDF page. But our team will do additional tests on the recognition of vertical Japanese text.
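As a rough illustration of the text-to-page ratio described above, here is a minimal Python sketch. It assumes you already have text bounding boxes for a page from some layout or OCR tool; the function name `text_coverage` and the `(x0, y0, x1, y1)` box format are hypothetical and not part of any Mathpix API. Overlapping boxes are counted twice, so treat the result as an estimate only.

```python
def text_coverage(page_width, page_height, text_boxes):
    """Estimate the fraction of a page covered by text.

    text_boxes is a list of (x0, y0, x1, y1) tuples in the same units as
    the page size. This box format is an assumption for illustration.
    """
    page_area = page_width * page_height
    if page_area <= 0:
        raise ValueError("page dimensions must be positive")
    # Sum the area of each box; degenerate boxes contribute zero.
    text_area = sum(
        max(0, x1 - x0) * max(0, y1 - y0) for x0, y0, x1, y1 in text_boxes
    )
    # Overlaps are not merged, so cap the estimate at 1.0.
    return min(1.0, text_area / page_area)


# A page where a single text block covers one quarter of the area.
print(text_coverage(100, 100, [(0, 0, 50, 50)]))
```

A page whose estimate falls well below the suggested 80% could be cropped to the text region before it is submitted.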

I have requested access to the second PDF file.

ykolodnitskiy avatar Jun 27 '24 06:06 ykolodnitskiy

Description

This is a scanned version of an Urdu-language PDF file, and the text does not appear to have been recognized effectively.

Attachments

https://drive.google.com/file/d/1U4dt3zDexSdL0FQlZaiNLjegx6XjLj83/view?usp=sharing

TaoLoading avatar Jul 08 '24 03:07 TaoLoading

Description

The table in this PDF file is missing content after recognition.

Attachments

https://drive.google.com/file/d/1SYbNIc4IeoYD-b7PyCJGHCmurDmj_b-W/view?usp=sharing

TaoLoading avatar Jul 15 '24 02:07 TaoLoading

Description

This is a screenshot of a PDF. There is a recognition issue with the vertically arranged text.

Attachments

https://drive.google.com/file/d/1w8_-SZx6GI7nSoaDIqKcbv-R3pFwhofp/view?usp=sharing

TaoLoading avatar Jul 24 '24 13:07 TaoLoading

Description

The content in the box is incorrectly recognized in this PDF.

Attachments

https://drive.google.com/file/d/1iS1J7J_k8fl8mVRgFIcbe_yZuqppil7F/view?usp=sharing

TaoLoading avatar Aug 08 '24 02:08 TaoLoading

Description

There are some problems in recognizing this PDF:

  1. In the original text, the characters "/ ************" are recognized partly as images and partly as text. Additionally, two sentences are also identified as images.
  2. Some content is returned in markdown source code format.

Attachments

https://drive.google.com/file/d/1rudypXm1geAwRcW59X-3v1syrLOCMbL4/view?usp=sharing

TaoLoading avatar Aug 19 '24 07:08 TaoLoading

Description

The table in the PDF appears to have been identified as an image.

Attachments

https://drive.google.com/file/d/1ImglTsKfnQKngGnLOlt2dOxyAGAGnl92/view?usp=sharing

TaoLoading avatar Oct 16 '24 02:10 TaoLoading

Description

The following is a scanned PDF with relatively complex content, and the recognition quality for this document is poor in some areas.

Attachments

https://drive.google.com/file/d/1sgGIYZCO32lubplUBtLCZYx68WXITDoB/view?usp=sharing

TaoLoading avatar Oct 25 '24 15:10 TaoLoading

Description

In the original PDF, a sentence has been split into two parts due to formatting: the first part ends with "are usual-" and the second part begins with "ly defined...", and the two parts fall into separate paragraphs. After recognition, the HTML output also separates this sentence. Would it be possible to join such cases after recognition so that the sentence stays intact?
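To make the request concrete, here is a minimal Python sketch of the kind of post-processing we have in mind. It is only an illustration, not a description of how Mathpix works: the function name `merge_split_paragraphs` is hypothetical, and the sample sentences are made up around the "are usual-" and "ly defined" fragments. It joins a paragraph to the previous one when the previous one ends with a letter followed by a hyphen and the current one starts with a lowercase letter.

```python
import re


def merge_split_paragraphs(paragraphs):
    """Join paragraphs that were split in the middle of a hyphenated word."""
    merged = []
    for para in paragraphs:
        text = para.strip()
        if not text:
            continue  # skip empty paragraphs
        ends_with_hyphen = bool(merged) and re.search(r"[A-Za-z]-$", merged[-1])
        if ends_with_hyphen and text[0].islower():
            # Drop the trailing hyphen and append the continuation directly.
            merged[-1] = merged[-1][:-1] + text
        else:
            merged.append(text)
    return merged


# Made-up sample text built around the fragments from the PDF.
sample = [
    "These quantities are usual-",
    "ly defined in terms of a norm.",
    "A separate paragraph.",
]
print(merge_split_paragraphs(sample))
```

This heuristic would wrongly merge a genuine compound that happens to break at its hyphen (for example "well-" followed by "known"), so a real implementation would likely need a dictionary check or layout information.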

Attachments

https://drive.google.com/file/d/1Y99SplfLbB0vNW3Km14DXY7J2A1-qoVl/view?usp=sharing

TaoLoading avatar Jan 06 '25 06:01 TaoLoading

Hi @TaoLoading, thank you for reaching out about this. We are looking into the feasibility of implementing this and will let you know.

joyce-mathpix avatar Jan 07 '25 22:01 joyce-mathpix