
Missing testdata files for unittests

Open • Shreeshrii opened this issue 6 years ago • 9 comments

testdata/lstm_training.txt is required to build the training data for lstm_test:

https://github.com/tesseract-ocr/tesseract/blob/master/unittest/lstm_test.cc#L6

// Generating the training data:
// If the format of the lstmf (ImageData) file changes, the training data will
// have to be regenerated as follows:
//
// ./tesseract/text2image --xsize=800 --font=Arial
// --text=tesseract/testdata/lstm_training.txt --leading=32
// --outputbase=tesseract/testdata/lstm_training.arial
// ./tesseract tesseract/testdata/lstm_training.arial.tif
// tesseract/testdata/lstm_training.arial lstm.train
// --pageseg_mode=6
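The two regeneration steps in that comment can be scripted so they are repeatable. This is a minimal Python sketch that only assembles the command lines exactly as quoted; the binary paths ./tesseract/text2image and ./tesseract are copied verbatim from the comment and are assumptions about your build layout, and the helper names build_commands and regenerate are hypothetical. Current Tesseract releases spell the page segmentation option as --psm, so the last flag may need adjusting.

```python
import subprocess

# Paths and flags copied verbatim from the comment in unittest/lstm_test.cc.
# The binary locations are assumptions about the build layout.
TEXT2IMAGE = "./tesseract/text2image"
TESSERACT = "./tesseract"
TEXT = "tesseract/testdata/lstm_training.txt"
OUTPUTBASE = "tesseract/testdata/lstm_training.arial"


def build_commands(font="Arial"):
    """Return the two commands that regenerate the lstm_training.arial data."""
    # Step 1: render the training text to an image with text2image.
    render = [
        TEXT2IMAGE,
        "--xsize=800",
        f"--font={font}",
        f"--text={TEXT}",
        "--leading=32",
        f"--outputbase={OUTPUTBASE}",
    ]
    # Step 2: run tesseract on the rendered .tif with the lstm.train config.
    train = [
        TESSERACT,
        f"{OUTPUTBASE}.tif",
        OUTPUTBASE,
        "lstm.train",
        "--pageseg_mode=6",
    ]
    return [render, train]


def regenerate(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    regenerate(dry_run=True)
```

Keeping dry_run=True as the default means the script shows what it would do before touching the checked-in testdata.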

Shreeshrii, Jan 21 '19

Missing images:

0146_281.3B.tif
line6.tiff
5318c4b679264.jpg
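A quick way to see which of these files (plus lstm_training.txt from the first comment) are still absent is a small checker. This is a sketch under assumptions: the testdata directory location is up to your checkout, and the function name missing_testdata is hypothetical.

```python
from pathlib import Path

# File names taken from this issue; the testdata directory location is an
# assumption (adjust to wherever your checkout keeps unit-test data).
REQUIRED = [
    "lstm_training.txt",
    "0146_281.3B.tif",
    "line6.tiff",
    "5318c4b679264.jpg",
]


def missing_testdata(testdata_dir, names=REQUIRED):
    """Return the required file names that are not regular files in testdata_dir."""
    base = Path(testdata_dir)
    return [name for name in names if not (base / name).is_file()]
```

For example, missing_testdata("tesseract/testdata") returns, in order, whichever of the four names are not present yet, which makes it easy to track progress as replacements are added.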

Shreeshrii, Jan 27 '19

Cc'ing @jbreiden.

stweil, Jan 28 '19

@stweil Do we still need more images/testdata from Google?

Shreeshrii, Jul 10 '19

I'm afraid that we have to find our own solutions without waiting for Google. They cannot provide all images and test data because some might be copyrighted. Therefore it is important to find free replacement images and data. We have nearly all images needed for the unit tests (equationdetect_test still needs an image).

stweil, Jul 12 '19

If you are looking for free replacement images and data for unit testing, there are several options you can consider:

Free Image Banks: Several free image banks on the internet offer high-quality, freely licensed images you can use in your tests. Examples include Unsplash, Pixabay and Pexels.

Test Data Databases: In addition to images, you may need test data for your unit tests. Freely available datasets on the web can be used to create realistic test scenarios. Search for open datasets related to your application domain.

Creating Images and Test Data: If you are unable to find suitable images or test data, consider creating your own. You can create simple images using free image editing tools like GIMP or Paint.NET, and generate test data using random data generation libraries in Python like Faker.

Community Resources: Don't underestimate the power of community. Search forums, discussion groups, and online communities related to your application domain. Other developers are often willing to share images and test data that they have created or found.

Creative Commons Licenses: When searching for free replacement images and data, be sure to check usage licenses. Many free resources are available under Creative Commons licenses, which may have specific attribution requirements or commercial use restrictions.
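As an illustration of the "Creating Images and Test Data" option, the text side can be generated with the Python standard library alone instead of a third-party package such as Faker. This sketch assumes a hypothetical word list and a hypothetical function name synthetic_lines; the point it shows is seeding a private random generator so the generated text is identical on every run, which matters when test expectations depend on exact content.

```python
import random

# Hypothetical vocabulary; a real replacement for a training text would use
# freely licensed text in the target language.
WORDS = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog",
         "tesseract", "line", "text", "image", "2019", "42"]


def synthetic_lines(n_lines, words_per_line=8, seed=0):
    """Return n_lines of pseudo-random text; the same seed gives the same output."""
    rng = random.Random(seed)  # private generator, independent of global random state
    return [
        " ".join(rng.choice(WORDS) for _ in range(words_per_line))
        for _ in range(n_lines)
    ]
```

The resulting lines could be written to a .txt file and rendered to an image with a tool such as text2image, keeping both the text and the image free of third-party copyright.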

AndersonMartins1, Mar 10 '24

Thanks, but this issue is not about finding any image. It is about finding very specific images for a very specific task which is part of the unittests.

stweil, Mar 10 '24

To resolve this issue, you can follow these steps:

Clearly identify which specific images are required for the test cases in question.

Make sure these images are available somewhere accessible for testing. This could be in an internal image repository, a cloud storage server, or another accessible location.

If images are not available, you may need to create or purchase the necessary images and ensure they are stored in a suitable location.

After ensuring that the required images are available, you can update your unit tests to reference these specific images when running your tests.

Be sure to clearly document the image requirements for each test case so future developers know which images are needed and where to find them.

Rerun your unit tests to ensure that the images are being used correctly and that the tests are passing as expected.

By following these steps, you should be able to solve the problem of finding the specific images needed for the test cases in your unit tests.
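One way to apply the steps above inside the tests themselves is to skip, rather than fail, a test whose image is still missing, and to name the required file in the skip reason so the requirement is documented in the test output. The Tesseract unit tests are C++ (GoogleTest, where GTEST_SKIP() would presumably play this role); the following is only a Python analogue of the pattern, with a hypothetical TESTDATA location and test name.

```python
import unittest
from pathlib import Path

# Hypothetical location of the unit-test data; adjust for your checkout.
TESTDATA = Path("testdata")


def have(name):
    """True if the named test image exists in the testdata directory."""
    return (TESTDATA / name).is_file()


class ImageTests(unittest.TestCase):
    # Skip while the replacement image is missing; the reason string records
    # exactly which file the test needs.
    @unittest.skipUnless(have("line6.tiff"), "needs testdata/line6.tiff")
    def test_line6(self):
        data = (TESTDATA / "line6.tiff").read_bytes()
        self.assertGreater(len(data), 0)
```

Once a suitable replacement image is added, the same test starts running without any code change.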

AndersonMartins1, Mar 10 '24

I am sorry to say this, but your comments (and your pull requests) are not helpful. They sound like the output of an AI chat bot. If you want to help, you should read this issue carefully (it lists the missing images), look into the test code where these images are used, and try to activate that code with replacement images.

stweil, Mar 10 '24

Ok

AndersonMartins1, Mar 10 '24