Greg

Results 64 issues of Greg

Receiving the following exception on some documents: ``` jobId='064c7a7b-5429-7123-8000-8d8911fb0a08', event='extract.completed', payload='{ \"status\": \"error\", \"error\": \"wand contains no images `MagickWand-99391' @ error/magick-image.c/MagickSetImageCompression/10254\", \"runtime_info\": { \"use_cuda\": true, \"model\": \"\", \"workspace\": \"/root/.cache/marie/TextExtractionExecutor/0\",...

NerExtractionExecutor needs to be refactored into a 'Processor' and an 'Executor'. Proposed: ``` TransformerNamedEntityProcessor NerExtractionExecutor ```

To improve PDF generation, we need a better line detection and refinement strategy. The current method works well; however, by using deep learning we can improve aggregation and detection. At this...

Currently, processors log and possibly rethrow the base `Exception`. This needs to be standardized across all processors, and specific `Exception` types need to be thrown. Example: ``` try: failedOp()...
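A minimal sketch of what standardized exceptions could look like. The names `ProcessorError`, `ImageLoadError`, `load_image`, and the processor label `"TextExtractionProcessor"` are hypothetical and are not taken from the MARIE-AI codebase; the point is only that each processor raises a specific subclass and chains the original error.

```python
class ProcessorError(Exception):
    """Hypothetical base class for all processor failures."""

    def __init__(self, processor: str, message: str) -> None:
        super().__init__(f"{processor}: {message}")
        self.processor = processor


class ImageLoadError(ProcessorError):
    """Hypothetical: raised when an input image cannot be read."""


def load_image(path: str) -> bytes:
    """Read an image file, translating low-level errors into a specific one."""
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except OSError as exc:
        # Chain the original exception so the traceback is preserved.
        raise ImageLoadError(
            "TextExtractionProcessor", f"cannot read {path}"
        ) from exc
```

Callers can then catch `ProcessorError` to handle any processor failure, or a subclass such as `ImageLoadError` when they want to react to one specific cause.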

bug
help wanted

The application is no longer able to download models from the public repository. ``` CRITI… extract_t/rep-2@52 can not load the executor from [06/09/23 23:05:04] {"jtype": "TextExtractionExecutor", "metas": {"py_modules": ["marie.executor.text"]}} ERROR extract_t/rep-2@52 during...

We should annotate generated asset documents to show that they were generated by MARIE-AI. This should include: PDF / TIFF / PNG

Update each processor to use a proper workspace instead of the TMP directory.
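A rough sketch of a per-processor workspace helper. The function name `processor_workspace` and its parameters are hypothetical; the `<root>/<name>/<replica>` layout is only an assumption modeled on the workspace path that appears in the logs of another issue here (`/root/.cache/marie/TextExtractionExecutor/0`).

```python
import os
from pathlib import Path


def processor_workspace(root: str, processor: str, replica: int = 0) -> Path:
    """Return (and create if needed) a dedicated directory for one processor.

    Assumed layout: <root>/<processor>/<replica>.
    """
    workspace = Path(os.path.expanduser(root)) / processor / str(replica)
    workspace.mkdir(parents=True, exist_ok=True)
    return workspace
```

A processor would then write its intermediate files under the returned directory rather than under a shared temporary directory.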

Implement an OpenTelemetry meter that provides instruments for collecting metrics from each processor.

Some assets are being downloaded multiple times, as this log shows. Each asset should be downloaded only once per run. `ro_sharding='none') roberta2 2022-04-26 13:18:13 | INFO | models.unilm.trocr.task |...
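One possible way to guarantee a single download per run, sketched with the standard library only. `fetch_once` and its `download` callback are hypothetical names; how the real download is performed is not shown in the issue, so it is passed in as a parameter.

```python
import threading
from pathlib import Path
from typing import Callable, Dict

_lock = threading.Lock()
_resolved: Dict[str, Path] = {}  # assets already resolved in this process


def fetch_once(name: str, cache_dir: Path, download: Callable[[Path], None]) -> Path:
    """Return the local path for `name`, calling `download` at most once per process."""
    with _lock:
        if name in _resolved:
            return _resolved[name]
        target = cache_dir / name
        # Skip the download if a previous run already left the file on disk.
        if not target.exists():
            cache_dir.mkdir(parents=True, exist_ok=True)
            download(target)
        _resolved[name] = target
        return target
```

The lock serializes all callers, which keeps the sketch simple; a per-asset lock would be the next step if unrelated downloads need to run in parallel.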

Implement a Stable Diffusion POC for document cleanup