Arthit Suriyawongkul

Results 362 comments of Arthit Suriyawongkul

The `ontology/` directory is deleted by PR #963. We can close this issue, or leave it open as a reminder for adding the directory...

Will Arm Neon help? "Arm Neon technology is an advanced Single Instruction Multiple Data (SIMD) architecture extension for the Arm Cortex-A and Cortex-R series processors." https://developer.arm.com/architectures/instruction-sets/simd-isas/neon

What is the general recommendation for numbers (0-9), btw? I see that languages like en and de allow them, but a language like ka doesn't.

Reviewed 184 samples from the currently extracted sentences and got "OK" for 88%. The remaining errors are mostly due to a "dangling word" - words that were meant to be...

Continuing from the discussion in https://github.com/common-voice/cv-sentence-extractor/issues/139#issuecomment-821964021 , I'm thinking of one possible way to extract Thai sentences and guarantee the 3-sentence limit. A sentence splitter may work with JSON files...
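To make the idea of a guaranteed limit concrete, here is a minimal sketch, assuming the text has already been split into sentences by some external splitter. The function name `pick_sentences`, the `seed` parameter, and the random selection strategy are all hypothetical and are not taken from cv-sentence-extractor; only the limit of 3 comes from the comment above.

```python
import random

# The per-source limit mentioned in the comment above.
MAX_SENTENCES = 3


def pick_sentences(sentences, limit=MAX_SENTENCES, seed=None):
    """Return at most `limit` non-empty sentences, keeping their original order.

    Hypothetical helper: `sentences` is assumed to be the output of an
    external sentence splitter (a list of strings).
    """
    rng = random.Random(seed)
    # Drop empty or whitespace-only fragments the splitter may produce.
    candidates = [s.strip() for s in sentences if s.strip()]
    if len(candidates) <= limit:
        return candidates
    # Sample distinct positions, then sort them to preserve source order.
    chosen = sorted(rng.sample(range(len(candidates)), limit))
    return [candidates[i] for i in chosen]
```

Whatever splitter is used upstream, capping the selection in one place like this means the limit holds even if the splitter over-segments the text.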

Thank you @MichaelKohler . The new segmenter option is welcome. I think this will make the pipeline more standardized, even with different language-specific processors. Will take a look more...

I initially thought that crfcut might work for this, but after several tries and inspections of the split text - some of the output starts or ends with an...

> In the https://science-on-schema.org guidelines for Dataset metadata, we recommend using SPDX URIs from the RDF files: https://github.com/ESIPFed/science-on-schema.org/blob/master/guides/Dataset.md#license > > In CodeMeta, which is a schema.org extension for software metadata,...

This is a step-by-step guide to creating an SPDX 3.0 document manually: https://spdx.github.io/spdx-spec/v3.0/annexes/getting-started/

It turns out that this PR is a superset of PR #75