James McKinney

http://www.jamespetermckinney.com/

Open Contracting Partnership, Canada

77 issues by James McKinney

tf*idf: Why don't you normalize to maximum term count for document?

5 comments

Treat (and the tf-idf and similarity gems) all normalize tf to the number of terms in the document: https://github.com/louismullie/treat/blob/master/lib/treat/workers/extractors/tf_idf/native.rb#L78

We normalize so that (1) long and short documents have comparable...
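To make the two normalizations in this issue concrete, here is a minimal Python sketch. It is not taken from Treat or from the tf-idf or similarity gems; the function names `tf_by_length` and `tf_by_max` are hypothetical and exist only for illustration.

```python
from collections import Counter


def tf_by_length(tokens):
    """Divide each term count by the total number of terms in the document."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def tf_by_max(tokens):
    """Divide each term count by the count of the most frequent term."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    peak = max(counts.values())
    return {term: count / peak for term, count in counts.items()}


doc = ["a", "a", "b", "c"]
print(tf_by_length(doc))  # the most frequent term gets 2/4
print(tf_by_max(doc))     # the most frequent term always gets 1.0
```

With length normalization the values in a document sum to 1; with maximum normalization the most frequent term is always 1.0, whatever the document length.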

Add SIGINT and SIGBREAK (Windows) to SIGTERM_PATTERN

1 comment

To match Scrapy's code: https://github.com/scrapy/scrapy/blob/master/scrapy/utils/ossignal.py

Support for certificates in PEM format, in addition to keystore format

2 comments

For comparison, Elasticsearch supports both (e.g. in the instructions on this page: https://www.elastic.co/guide/en/elasticsearch/reference/7.10/configuring-tls.html). I use Apache's [mod_md](https://httpd.apache.org/docs/2.4/mod/mod_md.html) to automatically get and renew certificates from Let's Encrypt. Like certbot and other...

Add support for escaped quote characters

Like in https://github.com/sj26/lenientcsv

HippieCSV just replaces an escaped quote character with a doubled quote character before parsing: https://github.com/intercom/hippie_csv/blob/91f247ffaa45ffb15798bfd3637fea73434762e1/lib/hippie_csv/support.rb#L52
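The replace-before-parsing approach described above can be sketched in Python as follows. This illustrates the idea only and is not HippieCSV's code; `parse_with_escaped_quotes` and its parameters are hypothetical names, and a backslash is assumed as the escape character.

```python
import csv
import io


def parse_with_escaped_quotes(text, quote_char='"', escape_char="\\"):
    """Rewrite escaped quotes (\\") as doubled quotes ("") and then parse."""
    normalized = text.replace(escape_char + quote_char, quote_char * 2)
    reader = csv.reader(io.StringIO(normalized, newline=""), quotechar=quote_char)
    return list(reader)


rows = parse_with_escaped_quotes('a,"say \\"hi\\"",c\n')
print(rows)
```

A blind string replacement like this one would misread an escaped escape character directly before a closing quote (for example `\\"`), so a fuller implementation would likely need to handle that case in the parser itself.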

Multi-byte :col_sep support

Ruby's CSV library has a single test for this: `test_leading_empty_fields_with_multibyte_col_sep_bug_fix` in `test_features.rb`. Ragel inlines the `when` block in an `if` statement; if the solution requires complex code, we may want...

:row_sep support

:row_sep support is more complicated than `:quote_char` and `:col_sep` support, because it is frequently multi-byte (e.g. `\r\n`) and needs to support the `:auto` option. May be easier to implement once...
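As a rough illustration of what an `:auto` option has to do, the sketch below scans a sample of the input for the first `\r\n`, `\n`, or `\r`. It is written in Python, not Ruby, and is an assumption about one possible approach, not the implementation in Ruby's CSV library or in this project; `detect_row_sep` is a hypothetical name.

```python
def detect_row_sep(sample, default="\n"):
    """Return the first line ending found in sample, or default if none is found."""
    for index, char in enumerate(sample):
        if char == "\r":
            # A carriage return may be the first half of a two-character "\r\n".
            if sample[index + 1 : index + 2] == "\n":
                return "\r\n"
            return "\r"
        if char == "\n":
            return "\n"
    return default


print(repr(detect_row_sep("a,b\r\nc,d\r\n")))
print(repr(detect_row_sep("a,b\nc,d\n")))
```

The sketch ignores quoting, so a line ending inside a quoted field would be picked up as the separator. It also assumes the sample does not end in the middle of a `\r\n` pair.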

Recover from errors and continue parsing

Options:
- Use Ragel's error actions
- Switch to row-by-row parser
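The row-by-row option can be sketched with Python's standard `csv` module standing in for the Ragel-based parser. This is a substitute technique used for illustration, not the project's code; `parse_rows_leniently` is a hypothetical name.

```python
import csv


def parse_rows_leniently(lines):
    """Parse each line on its own so that one malformed row does not stop the rest."""
    rows, errors = [], []
    for line_number, line in enumerate(lines, start=1):
        try:
            # strict=True makes the reader raise csv.Error on malformed input.
            rows.extend(csv.reader([line], strict=True))
        except csv.Error as exc:
            errors.append((line_number, str(exc)))
    return rows, errors


rows, errors = parse_rows_leniently(['a,b', 'c,"d"e', 'f,g'])
print(rows)
print(errors)
```

Parsing line by line keeps errors local to one row, but it cannot handle quoted fields that contain line breaks. That is one reason the error-action option may still be preferable.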

pre-commit/action is deprecated

1 comment

Its documentation instead recommends using https://pre-commit.ci/ (see https://github.com/pre-commit/action).

docs: gthread is a sync worker

3 comments

https://github.com/benoitc/gunicorn/issues/1493#issuecomment-321461614

docs: CSV Kit comparison is out-of-date

On my machine, xsv takes about 5 seconds and csvstat takes about 10 seconds (it was dramatically improved in the latest release). Instead of updating every number in the readme...