James McKinney

Results 77 issues of James McKinney

Treat (and the tf-idf and similarity gems) all normalize tf to the number of terms in the document: https://github.com/louismullie/treat/blob/master/lib/treat/workers/extractors/tf_idf/native.rb#L78 We normalize so that (1) long and short documents have comparable...
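As a rough illustration of the normalization described above, here is a minimal Python sketch that divides each raw term count by the number of terms in the document. The function name `normalized_tf` and the sample documents are hypothetical; this is not the code from Treat or the other gems.

```python
from collections import Counter


def normalized_tf(tokens):
    """Return each term's count divided by the document's total term count."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {term: count / total for term, count in counts.items()}


# A short and a long document with the same proportions of each term
# (illustrative data only).
short_doc = "csv parser".split()
long_doc = "csv parser csv parser csv parser csv parser".split()

short_tf = normalized_tf(short_doc)
long_tf = normalized_tf(long_doc)
```

With this normalization, "csv" gets the same weight in both documents even though the longer one contains it four times as often, which is the point about comparing long and short documents.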

To match Scrapy's code: https://github.com/scrapy/scrapy/blob/master/scrapy/utils/ossignal.py

For comparison, Elasticsearch supports both (e.g. in the instructions on this page: https://www.elastic.co/guide/en/elasticsearch/reference/7.10/configuring-tls.html). I use Apache's [mod_md](https://httpd.apache.org/docs/2.4/mod/mod_md.html) to automatically get and renew certificates from Let's Encrypt. Like certbot and other...

Like https://github.com/sj26/lenientcsv, HippieCSV just replaces an escaped quote character with a doubled quote character before parsing. https://github.com/intercom/hippie_csv/blob/91f247ffaa45ffb15798bfd3637fea73434762e1/lib/hippie_csv/support.rb#L52
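The replace-then-parse approach can be sketched in Python as follows. This is an illustration of the idea only, not a port of HippieCSV or lenientcsv; the helper name `double_escaped_quotes` and its parameters are hypothetical, and the escape character is assumed to be a backslash.

```python
import csv
import io


def double_escaped_quotes(text, quote_char='"', escape_char="\\"):
    """Rewrite escape_char + quote_char as a doubled quote_char.

    Assumption: a naive textual replacement, so an escaped backslash that
    happens to be followed by a quote would also be rewritten.
    """
    return text.replace(escape_char + quote_char, quote_char + quote_char)


# Input that uses backslash-escaped quotes inside a quoted field.
raw = 'id,comment\n1,"She said \\"hi\\" twice"\n'

# After the rewrite, a standard CSV parser can read the field.
rows = list(csv.reader(io.StringIO(double_escaped_quotes(raw))))
```

The simplicity is the appeal: the parser itself needs no escape-character support, at the cost of mishandling inputs where the escape character is itself escaped.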

Ruby's CSV library has a single test for this: `test_leading_empty_fields_with_multibyte_col_sep_bug_fix` in `test_features.rb`. Ragel inlines the `when` block in an `if` statement; if the solution requires complex code, we may want...

`:row_sep` support is more complicated than `:quote_char` and `:col_sep` support, because it is frequently multi-byte (e.g. `\r\n`) and needs to support the `:auto` option. May be easier to implement once...
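To show why an automatic row separator needs lookahead, here is a small Python sketch that scans a sample for the first line ending and checks whether a `\r` is followed by `\n`. It is an assumption-laden illustration, not the Ragel-based parser discussed here and not Ruby's implementation: the function name `detect_row_sep` is hypothetical, and the scan ignores quoting, so a line ending inside a quoted field would be picked up too.

```python
def detect_row_sep(sample, default="\n"):
    """Return the first line ending found in sample: '\\r\\n', '\\n' or '\\r'.

    Assumption: quoting is ignored, so a newline inside a quoted field
    counts as a row separator. Falls back to `default` if none is found.
    """
    for i, ch in enumerate(sample):
        if ch == "\r":
            # A two-character lookahead distinguishes '\r\n' from a bare '\r'.
            if sample[i + 1:i + 2] == "\n":
                return "\r\n"
            return "\r"
        if ch == "\n":
            return "\n"
    return default
```

The one-character lookahead after `\r` is the part that makes a multi-byte separator harder to handle than a single-byte `:quote_char` or `:col_sep`.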

Options:

- Use Ragel's error actions
- Switch to row-by-row parser

Its documentation instead recommends using https://pre-commit.ci/ https://github.com/pre-commit/action

https://github.com/benoitc/gunicorn/issues/1493#issuecomment-321461614

On my machine, xsv takes about 5 seconds and csvstat takes about 10 seconds (it was dramatically improved in the latest release). Instead of updating every number in the readme...