Shotaro Ishihara issues

Results: 102 issues of Shotaro Ishihara

Adversarial Validation Approach to Concept Drift Problem in Automated Machine Learning Systems

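# What is it? A paper from Uber. It tackles the common machine learning problem of train and test data having different characteristics by applying adversarial validation, a technique frequently seen on Kaggle. ## Paper link https://arxiv.org/abs/2004.03045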
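
Adversarial validation, as commonly practiced on Kaggle, can be sketched as follows: label train rows 0 and test rows 1, fit a classifier to tell them apart, and read its held-out AUC. An AUC near 0.5 suggests the two sets look alike, while an AUC near 1.0 suggests drift. This is a minimal standard-library sketch on synthetic one-feature data, not the paper's implementation; the helper names (`fit_logistic`, `adversarial_auc`) and the logistic-regression classifier are assumptions made for illustration.

```python
import math
import random


def fit_logistic(rows, labels, lr=0.1, epochs=200):
    """Fit a plain logistic regression with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Predicted probability that row x comes from the test set."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def auc(scores, labels):
    """ROC AUC: probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def adversarial_auc(train_rows, test_rows, seed=0):
    """Label train rows 0 and test rows 1, fit on one half, score the other half."""
    rng = random.Random(seed)
    data = [(x, 0) for x in train_rows] + [(x, 1) for x in test_rows]
    rng.shuffle(data)
    cut = len(data) // 2
    fit_part, eval_part = data[:cut], data[cut:]
    weights, bias = fit_logistic([x for x, _ in fit_part], [y for _, y in fit_part])
    scores = [predict(weights, bias, x) for x, _ in eval_part]
    return auc(scores, [y for _, y in eval_part])


# Synthetic data (an assumption): one Gaussian feature per row.
rng = random.Random(42)
train = [[rng.gauss(0.0, 1.0)] for _ in range(200)]
same = [[rng.gauss(0.0, 1.0)] for _ in range(200)]      # same distribution as train
shifted = [[rng.gauss(2.0, 1.0)] for _ in range(200)]   # mean shifted by 2

auc_same = adversarial_auc(train, same)
auc_shifted = adversarial_auc(train, shifted)
```

With this setup, `auc_shifted` should come out well above `auc_same`, which is the signal that the test set has drifted away from the training set.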

Talk to Papers: Bringing Neural Question Answering to Academic Search

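# What is it? Building on recent advances in question answering, the authors design a system that lets researchers search academic papers in natural language. Searching with a question seems useful when you do not know the right search query, for example as an undergraduate. https://arxiv.org/abs/2004.02002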

Hooks in the Headline: Learning to Generate Headlines with Controlled Styles

# What is it? For news headline generation, this work goes beyond plain summarization and "adds color" in three styles: humorous, romantic, and clickbait. Experiments use article data from the New York Times and CNN. ![Screen Shot 2020-04-08 at 12 48 29](https://user-images.githubusercontent.com/31459778/78742875-1145d300-7998-11ea-9ca9-8ff213289e6f.png) https://arxiv.org/abs/2004.01980

Correlated daily time series and forecasting in the M4 competition

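# What is it? A paper by participants in the M4 time series forecasting competition. It states "We identify data leakage as one reason for its success" and also makes proposals for the design of future competitions. https://arxiv.org/abs/2003.12796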

Soccer Team Vectors

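# What is it? Proposes "STEVE", an embedding representation of soccer clubs, computed under the assumption that two clubs that beat the same club are similar. The embeddings can be used as input to machine learning algorithms, and they performed well on the task of estimating soccer clubs' market value. #spoana https://arxiv.org/abs/1908.00698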
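
The assumption that "two clubs that beat the same club are similar" can be illustrated with a much simpler stand-in than a learned embedding: compare the sets of opponents each club has beaten. The sketch below uses Jaccard overlap, which is not the STEVE algorithm itself; the club names and results are hypothetical placeholders.

```python
from collections import defaultdict


def beaten_sets(results):
    """Map each club to the set of opponents it has beaten."""
    beaten = defaultdict(set)
    for winner, loser in results:
        beaten[winner].add(loser)
    return beaten


def jaccard(a, b):
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical (winner, loser) pairs; the club names are placeholders.
results = [
    ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"),
    ("E", "A"), ("E", "F"),
]
beaten = beaten_sets(results)
sim_ab = jaccard(beaten["A"], beaten["B"])  # A and B beat the same clubs
sim_ae = jaccard(beaten["A"], beaten["E"])  # A and E share no beaten opponent
```

Here clubs A and B score as maximally similar because they beat exactly the same opponents, while A and E share none. A learned embedding would aim to capture the same signal in a dense vector that downstream models can consume.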

Detecting and Characterizing Bots that Commit Code

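# What is it? Classifies whether a commit author is a bot based on commit information (author name, commit message, files changed, etc.), reaching an AUC of 0.9. The motivation is to estimate measures such as OSS project productivity correctly by excluding bots. https://arxiv.org/abs/2003.03172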

Empirical Analysis of Multi-Task Learning for Reducing Model Bias in Toxic Comment Detection

# What is it? An analysis using data from the Kaggle competition "Jigsaw Unintended Bias in Toxicity Classification". https://arxiv.org/abs/1909.09758

MetNet: A Neural Weather Model for Precipitation Forecasting

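# What is it? "MetNet", a neural network that forecasts precipitation up to 8 hours ahead. It takes radar and satellite data as input and produces probability maps of precipitation. https://arxiv.org/abs/2003.12140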

Integrating Crowdsourcing and Active Learning for Classification of Work-Life Events from Tweets

# What is it? A crowdsourcing strategy that uses active learning to reduce the burden of manual annotation for NLP while maintaining reliability. It uses Amazon Mechanical Turk. https://arxiv.org/abs/2003.12139
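
One common form of active learning is uncertainty sampling: send annotators only the items the current model is least sure about. The summary above does not say which selection rule the paper uses, so the sketch below is a generic illustration; the function name `select_for_annotation` and the probabilities are hypothetical.

```python
def select_for_annotation(probabilities, batch_size):
    """Return indices of the items whose predicted probability is closest to 0.5."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


# Hypothetical model outputs: P(tweet describes a work-life event) per unlabeled tweet.
unlabeled_probs = [0.97, 0.52, 0.10, 0.45, 0.80]
to_label = select_for_annotation(unlabeled_probs, batch_size=2)
```

In this example the tweets at indices 1 and 3 would be sent to crowd workers first, since the model's predictions for them are the least confident.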

Cost-Sensitive BERT for Generalisable Sentence Classification with Imbalanced Data

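# What is it? Describes how to weight the loss to handle class imbalance when using BERT to classify whether a sentence is propaganda. The authors also confirm that simple oversampling does not improve performance on this data. https://arxiv.org/abs/2003.11563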
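
Loss weighting for imbalanced data can be sketched as a class-weighted binary cross-entropy, where mistakes on the rare class cost more. The inverse-frequency weights below are an assumption chosen for illustration; the summary does not state the paper's exact weighting scheme, and the labels and probabilities are made up.

```python
import math


def class_weights(labels):
    """Inverse-frequency weights so each class contributes equally in total."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}


def weighted_bce(probs, labels, weights):
    """Mean binary cross-entropy with a per-class weight on each example."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
        loss = -math.log(p) if y == 1 else -math.log(1 - p)
        total += weights[y] * loss
    return total / len(labels)


# Hypothetical imbalanced labels: 8 non-propaganda (0), 2 propaganda (1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.1] * 10  # a model that always leans toward "not propaganda"

weights = class_weights(labels)
plain = weighted_bce(probs, labels, {0: 1.0, 1: 1.0})
weighted = weighted_bce(probs, labels, weights)
```

With these weights, a model that ignores the minority class is penalized more heavily (`weighted` exceeds `plain`), which pushes training toward paying attention to the rare positive examples.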