Kerem Turgutlu

Results: 10 issues by Kerem Turgutlu

_Be sure you've searched [the forums](https://forums.fast.ai) for the error message you received. Also, unless you're an experienced fastai developer, first ask on the forums to see if someone else has...

Hello, I believe the corpus and the `word_freqs` output used in the [BPE](https://github.com/huggingface/course/blob/main/chapters/en/chapter6/5.mdx#implementing-bpe) / [WordPiece](https://github.com/huggingface/course/blob/main/chapters/en/chapter6/6.mdx#implementing-wordpiece) implementations have a mismatch: for example, `Course -> course` is not capitalized in the corpus but is in `word_freqs`...
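A minimal sketch of the word-frequency step that the course's BPE walkthrough starts from, showing why a capitalization mismatch matters. The corpus lines and plain whitespace splitting here are assumptions to keep the example self-contained; the course itself uses the GPT-2 pre-tokenizer.

```python
from collections import Counter

# Toy corpus; the course's actual corpus may differ.
corpus = [
    "This is the Hugging Face Course.",
    "This chapter is about tokenization.",
]

word_freqs = Counter()
for text in corpus:
    # Whitespace splitting stands in for the GPT-2 pre-tokenizer here.
    word_freqs.update(text.split())

# Casing matters: "Course." and "course." would be distinct keys, so a
# corpus/word_freqs capitalization mismatch changes the resulting merges.
print(word_freqs)
```

Because merges are learned from these counts, `Course` and `course` being counted as different words leads to different BPE merge rules downstream.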

## Description

Modified the activation calculation to speed things up. For example, calling `numpy()` and `detach()` in a for loop over and over again is very slow and bad practice....
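A sketch of the pattern described above (an assumed illustration, not the PR's actual code): converting each activation to NumPy inside the loop forces a device sync and a copy per iteration, while stacking once after the loop pays that cost a single time.

```python
import torch

def slow_collect(acts):
    # Anti-pattern: per-iteration detach + CPU transfer + NumPy conversion.
    return [a.detach().cpu().numpy() for a in acts]

def fast_collect(acts):
    # Collect tensors first, then convert to NumPy once at the end.
    return torch.stack([a.detach() for a in acts]).cpu().numpy()

acts = [torch.randn(4) for _ in range(8)]
out = fast_collect(acts)
print(out.shape)  # (8, 4)
```

On GPU tensors the difference is much larger, since every `.cpu()`/`.numpy()` call inside the loop blocks on a device-to-host transfer.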

How can we download the CXR and mask data for research purposes? The command line gives an error and the web UI says "you don't have permission to download":

```
darwin dataset pull v7-labs/covid-19-chest-x-ray-dataset:all-images...
```

The following doesn't time out, nor does it return anything:

```python
url = "http://http-live.sr.se/srextra01-mp3-192"
article = newspaper.Article(url, request_timeout=5)
article.download()
```

Same with:

```python
from newspaper.network import get_html_2XX_only

article.config.__dict__
# {'MIN_WORD_COUNT': 300, 'MIN_SENT_COUNT': 7, 'MAX_TITLE':...
```
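A likely cause, sketched under an assumption: the URL is a live MP3 stream, so the HTTP response body never ends. The `timeout` that `requests` exposes (and that `request_timeout` presumably feeds) caps connect time and the gap between chunks, not total download time, so a server that keeps sending data never trips it. `read_with_deadline` and `endless_stream` below are hypothetical helpers, not part of newspaper's API; they show how a wall-clock deadline on the read loop does stop the hang.

```python
import time

def read_with_deadline(chunks, deadline_s=5.0, max_bytes=1 << 20):
    # Enforce a total wall-clock deadline and a byte cap while reading chunks
    # (e.g. chunks = resp.iter_content(8192) from a streaming requests call).
    start = time.monotonic()
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) >= max_bytes or time.monotonic() - start > deadline_s:
            break  # stop even though the stream itself has no end
    return bytes(buf)

def endless_stream():
    # Stand-in for a live radio stream: data never stops arriving.
    while True:
        yield b"x" * 10

data = read_with_deadline(endless_stream(), max_bytes=100)
print(len(data))  # 100
```

With a per-read timeout alone, each 10-byte chunk arrives promptly, so the connection looks healthy forever; only a total deadline or byte cap terminates it.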

I am trying to understand the role that active intents play in the dataset and to replicate the Active Intent Accuracy of 0.924 on MultiWOZ 2.1 [reported here](https://github.com/google-research/google-research/tree/master/schema_guided_dst#results). But I've noticed a lot of...

Data preparation involves downloading Reddit comment and submission data from https://files.pushshift.io/reddit/, and the stated total size is around 700 GB. However, the actual size of the data is around...

I am interested in coding a little demo with the pretrained MultiWOZ model. However, I am not able to figure out how to inject book info into the db pointer dynamically....

Thanks a lot for putting this repo together and providing the fresh CC dumps at HF. I was looking for a way to find dataset splits for other languages but...

I am writing a data pipeline to process Common Crawl and am referencing your code in the [pile-cc repo](https://github.com/EleutherAI/pile-cc). In this repo, the raw Pile-CC version accounts for 200 GB; however, in...