cyk issues


cyk

KeyError for producing HTML output with `--html`

Hi, when running the script with the `--html` option, I got a `KeyError` while trying to convert the [XML dump](https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2) to HTML, the same as [#40](https://github.com/attardi/wikiextractor/issues/40). This has also been reported in [#247](https://github.com/attardi/wikiextractor/issues/247). Is there any solution to...

Distributed training for streaming dataset

### Feature request

Is there any documentation on using `load_dataset(streaming=True)` for (multi-node, multi-GPU) DDP training?

### Motivation

Given a set of data files, they are expected to be split across different...
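The motivation describes splitting a set of data files across the processes of a multi-node, multi-GPU job. A minimal sketch of one common scheme, assuming each process already knows its global `rank` and the `world_size` (for example from `torch.distributed`), is round-robin assignment so that every rank streams a disjoint subset of files. The helper name `shard_files_for_rank` is hypothetical and is not part of the `datasets` API.

```python
from typing import List, Sequence


def shard_files_for_rank(files: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Return the files this rank should stream (hypothetical helper).

    Files are assigned round-robin: file i (after sorting) goes to
    rank i % world_size, so shards are disjoint and together cover all files.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Sort first so every process computes the same assignment,
    # regardless of the order in which it listed the files.
    ordered = sorted(files)
    return ordered[rank::world_size]


if __name__ == "__main__":
    data_files = [f"shard-{i:02d}.jsonl" for i in range(10)]
    for r in range(4):
        print(r, shard_files_for_rank(data_files, rank=r, world_size=4))
```

With 10 files and 4 ranks, ranks 0 and 1 receive three files each while ranks 2 and 3 receive two, so per-rank workloads are uneven whenever the file count is not a multiple of `world_size`. Recent versions of `datasets` also provide `datasets.distributed.split_dataset_by_node`, which is worth checking before writing a custom splitter.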

Distributed data parallel training for streaming datasets

enhancement

Multiple dataloader memory error

To handle multiple datasets and tasks, we use more than 200 dataloaders and pass them into `dataloader1, dataloader2, ..., dataloader200 = accelerate.prepare(dataloader1, dataloader2, ..., dataloader200)`. It causes the memory...

bug

Set `add_prefix_space = False` for existing pre-trained tokenizers

I would like to add special tokens to an existing (pre-trained) tokenizer, where the added tokens are not separated from neighboring tokens by whitespace. However, the decoded string contains additional whitespace ahead...

Bugs when running RankGAN

When running `python main.py -g rankgan`, I got an `IndexError`:

```bash
Traceback (most recent call last):
  File "/home/xxx/Texygen/main.py", line 78, in parse_cmd
    gan.train_oracle()
  File "/home/xxx/Texygen/models/rankgan/Rankgan.py", line 121, in train_oracle
    self.evaluate()
  File "/home/chaiyekun/GAN.tf/Texygen/models/rankgan/Rankgan.py",...
```

Question about the generator loss of MaliGAN

Is the generator loss of MaliGAN correct? It should be:

![image](https://user-images.githubusercontent.com/13767887/91018741-8867c500-e622-11ea-8988-c6bd96660982.png)

[https://github.com/geek-ai/Texygen/blob/3104e22ac75f3cc2070da2bf5e2da6d2bef149ad/models/maligan_basic/MaliganGenerator.py#L112](https://github.com/geek-ai/Texygen/blob/3104e22ac75f3cc2070da2bf5e2da6d2bef149ad/models/maligan_basic/MaliganGenerator.py#L112)

```python
self.g_loss = -tf.reduce_sum(
    tf.reduce_sum(
        tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_vocabulary, 1.0, 0.0) * tf.log(
            tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_vocabulary]), 1e-20, 1.0)
        ),...
```
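The visible part of that loss multiplies a one-hot encoding of the target tokens by the log of the clipped predicted distribution and sums over the vocabulary, which simply selects the log-probability of each observed token. The pure-Python sketch below reproduces only that fragment; whatever follows it is cut off in the snippet and is not reproduced here. The helper name `token_log_probs` is hypothetical, and the `1e-20` clip mirrors the original code.

```python
import math
from typing import List, Sequence


def token_log_probs(
    targets: Sequence[int],
    probs: Sequence[Sequence[float]],
    eps: float = 1e-20,
) -> List[float]:
    """Per-token log-probabilities of the target ids (hypothetical helper).

    Mirrors sum_v one_hot(target)[v] * log(clip(p[v], eps, 1.0)):
    every vocabulary entry except the target is masked out by the one-hot.
    """
    out = []
    for t, row in zip(targets, probs):
        total = 0.0
        for v, p in enumerate(row):
            one_hot = 1.0 if v == t else 0.0
            # Clip like tf.clip_by_value so log never sees 0.
            total += one_hot * math.log(min(max(p, eps), 1.0))
        out.append(total)
    return out


if __name__ == "__main__":
    lp = token_log_probs([2, 0], [[0.1, 0.2, 0.7], [0.5, 0.25, 0.25]])
    # Negated sum = unweighted negative log-likelihood of the sequence.
    print(lp, -sum(lp))
```

Summing and negating these values gives the plain (unweighted) negative log-likelihood, which is a useful baseline when comparing the code against the formula in the image.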

Cyk1337 patch 1


contributor

stale

Questions about the prior issues

Hi there, I am confused about the part that applies the prior to the computed variances. Could you explain it? Thanks ;) [Link](https://github.com/aakhundov/tf-example-models/blob/40b32991a76cb8d7201f9a5851789847db310b79/models/tf_gmm.py#L100)

```python
# applying prior to...
```

:pencil:+solutions to exercises 1-12

Add solutions to exercises 1-12 that are compatible with the new version of the Ray library.