Johann Petrak

188 comments by Johann Petrak

This seems to happen because the Pipfile defines `url = "https://pypi.org/simple"`. It appears that this project is broken and abandoned, as it has not seen any updates for 3 years.

Actually, the pypi page / README file DOES say "pip3 install py-template-project". See https://github.com/AlexIoannides/py-package-template/blob/11f37953683e5f063e89ca2c567304428e9cb563/README.md

To at least partly answer my own question here: it looks like this could trivially be achieved by running several `from_disk` calls from the different stored vocabs on a new vocab...

Hmm, it would be good to find out for sure (and maybe document in more detail what `from_disk` is supposed to do when called with several different files on the...

Could somebody with a better understanding of the inner workings of spaCy please give feedback on whether the method outlined in the previous comment is a proper way to merge vocabs or...

I cannot imagine that I am the only one running processing pipelines in parallel these days, so I think a dedicated API method for merging vocabs properly would be...

I have a problem which I think is very similar: I would like to "stream" data to a HF Arrow (memory-mapped) Dataset, where the final size of the dataset is...

This works for me if I then open the Arrow file as a dataset using `ds=Dataset.from_file(final_data_path)` and then call `ds.save_to_disk(somedir)` (actually, I also close the writer first: `writer.close()`). The Dataset created that way...

I was thinking that `save_to_disk` is necessary when one wants to re-use that dataset as a proper HF dataset later, no? At least what I wanted to achieve is create...

What is even more irritating is that the label from the previous example is sometimes shown for the next example, or that no label from the previous example is shown...