Madelon Hulsebos
That looks useful, thanks for making and sharing! The 78 semantic types that Sherlock is trained on can be found in table 19 on page 28 in [this paper](https://adalabucsd.github.io/papers/TR_2021_SortingHat.pdf). It...
You're welcome! PS, all types match a type in wikidata.
Thanks for sharing this issue! I hope you managed to make it work in the meantime, but I will look into this later. If you have a general solution, a...
Thanks @michaelmior, I managed the packages as in the `requirements.txt` with conda but did not test the 3.8 requirements. Will work on this, thanks!
Dear @stranger-codebits, Thanks a lot for reporting your issue and findings, this is a great catch. I hope to have time to look into this soon. In the meantime, feel...
Hi Nikolaos, That would be much appreciated! There are no guidelines in place, but it would be great if you could provide your solution along with some evidence showing that...
Hi! These can be generated with the notebook here: https://github.com/mitmedialab/sherlock-project/blob/master/notebooks/01-data-preprocessing.ipynb. Please let me know if this is clear and works for you! Madelon
Hi Varnit, These files can be obtained by extracting paragraph vectors again with the code in this module: https://github.com/mitmedialab/sherlock-project/blob/master/sherlock/features/paragraph_vectors.py. The process is displayed here: https://github.com/mitmedialab/sherlock-project/blob/master/notebooks/03-retrain-paragraph-vector-features.ipynb. Does that address your question?...
Hi Varnit, These additional files are generated automatically by gensim through the `.save()` method when the model is rather large. These files are then also expected to exist upon loading...
Hi Giacomo, Thank you! - I recommend building a new paragraph vector model, but you can try whether the existing one works for your dataset. - Indeed, this should be done...