tika-similarity
Tika-Similarity uses the Tika-Python package (a Python port of Apache Tika) to compute file similarity based on metadata features.
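Here is a minimal sketch of metadata-based similarity. It assumes each file's metadata is a flat dict mapping feature names to values, for example the `"metadata"` entry of the dict returned by tika-python's `parser.from_file`. The sketch computes the Jaccard index over the metadata keys. The function name `jaccard_similarity` is hypothetical, and the repository's scripts may define their scores differently.

```python
def jaccard_similarity(meta_a: dict, meta_b: dict) -> float:
    """Jaccard index of two metadata key sets: |shared keys| / |all keys|."""
    keys_a, keys_b = set(meta_a), set(meta_b)
    union = keys_a | keys_b
    if not union:
        # Two empty metadata dicts: treated as identical (a convention of this sketch).
        return 1.0
    return len(keys_a & keys_b) / len(union)


if __name__ == "__main__":
    # Hand-written dicts standing in for parsed metadata of two images.
    a = {"Content-Type": "image/jpeg", "Image Width": "640", "Image Height": "480"}
    b = {"Content-Type": "image/png", "Image Width": "800", "Compression": "Deflate"}
    print(jaccard_similarity(a, b))  # 2 shared keys / 4 distinct keys = 0.5
```

Key-only Jaccard ignores the values themselves. Two JPEGs and a PNG with the same fields score identically, which is why value-aware measures such as cosine or edit distance are useful alongside it.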
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.8 to 1.26.5. Release notes (sourced from urllib3's releases): 1.26.5. IMPORTANT: urllib3 v2.0 will drop support for Python 2. Read more in the v2.0 Roadmap. Fixed...
Allow .json metadata files as input to edit-value-similarity, jaccard-similarity, and cosine-similarity via the argument --jsonDir /path/to/json/dir, and add circle-packing-for-all.py to visualize the new --jsonDir outputs. To run it: python circle-packing-for-all.py --inputCSV /path/csv --cluster...
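A `--jsonDir` option might load its input roughly as sketched below. The sketch assumes one JSON metadata object per `.json` file, which the source does not spell out. `load_json_dir` is a hypothetical helper, not the scripts' actual code. Note that argparse keeps the option's case, so the value is read back as `args.jsonDir`.

```python
import argparse
import json
from pathlib import Path


def load_json_dir(json_dir: str) -> dict:
    """Map each *.json file name in json_dir to its parsed metadata object."""
    records = {}
    # Sorted so that output order is deterministic across runs.
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open("r", encoding="utf-8") as handle:
            records[path.name] = json.load(handle)
    return records


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Load metadata from a directory of .json files.")
    parser.add_argument("--jsonDir", required=True,
                        help="directory containing one .json metadata file per document")
    args = parser.parse_args(argv)
    records = load_json_dir(args.jsonDir)
    print(f"Loaded {len(records)} metadata files")


if __name__ == "__main__":
    main()
```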
Fix the undefined variables file1_only_features, file2_only_features, and record_only_features, and change the file mode 'wb' to 'w'.
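This sketch assumes the `'wb'` mode belongs to an output file passed to `csv.writer`; the source does not say which file it is. Under that assumption, the reason for the change to `'w'` is that Python 3's `csv.writer` writes `str` objects, and a binary-mode file rejects them with a `TypeError`. The helper names `write_scores` and `binary_mode_fails` are hypothetical.

```python
import csv


def write_scores(path: str, rows) -> None:
    """Write (file1, file2, score) rows as CSV in text mode."""
    # csv.writer emits str in Python 3, so the file must be opened in text
    # mode ("w"); the csv docs recommend newline="" for files given to csv.writer.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file1", "file2", "score"])
        writer.writerows(rows)


def binary_mode_fails(path: str) -> bool:
    """Return True if writing a CSV row to a file opened with "wb" raises TypeError."""
    try:
        with open(path, "wb") as handle:
            csv.writer(handle).writerow(["a", "b"])
    except TypeError:
        # Python 3: "a bytes-like object is required, not 'str'"
        return True
    return False
```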
Took the computeScores2 method from cosine_similarity.py (which takes an input file of JSON objects) and extended it to jaccard_similarity.py and edit-value-similarity.py. The edit-value one has slightly different syntax than...
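The sketch below shows one scoring loop over a file of JSON objects, reused across similarity measures. It makes two assumptions the source does not state:

- The input holds one JSON object per line (JSON Lines).
- The edit-value score is the mean normalized Levenshtein similarity over shared keys. This is one plausible definition, not necessarily what computeScores2 or edit-value-similarity.py actually compute.

All function names here are hypothetical.

```python
import json
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute, each cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def edit_value_similarity(meta_a: dict, meta_b: dict) -> float:
    """Mean of 1 - distance/max_len over keys both records share; 0.0 if none."""
    shared = set(meta_a) & set(meta_b)
    if not shared:
        return 0.0
    total = 0.0
    for key in shared:
        va, vb = str(meta_a[key]), str(meta_b[key])
        longest = max(len(va), len(vb))
        total += 1.0 if longest == 0 else 1.0 - levenshtein(va, vb) / longest
    return total / len(shared)


def load_json_lines(path: str) -> list:
    """Read one JSON object per non-blank line."""
    with open(path, "r", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def compute_pairwise(records: list, score_fn) -> list:
    """Apply score_fn to every unordered pair of records, returning (i, j, score)."""
    return [(i, j, score_fn(records[i], records[j]))
            for i, j in combinations(range(len(records)), 2)]
```

Because `compute_pairwise` takes the scoring function as a parameter, a key-based Jaccard function and the edit-value function can share the same loading and pairing code. Only the per-pair formula differs.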
This is a pull request for the extra-credit part of Assignment 1 in DSCI550 SP24. We are team 9.
Open index.html in a browser to see the visualizations of Big Foot.
Fix issue #106
The Jaccard, edit-distance, and cosine-distance similarity scripts do not restart the Tika server when it fails. This becomes evident when the Tika server processes a very large number of files (100K)...
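One way to handle this is to wrap each parse call so that a failure triggers a server restart and a bounded retry. This is a minimal sketch under assumptions. `parse_with_restart` and its `parse_fn` and `restart_fn` hooks are hypothetical. How the Tika server is actually restarted depends on how it was launched, so that step is left to the caller.

```python
import time


def parse_with_restart(path, parse_fn, restart_fn, max_retries=3, backoff_seconds=1.0):
    """Call parse_fn(path); on failure, call restart_fn() and retry.

    parse_fn and restart_fn are caller-supplied hooks (hypothetical): parse_fn
    might wrap tika-python's parser.from_file, and restart_fn whatever mechanism
    the deployment uses to restart the Tika server.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return parse_fn(path)
        except Exception as error:  # broad on purpose: server failures vary in type
            last_error = error
            if attempt == max_retries:
                break
            restart_fn()
            # Exponential backoff gives the restarted server time to come up.
            time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError(f"giving up on {path} after {max_retries + 1} attempts") from last_error
```

Raising after a bounded number of attempts means that a single unparseable file cannot stall a 100K-file run indefinitely. The caller can catch the `RuntimeError`, log the path, and continue.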