tika-similarity

Tika-Similarity uses the Tika-Python package (a Python port of Apache Tika) to compute file similarity based on metadata features.
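
As a rough illustration of metadata-based similarity, the sketch below compares two metadata dictionaries (such as the `"metadata"` entry of the dict returned by tika-python's `parser.from_file`) using the Jaccard index over their key sets. This is a minimal sketch under that assumption, not the project's actual scoring code. The function name `key_jaccard` and the sample metadata are hypothetical.

```python
def key_jaccard(meta_a: dict, meta_b: dict) -> float:
    """Jaccard index of the metadata key sets: |A & B| / |A | B|."""
    keys_a, keys_b = set(meta_a), set(meta_b)
    union = keys_a | keys_b
    if not union:
        # Two empty metadata records are treated as identical by convention here.
        return 1.0
    return len(keys_a & keys_b) / len(union)


if __name__ == "__main__":
    pdf_meta = {"Content-Type": "application/pdf", "Author": "A", "xmpTPg:NPages": "3"}
    doc_meta = {"Content-Type": "application/msword", "Author": "B"}
    # 2 shared keys out of 3 distinct keys
    print(key_jaccard(pdf_meta, doc_meta))  # prints 0.6666666666666666
```

Only key names are compared here, so two files of the same type with different authors still score highly. A value-aware comparison would need a different measure.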

11 tika-similarity issues

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.8 to 1.26.5. Release notes, sourced from urllib3's releases: 1.26.5. ⚠️ IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap. Fixed...

dependencies

Allow .json metadata files as input to edit-value-similarity, jaccard-similarity, and cosine-similarity via the argument --jsonDir /path/to/json/dir, and add circle-packing-for-all.py to visualize the new --jsonDir outputs. To run it: python circle-packing-for-all.py --inputCSV /path/csv --cluster...
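
A `--jsonDir` option implies reading every `.json` metadata file in a directory. The sketch below shows one way to do that with the standard library. It assumes each file holds a single JSON object of metadata, and the helper name `load_json_dir` is hypothetical rather than taken from the PR.

```python
import json
import tempfile
from pathlib import Path


def load_json_dir(json_dir: str) -> dict:
    """Map each *.json file name in json_dir to its parsed metadata object."""
    records = {}
    # sorted() gives a deterministic order, so downstream output is reproducible.
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open("r", encoding="utf-8") as fh:
            records[path.name] = json.load(fh)
    return records


if __name__ == "__main__":
    # Build a throwaway directory with one metadata file and one non-JSON file.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "a.json").write_text(json.dumps({"Author": "x"}), encoding="utf-8")
        (Path(tmp) / "notes.txt").write_text("ignored", encoding="utf-8")
        print(load_json_dir(tmp))  # prints {'a.json': {'Author': 'x'}}
```

Files without the `.json` suffix are skipped. Each loaded record can then be passed to whatever pairwise similarity function the script uses.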

Fix undefined variables file1_only_features, file2_only_features, and record_only_features, and change the file mode 'wb' to 'w'.
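
The 'wb' to 'w' change matters on Python 3. `csv.writer` emits `str`, so writing through it to a file opened in binary mode raises `TypeError`. The following minimal sketch shows the Python 3 idiom: open in text mode with `newline=""`, as the `csv` module documentation recommends. The `write_scores` name and the two-column layout are illustrative assumptions, not the project's actual output schema.

```python
import csv
import os
import tempfile


def write_scores(path: str, rows: list) -> None:
    """Write (file, score) rows to a CSV file using Python 3 text mode."""
    # Text mode ("w") because csv.writer produces str; newline="" lets the
    # csv module control line endings instead of the platform default.
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "score"])
        writer.writerows(rows)


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "similarity.csv")
    write_scores(out, [("a.pdf", 0.5), ("b.pdf", 1.0)])
    with open(out, newline="", encoding="utf-8") as fh:
        print(list(csv.reader(fh)))
```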

Took the computeScores2 method from cosine_similarity.py (which takes an input file of JSON objects) and extended it to jaccard_similarity.py and edit-value-similarity.py. The edit-value one has slightly different syntax than...
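
For context on the edit-value variant, the sketch below scores two metadata records by the normalized Levenshtein similarity of the values under their shared keys, averaged over those keys. It is an illustration of the general idea under those assumptions, not a transcription of edit-value-similarity.py or computeScores2. The names `levenshtein` and `edit_value_similarity` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when chars match)
            ))
        prev = curr
    return prev[-1]


def edit_value_similarity(meta_a: dict, meta_b: dict) -> float:
    """Mean of 1 - distance / max_length over metadata keys present in both records."""
    shared = set(meta_a) & set(meta_b)
    if not shared:
        return 0.0
    total = 0.0
    for key in shared:
        va, vb = str(meta_a[key]), str(meta_b[key])
        longest = max(len(va), len(vb))
        # Two empty values are identical, so they contribute a similarity of 1.0.
        total += 1.0 if longest == 0 else 1.0 - levenshtein(va, vb) / longest
    return total / len(shared)


if __name__ == "__main__":
    a = {"Author": "kitten", "Content-Type": "application/pdf"}
    b = {"Author": "sitting", "Content-Type": "application/pdf", "Pages": "2"}
    print(edit_value_similarity(a, b))
```

Normalizing by the longer value keeps each per-key score in [0, 1], so long and short metadata values are weighted equally in the average.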

This is a pull request for the extra credit in Assignment 1 of DSCI550 SP24. We are Team 9.

Open index.html in a browser to see the visualizations of Big Foot.

The Jaccard, edit-distance, and cosine-distance similarity scripts do not attempt to restart the Tika server on failure. This becomes evident when the Tika server processes a very large number of files (100K)...
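
One way to address this is to wrap each parse call in a retry loop that restarts the server between attempts. The sketch below is generic. `parse` and `restart_server` are hypothetical callables you would supply, for example a wrapper around tika-python's `parser.from_file` and whatever mechanism your setup uses to relaunch the Tika server. The retry count and backoff values are arbitrary assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def parse_with_restart(
    path: str,
    parse: Callable[[str], T],
    restart_server: Callable[[], None],
    max_attempts: int = 3,
    backoff_s: float = 1.0,
) -> T:
    """Call parse(path); on failure restart the server and retry.

    The last exception is re-raised once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return parse(path)
        except Exception:
            if attempt == max_attempts:
                raise
            # Assumed recovery step: relaunch the server, then wait a bit
            # longer after each consecutive failure (linear backoff).
            restart_server()
            time.sleep(backoff_s * attempt)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"parse": 0, "restart": 0}

    def flaky_parse(p):
        # Fails on the first call to simulate a crashed server.
        calls["parse"] += 1
        if calls["parse"] == 1:
            raise ConnectionError("server down")
        return {"metadata": {"resourceName": p}}

    def fake_restart():
        calls["restart"] += 1

    print(parse_with_restart("doc.pdf", flaky_parse, fake_restart, backoff_s=0.0))
    print(calls)
```

Catching bare `Exception` keeps the sketch short. In a real script, narrowing it to connection or HTTP errors would avoid restarting the server for files that simply fail to parse.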