dpv
[FIX]: Remove unused CSV files from vocab_csv/
In code/vocab_csv, there are a number of CSV files that are not mentioned in the RDF/HTML generation code:
1. Some of them are probably part of development in progress and will soon be included
2. Some of them are part of an outdated workflow or concepts that have moved to other CSV files
We would like to keep (1) and may want to remove (2) to tidy up the codebase and avoid possible confusion.
- If needed, files in (2) would still be accessible from the tagged releases (e.g. dpv-2.1)
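Retrieving a removed file from a tagged release only needs git's `<rev>:<path>` syntax. The sketch below wraps it in a small helper. The function name `show_file_at_tag` is hypothetical, the tag `dpv-2.1` is the example given above, and it assumes the file sat at the same path (`code/vocab_csv/...`) in that tag.

```bash
#!/bin/bash
# Print a file as it existed at a given git tag, without touching the working tree.
# Hypothetical helper. Usage: show_file_at_tag <tag> <path-relative-to-repo-root>
show_file_at_tag() {
    local tag="$1" path="$2"
    git show "${tag}:${path}"
}
# Example (inside the dpv checkout; assumes the path is unchanged in that tag):
#   show_file_at_tag dpv-2.1 code/vocab_csv/legal_Laws.csv > legal_Laws.csv
```

If the file should be put back into the working tree instead, `git restore --source=dpv-2.1 -- <path>` does that in a single step.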
Files to be reviewed
Listed below are CSV files that are not mentioned in the RDF/HTML generation code, with notes:
| Filename | Notes |
|---|---|
| Bias.csv | Last commit 6 months ago |
| DE_glossary.csv | Last commit last year; Has only a header; Maybe used for future translations |
| EntityControl.csv | Last commit 10 months ago |
| Mapping_ODRL.csv | Last commit last month |
| Requirement.csv | Last commit last year |
| RiskSource.csv | Last commit last month; Has comment "Proposed for v2.2" |
| Standards_ISO.csv | Last commit last year |
| UseCase.csv | Last commit 9 months ago |
| concepts.csv | Last commit 9 months ago |
| legal-memberships.csv | Empty; Replaced by location_memberships.csv? |
| legal-uk.csv | Replaced by legal-gb.csv? |
| legal_Authorities.csv | Replaced by legal-(countrycode).csv? |
| legal_EU_Adequacy.csv | Replaced by legal-eu.csv? |
| legal_EU_EEA.csv | Replaced by location_memberships.csv? |
| legal_Laws.csv | Replaced by legal-(countrycode).csv? |
| legal_Locations.csv | Replaced by location.csv? |
| legal_properties.csv | Replaced by location_properties? |
| tech-data.csv | Last commit last year |
| tech-ops.csv | Last commit last year |
| tech-provision-properties.csv | Last commit last year |
| tech-security.csv | Last commit 11 months ago |
| tech-surveillance.csv | Last commit last year |
- `legal*` files are likely to have been replaced by other files and could be removed
- The rest could still be in use
A script to find CSV files that are not mentioned in the codebase:
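The "Last commit" notes in the table look like they come from GitHub's file view. A rough local equivalent can be produced with `git log`. This is a sketch with a hypothetical function name. Note that git's relative dates read slightly differently from GitHub's (for example "1 year, 2 months ago" rather than "last year").

```bash
#!/bin/bash
# List each CSV file in a directory with the relative date of its last commit.
# Hypothetical helper. Usage: last_commit_dates [dir]   (default: vocab_csv)
last_commit_dates() {
    local dir="${1:-vocab_csv}" f
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue  # glob matched nothing and stayed literal
        # %cr = committer date, relative (e.g. "6 months ago")
        printf '%s | Last commit %s\n' "$(basename "$f")" "$(git log -1 --format=%cr -- "$f")"
    done
}
```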
```bash
#!/bin/bash
# Find CSV files in vocab_csv/ whose names are not mentioned in any .py or .sh file.
# Run this script from the "code" directory.
CSV_DIR="vocab_csv"
unused_files=()
# Read paths NUL-delimited so file names containing spaces are handled safely.
while IFS= read -r -d '' csv_file; do
    file_name=$(basename "$csv_file")
    # -F matches the name literally (so "." is not a regex wildcard); -q only sets the exit status.
    if ! grep -rqF --include="*.py" --include="*.sh" -- "$file_name" .; then
        unused_files+=("$file_name")
    fi
done < <(find "$CSV_DIR" -type f -name "*.csv" -print0)
if [ ${#unused_files[@]} -eq 0 ]; then
    echo "All .csv files in '$CSV_DIR' are mentioned in the codebase."
else
    # Sort the names without relying on unquoted word splitting.
    sorted_unused_files=()
    while IFS= read -r name; do
        sorted_unused_files+=("$name")
    done < <(printf "%s\n" "${unused_files[@]}" | sort)
    echo "The following .csv files in '$CSV_DIR' are not mentioned in the codebase:"
    printf "%s\n" "${sorted_unused_files[@]}"
    echo
    echo "From the list above, these files are empty or have only one line:"
    for file in "${sorted_unused_files[@]}"; do
        file_path="$CSV_DIR/$file"
        if [ ! -s "$file_path" ] || [ "$(wc -l < "$file_path")" -le 1 ]; then
            echo "$file"
        fi
    done
fi
```
Hi @bact, thanks. Some of these files are present because they are part of DPV 1.0 or 2.0, so we usually keep them around in case fixes are needed or we want to see the source of changed extensions. Others are proposed work items, so they are not included in the RDF/HTML generation scripts. Below I've noted how to resolve each file, but there is no issue with keeping them in the folder, as they are helpful for looking things up now and then. In the future, once the proposed items are resolved, deleting all files in vocab_csv and downloading and extracting all CSVs again should fix this.
- Bias.csv -- can be deleted
- DE_glossary.csv -- needed for multilingual translations
- EntityControl.csv -- delete
- Mapping_ODRL.csv -- needed for ODRL-DPV mappings
- Requirement.csv -- source for requirements
- RiskSource.csv -- from RISK extension
- Standards_ISO.csv -- proposed modelling of ISO standards
- UseCase.csv -- source for use-cases
- concepts.csv -- can be deleted
- legal-memberships.csv -- can be deleted
- legal-uk.csv -- can be deleted
- legal_Authorities.csv -- can be deleted
- legal_EU_Adequacy.csv -- can be deleted
- legal_EU_EEA.csv -- can be deleted
- legal_Laws.csv -- can be deleted
- legal_Locations.csv -- can be deleted
- legal_properties.csv -- can be deleted
- tech-data.csv -- can be deleted
- tech-ops.csv -- can be deleted
- tech-provision-properties.csv -- can be deleted
- tech-security.csv -- can be deleted
- tech-surveillance.csv -- can be deleted
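If the files marked "can be deleted" (or "delete") are removed, a minimal sketch could stage the deletions with `git rm`. The function name is hypothetical, and the list is copied from the notes above. Files that are missing are skipped with a message rather than aborting. A normal `git commit` is still needed afterwards.

```bash
#!/bin/bash
# Stage the deletion of the CSV files marked "can be deleted" (or "delete") in the notes.
# Hypothetical helper. Usage: remove_deletable_csvs [dir]   (run from "code"; default: vocab_csv)
remove_deletable_csvs() {
    local dir="${1:-vocab_csv}" f
    local files=(
        Bias.csv EntityControl.csv concepts.csv
        legal-memberships.csv legal-uk.csv legal_Authorities.csv
        legal_EU_Adequacy.csv legal_EU_EEA.csv legal_Laws.csv
        legal_Locations.csv legal_properties.csv
        tech-data.csv tech-ops.csv tech-provision-properties.csv
        tech-security.csv tech-surveillance.csv
    )
    for f in "${files[@]}"; do
        if [ -e "$dir/$f" ]; then
            git rm -q -- "$dir/$f"   # deletes the file and stages the removal
        else
            echo "skipping (not found): $dir/$f" >&2
        fi
    done
}
```

Files such as DE_glossary.csv, Mapping_ODRL.csv, Requirement.csv, RiskSource.csv, Standards_ISO.csv and UseCase.csv are intentionally left out, since the notes mark them as still needed.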