WireViz
[feature] Optionally avoid losing HTML tags in YAML input values when generating .bom.tsv file
WireViz allows HTML tags to be included in YAML data, which is very useful for specifying links to the manufacturer/supplier product pages like so:
manufacturer: '<a href="https://www.molex.com"> Molex </a>'
mpn: '<a href="https://www.molex.com/molex/products/part-detail/crimp_housings/0022013087"> 22-01-3087 </a>'
supplier: '<a href="https://www.digikey.com"> DigiKey.com </a>'
spn: '<a href="https://www.digikey.com/en/products/detail/molex/0022013087/26443"> WM2006-ND </a>'
This in turn gets included in the output HTML file, where the BOM table is rendered with the links, which is great.
However, when BOM is emitted into .bom.tsv file, all HTML tags are getting sanitized out, thus losing all link information. Sanitizing TSV output is presumptuous - I would rather output everything "as is", letting the end user decide what he/she wants to include in YAML file input. Maybe a reasonable compromise would be to add an option switch to turn this behavior on and off. Even better, an option field in YAML metadata section to control this behavior on the project-specific level, so that the output format does not depend on command line options.
@nyq wrote:
However, when BOM is emitted into .bom.tsv file, all HTML tags are getting sanitized out, thus losing all link information. Sanitizing TSV output is presumptuous - I would rather output everything "as is", letting the end user decide what he/she wants to include in YAML file input.
I understand your need and expectation, but I will be surprised if the majority of users needing TSV output share such an expectation as the default functionality. I don't expect most programs importing a bill of materials TSV file to support HTML tags inside the text entries. You might be able to find a few programs that actually are able to handle HTML tags properly, but to enhance the probability of success for our users in most cases, the safest thing is to clean out all HTML tags in the TSV output.
Maybe a reasonable compromise would be to add an option switch to turn this behavior on and off. Even better, an option field in YAML metadata section to control this behavior on the project-specific level, so that the output format does not depend on command line options.
Keeping the current functionality as the default and offering the behavior you expect as an optional feature should be possible. I argue for creating entries in the YAML options section instead of the metadata section for controlling optional features. However, there might be other, similar cases where optional filtering of text input for different outputs might make sense, and we should gather some different views about this before deciding how to specify such options.
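To make the proposal concrete, here is a minimal Python sketch of "strip by default, keep on request" for a single TSV cell. The function name `tsv_cell`, the `keep_html` parameter, and the simple tag pattern are assumptions for illustration only; this is not WireViz's actual code.

```python
import re

# Hypothetical pattern (an assumption, not WireViz's implementation):
# matches anything that looks like an HTML tag.
_TAG = re.compile(r"<[^>]*>")


def tsv_cell(value: str, keep_html: bool = False) -> str:
    """Return a BOM cell for TSV output.

    Tags are stripped by default; keep_html=True passes the value
    through unchanged, as requested in this issue.
    """
    if keep_html:
        return value
    return _TAG.sub("", value).strip()


manufacturer = '<a href="https://www.molex.com"> Molex </a>'
print(tsv_cell(manufacturer))                  # tags removed
print(tsv_cell(manufacturer, keep_html=True))  # value kept "as is"
```

With a default of `keep_html=False`, existing users would see no change, and only projects that opt in would get HTML in their TSV files.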
Examples of text input elements that might need optional filtering for different outputs:
- HTML link tags
- HTML tags in general
- Carriage return and/or Linefeed characters
- Leading/trailing space characters and maybe other white space characters
- The raised "2" for square, "3" for cube, etc.
- Other special characters or sequences
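One possible way to think about these categories is as a set of small, named filters that each output enables or disables independently. The sketch below is only an illustration: the filter names, the `FILTERS` table, and `apply_filters` are all hypothetical and do not exist in WireViz.

```python
import re

# Hypothetical named filters, one per category from the list above.
FILTERS = {
    # Remove only <a ...> and </a>, keeping the link text and other tags.
    "remove_links": lambda s: re.sub(r"</?a\b[^>]*>", "", s),
    # Remove anything that looks like an HTML tag.
    "remove_tags": lambda s: re.sub(r"<[^>]*>", "", s),
    # Remove carriage return and linefeed characters.
    "remove_linefeeds": lambda s: s.replace("\r", "").replace("\n", ""),
    # Remove leading/trailing white space.
    "strip_spaces": lambda s: s.strip(),
}


def apply_filters(text: str, enabled: list[str]) -> str:
    """Apply the enabled filters to text, in the order given."""
    for name in enabled:
        text = FILTERS[name](text)
    return text


raw = ' <a href="x">A</a>\n<b>B</b> '
print(repr(apply_filters(raw, ["remove_links"])))
print(repr(apply_filters(raw, list(FILTERS))))
```

Each output format could then carry its own default list of enabled filters, with user options adding or removing individual entries.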
Thank you for your response and your work on this project. I agree that the options section is more appropriate than metadata.
I'm just a user and contributor, like you. I try to argue for functionality that I think will benefit this project, but the project owner has the final word about accepting a suggestion or not. However, we can all suggest features, argue for or against, and even implement pull requests.
Optional features like your suggestion are important for driving this project forward. I wanted to put your suggested feature in a context with potentially similar features to help us find good ways to specify such optional features. My reason for mentioning the list of input element categories is that such entries are already filtered differently for the different output files, but currently the rules for such filtering are fixed. Maybe some of these rules could also benefit from optional alternatives?
E.g., linefeeds in the input are currently converted to <br>-tags for HTML outputs and cleared otherwise. That makes it currently impossible to insert intentional linefeeds between HTML tags to create readable HTML. An optional alternative filtering rule for this might help.
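The linefeed rule described above, plus an optional alternative, could look roughly like the following sketch. The function `filter_linefeeds` and its `keep_linefeeds` flag are hypothetical, and "cleared" is assumed here to mean that the linefeed is simply removed.

```python
def filter_linefeeds(text: str, output: str, keep_linefeeds: bool = False) -> str:
    """Apply the linefeed rule for one output format.

    Default rule as described in this thread: linefeeds become <br>
    for HTML output and are removed for other outputs (assumption:
    "cleared" means removed). keep_linefeeds=True is the hypothetical
    alternative rule that leaves the input untouched.
    """
    if keep_linefeeds:
        return text
    if output == "html":
        return text.replace("\n", "<br>")
    return text.replace("\n", "")


notes = "<ul>\n<li>red</li>\n</ul>"
print(filter_linefeeds(notes, "html"))                       # <br> inserted between tags
print(filter_linefeeds(notes, "html", keep_linefeeds=True))  # readable HTML preserved
print(filter_linefeeds(notes, "tsv"))                        # linefeeds removed
```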
A different issue is whether such optional alternative filtering rules must always be global options, or whether different parts of the diagram might sometimes benefit from different filtering rules.
Maybe e.g. options.output.tsv.remove_links = False could be used to specify the requested feature? This way, the options.output and options.output.tsv will only have one subentry each, but as I mentioned in an earlier message above, there might be other filtering features for the different outputs that also could benefit from optional features in the future, and this is a preparation to avoid re-designing the options namespace too often.
If such a feature might sometimes be useful for only a few connectors/cables, the same option structure could optionally also be allowed inside each connector/cable entry. What do you think? Other ideas are also welcome.
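As a rough sketch of how the suggested `output.tsv.remove_links` option could be resolved, with an optional override inside a single connector/cable entry, consider the Python below. The helpers `lookup` and `effective_option` are hypothetical, and the plain dictionaries stand in for the parsed content of a YAML `options` section; none of this is existing WireViz behavior.

```python
from collections.abc import Mapping
from typing import Any


def lookup(options: Mapping[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted path through nested mappings, or return default."""
    node: Any = options
    for key in path.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return default
        node = node[key]
    return node


def effective_option(global_opts: Mapping[str, Any],
                     component_opts: Mapping[str, Any],
                     path: str, default: Any) -> Any:
    """Prefer the component-level value, then the global one, then default."""
    missing = object()  # sentinel, so an explicit False is not mistaken for "unset"
    local = lookup(component_opts, path, missing)
    if local is not missing:
        return local
    return lookup(global_opts, path, default)


# Hypothetical parsed option sections.
global_opts = {"output": {"tsv": {"remove_links": True}}}
connector_opts = {"output": {"tsv": {"remove_links": False}}}

path = "output.tsv.remove_links"
print(effective_option(global_opts, connector_opts, path, True))  # connector override wins
print(effective_option(global_opts, {}, path, True))              # falls back to global
print(effective_option({}, {}, path, True))                       # falls back to default
```

Resolving the component entry first keeps the global options as a project-wide default while still allowing a few connectors or cables to differ.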
Good idea!
See also my comment https://github.com/formatc1702/WireViz/issues/230#issuecomment-1025160495, which I wrote more than a year ago, with an alternative suggestion for specifying alternative filtering rules. In that case there were only two alternative rules for each input, but in general there might be more than two.