sweet_xml icon indicating copy to clipboard operation
sweet_xml copied to clipboard

The optional modifier produces empty strings instead of nil

Open nathany-copia opened this issue 2 years ago • 6 comments

Strings don't behave as described by the README:

"If you use the optional modifier o together with a soft cast modifier (uppercase), then the value is set to nil"

~x"./NAME/text()"So produces an empty string instead of nil.

Someone attempted to fix this with #53, but it would've been better to open an issue to discuss possible solutions and backwards compatibility.

nathany-copia avatar Dec 13 '21 16:12 nathany-copia

Perhaps the most obvious (initial) fix would be to document the exception in the README (and possibly a workaround).

nathany-copia avatar Dec 13 '21 16:12 nathany-copia

The main problem (from memory), is that :xmerl_xpath.string/3 [1] either returns a list, or a scalar (without information on whether the xml path is unreachable) that is wrapped in a list by sweet_xml. The way it is handled afterward through Enum.join/1 is in my opinion problematic, but the damage is already done.

Overall, I think a formal description of how the modifiers interacts, and what they expect as input and should return as output is necessary before analyzing how to update the library.

I don't think I have enough time to do either in the coming weeks so any participation from anyone is welcomed.

Regarding the README update, I'm totally in favor of it.

[1] https://www.erlang.org/doc/man/xmerl_xpath.html#string-3

Shakadak avatar Dec 13 '21 16:12 Shakadak

Thanks for the explanation. So if xmerl doesn't provide information on whether the path is reachable for strings, then SweetXML cannot distinguish between <NAME></NAME> in the XML (as an empty string) or not <NAME> node at all (as nil), as might be done with the other types?

If that is the case, it sounds like So may not be a valid combination?

To be honest, I'm not sure how valuable that distinction of reachable or not is.

I think I'm more likely to want all empty strings to be converted to nil for my current situation, and others may prefer all strings (as is).

Is there anything less cumbersome than using transform_by on each-and-every field? e.g.

name: ~x"./NAME/text()"s |> transform_by(&Transform.to_string_or_nil/1),

nathany-copia avatar Dec 13 '21 16:12 nathany-copia

So if xmerl doesn't provide information on whether the path is reachable for strings

This part needs testing, as this is simply my inference from what the type of :xmerl_xpath.string/3 and the behavior of SweetXml around it implies, but this is not sufficient, and could be incorrect, maybe a node present with nothing inside would be represented as a list of one element, with an empty string. (the private function _value/1 [1] can imply that, but again, testing will clear things out)

If that is the case, it sounds like So may not be a valid combination? To be honest, I'm not sure how valuable that distinction of reachable or not is.

It's not that it is invalid, it is that the behavior is incompletely defined, if you take a look at the to_cast/3 implementation for soft_string [2], anything that can be converted using Kernel.to_string/1 will never be converted to nil if the optional modifier is also present. To me converting from charlist to binary is a form of casting, it stays semantically the same, the general usage is the same, but converting from an number to a binary is completely different, the operations allowed on them differs significantly.

So then what does the modifiers string and soft_string mean ? is it simply use to_string/1 and be done with it ? then nil will pretty much never happen. Is it convert from charlist to binary, and vice versa for the list modifier (sl what is even a stringlist ? a char list, a list of string ?) ? then the implementation is too permissive, badly documented, and incorrect.

That's why I wish to formalize the behavior of the modifiers.

Is there anything less cumbersome than using transform_by on each-and-every field?

A bit, you can define a function that navigate the supbspec given to xpath/3, for example:

  def inspect_spec(xs) do
    Enum.map(xs, fn
      {k, [x | xs]} -> {k, [x | Enum.map(xs, fn {k2, x} -> {k2, x |> SweetXml.transform_by(&IO.inspect(&1, label: "#{k}.#{k2}"))} end)]}
      {k, x} -> {k, x |> SweetXml.transform_by(&IO.inspect(&1, label: k))}
    end)
  end

You could imagine a function subspec_map/2 that takes a subspec like inspect_spec/1 above, a function that takes a key and a xpath struct as argument and returns a possibly new xpath struct, traversing the subspec recursively (unlike inspect_spec/1 that I used for quickly debugging something with only one level of nesting).

You would then only need to apply it like so:

subspec_map(subspec, fn _, x ->
  if x.cast_to == :soft_string and x.is_optional do
    SweetXml.transform_by(x, fn "" -> nil ; y -> y end)
  else
    x
  end
end)

Or other stuff of the sort.

[1] https://github.com/kbrw/sweet_xml/blob/master/lib/sweet_xml.ex#L763-L778 [2] https://github.com/kbrw/sweet_xml/blob/master/lib/sweet_xml.ex#L876-L882

Shakadak avatar Dec 14 '21 11:12 Shakadak

  @spec subspec_map(subspec, (%SweetXpath{} -> %SweetXpath{})) :: subspec
  when subspec: (%SweetXpath{}| keyword(%SweetXpath{} | subspec))
  def subspec_map(%SweetXpath{} = x, transform) do
    transform.(x)
  end
  def subspec_map({k, x}, transform) do
    {k, subspec_map(x, transform)}
  end
  def subspec_map(subspec, transform) when is_list(subspec) do
    Enum.map(subspec, fn x -> subspec_map(x, transform) end)
  end

Shakadak avatar Dec 31 '21 14:12 Shakadak

Hi, so it's confirmed that we won't do breaking changes, and we will go the path of documenting thoroughly the modifiers in order to leave no surprises to the users

Shakadak avatar Jan 21 '22 10:01 Shakadak