lib-cl-sii-python

data.ref: versioning XML schemas of "factura electrónica"

Open yaselc opened this issue 3 years ago • 9 comments

  • The contents of the schemas-xml directory are split into subdirectories according to the sources used to build each set. Each subdirectory is named after its source and the most recent modification timestamp among the files in the set.
  • New enum cl_sii.base.constants.XmlSchemasVersionEnum to define the available XML schema versions.
  • Create a new version of the XML schemas for "factura electrónica" from the official XML schemas of AEC (Archivo Electrónico de Cesión). Last update timestamp is 2019-12-12.
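
A minimal sketch of how an enum of schema versions could map to the subdirectories described above. Only the enum name XmlSchemasVersionEnum and the schemas-xml directory come from this description; the member names, their values, and the SCHEMAS_ROOT constant are hypothetical placeholders, not the library's actual definitions.

```python
import enum
from pathlib import Path

# Hypothetical root directory; the real location inside the package may differ.
SCHEMAS_ROOT = Path("schemas-xml")


class XmlSchemasVersionEnum(enum.Enum):
    """Available XML schema versions (member names are illustrative only)."""

    # Each value is the name of the subdirectory that holds that schema set.
    EXAMPLE_OLDER_SET = "example-older-set"
    EXAMPLE_2019_12_12 = "example-2019-12-12"

    @property
    def directory(self) -> Path:
        """Return the subdirectory that holds this version's schema files."""
        return SCHEMAS_ROOT / self.value
```

With a layout like this, callers would pick a version explicitly (for example XmlSchemasVersionEnum.EXAMPLE_2019_12_12.directory) instead of relying on a single implicit schema set.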

Source: cl-sii-extraoficial/archivos-oficiales@c89dec5

Changelog:

  • SiiTypes_v10.xsd:
    • Replaces CRLF line endings with LF.
    • root: A new simple type Dec14_4-0Type is added for non-negative decimals (admits 0)
    • TipoTransCOMPRA: The base type is changed and adds a restriction for the minimum and the maximum value (1 - 7)
    • TipoTransVENTA: Adds restriction for the minimum and maximum value (1 - 4)
  • DTE_v10.xsd:
    • Replaces CRLF line endings with LF.
    • IdDoc: Adds the element TipoFactEsp
    • Receptor.Extranjero: Adds the element TipoDocID
    • IndServicio: Adds a new item to the enumeration
    • MntExeOtrMnda: Type changed to Dec14_4-0Type
    • MntTotOtrMnda: Type changed to Dec14_4-0Type
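
To make the new restrictions concrete, here is a rough Python sketch of the value constraints listed in the changelog. It is not a substitute for XSD validation: the helper names are hypothetical, and for Dec14_4-0Type it only models the "non-negative, admits 0" property stated above, not any digit limits the schema may define.

```python
from decimal import Decimal, InvalidOperation
from typing import Tuple

# Inclusive ranges taken from the changelog above.
TIPO_TRANS_COMPRA_RANGE = (1, 7)
TIPO_TRANS_VENTA_RANGE = (1, 4)


def in_range(value: int, bounds: Tuple[int, int]) -> bool:
    """Return True if value lies within the inclusive bounds."""
    low, high = bounds
    return low <= value <= high


def is_non_negative_decimal(text: str) -> bool:
    """Check only the 'non-negative, admits 0' property of Dec14_4-0Type."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        # Not parseable as a decimal number at all.
        return False
    # Reject NaN and infinities, which Decimal accepts but a schema decimal would not.
    return value.is_finite() and value >= 0
```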

yaselc, Apr 12 '21 23:04

@glarrain @jtrh I ran the schema validation on a sample of 73,839 DTEs. Validation failed for 5,039 of them, and that number is the same regardless of which version of the XML schemas is used, which suggests that the new version doesn't introduce new errors. The DTEs in the sample were selected because each one had been used at some point in a "cesión" made through the FP platform and that "cesión" had been approved by the SII.
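
The comparison above can be sketched as follows. This is an illustration, not the script actually used: the function names are hypothetical, and the validators are passed in as plain callables rather than tied to any particular XML library. Comparing the sets of failing documents, and not only the totals, would also rule out the case where a different group of DTEs fails under each version.

```python
from typing import Callable, Dict, Set

Validator = Callable[[bytes], bool]


def failing_ids(documents: Dict[str, bytes], validate: Validator) -> Set[str]:
    """Return the IDs of the documents that the validator rejects."""
    return {doc_id for doc_id, xml in documents.items() if not validate(xml)}


def newly_failing(
    documents: Dict[str, bytes],
    validate_old: Validator,
    validate_new: Validator,
) -> Set[str]:
    """Return IDs that fail under the new schemas but passed under the old ones."""
    return failing_ids(documents, validate_new) - failing_ids(documents, validate_old)
```

An empty result from newly_failing would support the conclusion that the new schema version introduces no new errors for the sample.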

yaselc, Apr 12 '21 23:04

@glarrain @jtrh, please suggest what you think would be the most suitable name for the XML schema versions. To be authentic, I think we should remove the XML schemas taken from unofficial sources.

yaselc, Apr 12 '21 23:04

@glarrain @jtrh, please suggest what you think would be the most suitable name for the XML schema versions.

  • Do all the files of a year-month-day (YMD) version (e.g. V2019_12_12) originate from the same source package (e.g. the same SII ZIP file)?
  • Could a version contain files with different timestamps?
  • Does it make sense to associate all the files of a specific version with a single date?

To be authentic, I think we should remove the XML schemas taken from unofficial sources.

For better or worse, we have already used those unofficial schemas, so it may be a good idea to keep them. Maybe we could add a suffix to unofficial versions (e.g. V2019_12_31_LibreDTE)?

jtrh, Apr 15 '21 05:04

@glarrain @jtrh, please suggest what you think would be the most suitable name for the XML schema versions.

Another idea: Instead of version 2019_12_12, use something like sii_rtc_2019_12_12_schema_cesion or sii_rtc_2019_12_12 to make it easier to associate the version with its source in src/code/rtc/2019-12-12-schema_cesion.

jtrh, Apr 15 '21 05:04

Codecov Report

Merging #211 (60ee41e) into develop (dcec499) will decrease coverage by 0.04%. The diff coverage is 81.08%.

@@             Coverage Diff             @@
##           develop     #211      +/-   ##
===========================================
- Coverage    81.02%   80.98%   -0.05%     
===========================================
  Files           32       32              
  Lines         2525     2556      +31     
  Branches       375      378       +3     
===========================================
+ Hits          2046     2070      +24     
- Misses         306      310       +4     
- Partials       173      176       +3     

Impacted Files              Coverage Δ
cl_sii/libs/xml_utils.py    78.35% <75.00%> (-0.60%)
cl_sii/dte/parse.py         81.75% <76.92%> (-0.79%)
cl_sii/rtc/parse_aec.py     89.08% <76.92%> (-0.66%)
cl_sii/base/constants.py    100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.

codecov-io, Apr 15 '21 05:04

Codecov Report

Merging #211 (0464024) into develop (dcec499) will decrease coverage by 0.02%. The diff coverage is 82.05%.

@@             Coverage Diff             @@
##           develop     #211      +/-   ##
===========================================
- Coverage    81.02%   81.00%   -0.03%     
===========================================
  Files           32       32              
  Lines         2525     2558      +33     
  Branches       375      378       +3     
===========================================
+ Hits          2046     2072      +26     
- Misses         306      310       +4     
- Partials       173      176       +3     

Impacted Files              Coverage Δ
cl_sii/libs/xml_utils.py    78.35% <75.00%> (-0.60%)
cl_sii/dte/parse.py         81.75% <76.92%> (-0.79%)
cl_sii/rtc/parse_aec.py     89.08% <76.92%> (-0.66%)
cl_sii/base/constants.py    100.00% <100.00%> (ø)

codecov-commenter, Apr 20 '21 02:04

  • Do all the files of a year-month-day (YMD) version (e.g. V2019_12_12) originate from the same source package (e.g. the same SII ZIP file)?

Not necessarily. For example, the first official version was built from several sets (ZIP files) available on the official site:

  • http://www.sii.cl/factura_electronica/schema_dte.zip
  • http://www.sii.cl/factura_electronica/schema_iecv.zip
  • http://www.sii.cl/factura_electronica/schema_cesion.zip

  • Could a version contain files with different timestamps?

Yes, this is always the case, but the latest modification timestamp indicates when the whole set was last updated.

  • Does it make sense to associate all the files of a specific version with a single date?

If we agree that the latest modification timestamp is an indicator of the freshness of the whole set, then it would make sense.
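
As an illustration of that rule, the sketch below derives a single date for a set built from several ZIP files by taking the most recent member timestamp. It uses only the standard library; the function names are hypothetical, and the label format simply mirrors the V2019_12_12 example mentioned earlier in this thread.

```python
import zipfile
from datetime import datetime
from typing import Iterable


def latest_modification(archives: Iterable[zipfile.ZipFile]) -> datetime:
    """Return the most recent member timestamp across all given archives.

    Raises ValueError if the archives contain no files at all.
    """
    return max(
        datetime(*info.date_time)  # date_time is (year, month, day, hour, minute, second)
        for archive in archives
        for info in archive.infolist()
        if not info.is_dir()
    )


def version_label(timestamp: datetime) -> str:
    """Format a timestamp as a year-month-day label such as 'V2019_12_12'."""
    return timestamp.strftime("V%Y_%m_%d")
```

Note that ZIP member timestamps carry no time zone, so the derived date is only as precise as the archive metadata.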

For better or worse, we have already used those unofficial schemas, so it may be a good idea to keep them. Maybe we could add a suffix to unofficial versions (e.g. V2019_12_31_LibreDTE)?

Excellent, I think it's a very good idea.

Another idea: Instead of version 2019_12_12, use something like sii_rtc_2019_12_12_schema_cesion or sii_rtc_2019_12_12 to make it easier to associate the version with its source in src/code/rtc/2019-12-12-schema_cesion.

I think using the date as a prefix might be a good way to keep the sets sorted. I applied the suffix idea where possible; for the first version it is not possible because of the variety of sources. Finally, I added a longer description to each element of the enum to make it more self-contained.
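
A small sketch of why a date prefix helps with sorting: with a zero-padded year_month_day prefix, plain lexicographic order matches chronological order. The version names below are hypothetical examples, not the names used in the enum.

```python
from typing import List

# Hypothetical version names that follow the date-as-prefix idea.
EXAMPLE_NAMES = [
    "2019_12_12_sii_rtc",
    "2013_02_07_example_mixed_sources",
    "2019_12_31_example_unofficial",
]


def sort_versions(names: List[str]) -> List[str]:
    """Sort version names; zero-padded date prefixes make this chronological."""
    return sorted(names)
```

With a source-first name such as sii_rtc_2019_12_12, the same sort would group versions by source before ordering them by date.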

yaselc, Apr 20 '21 03:04

@glarrain @jtrh

yaselc, Apr 26 '21 14:04

CC: @jtrobles-cdd

jtrh, May 05 '21 23:05