potools
Line number problems in C `# notranslate` exclusions
The following message seems to be missing from data.table.pot:
https://github.com/Rdatatable/data.table/blob/b7f2106efe038d93577f427f34c06d9c00b4c486/src/fread.c#L2775
potools seems to treat this message as subject to a `# notranslate` exclusion, so it gets dropped here:
https://github.com/MichaelChirico/potools/blob/0dc529285c4f54a86d0755317d9304d735c3858f/R/get_src_messages.R#L255
debug: src_messages = drop_excluded(src_messages, exclusions[is_outside_char_array(exclusion_pos,
arrays)])
Browse[1]> src_messages[grepl('sep=', msgid)] # <-- row 3 here
msgid msgid_plural fname
<char> <list> <char>
1: sep='\\\\n' passed in meaning read lines as single character column\\n [NULL] DTPRINT
2: sep=',' so dec set to '.'\\n [NULL] DTPRINT
3: %8.3fs (%3.0f%%) sep= [NULL] DTPRINT
call array_start is_marked_for_translation line_number
<char> <int> <lgcl> <int>
1: DTPRINT(_(" sep='\\\\n' passed in meaning read lines as single character column\\n")) 71163 TRUE 1674
2: DTPRINT(_(" sep=',' so dec set to '.'\\n")) 83411 TRUE 1892
3: DTPRINT(_("%8.3fs (%3.0f%%) sep="), tLayout-tMap, 100.0*(tLayout-tMap)/tTot) 129888 TRUE 2775
Browse[1]> n
<...>
Browse[1]> src_messages[grepl('sep=', msgid)] # <-- one row less now!
msgid msgid_plural fname
<char> <list> <char>
1: sep='\\\\n' passed in meaning read lines as single character column\\n [NULL] DTPRINT
2: sep=',' so dec set to '.'\\n [NULL] DTPRINT
call array_start is_marked_for_translation line_number
<char> <int> <lgcl> <int>
1: DTPRINT(_(" sep='\\\\n' passed in meaning read lines as single character column\\n")) 71163 TRUE 1674
2: DTPRINT(_(" sep=',' so dec set to '.'\\n")) 83411 TRUE 1892
Browse[1]> exclusions[is_outside_char_array(exclusion_pos, arrays)]
file line1 capture_lengths
<char> <int> <int>
1: src/fread.c 438 0
2: src/fread.c 1366 0
3: src/fread.c 1733 0
4: src/fread.c 1783 0
5: src/fread.c 2111 0
6: src/fread.c 2119 0
7: src/fread.c 2305 0
8: src/fread.c 2775 0 # <-- why is line 2775 excluded?
9: src/fread.c 2794 0
Browse[1]> readChar(file, file.size(file)) |> substr(exclusion_pos[8]-32, exclusion_pos[8]+16)
[1] "\n DTPRINT(\" =====\\n\"); // # notranslate\n " # <-- exclusion no.8 corresponds to a different line!
Since the exclusions are matched against the original, non-preprocessed file contents:
https://github.com/MichaelChirico/potools/blob/0dc529285c4f54a86d0755317d9304d735c3858f/R/get_src_messages.R#L75
...and the newlines are matched in the preprocessed file contents, where they have different offsets due to the comments being removed:
https://github.com/MichaelChirico/potools/blob/0dc529285c4f54a86d0755317d9304d735c3858f/R/get_src_messages.R#L77-L82
...the line numbers produced from exclusion_pos and newlines_loc end up being incorrect:
https://github.com/MichaelChirico/potools/blob/0dc529285c4f54a86d0755317d9304d735c3858f/R/get_src_messages.R#L250-L254
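The effect can be reproduced on a toy input. This is only a sketch, not the potools code: the four-line `original` string, the simplified comment-stripping regex, and the `newline_offsets()` helper are all made up for illustration. It shows that an offset found in the original text lands on a later line when it is looked up against newline offsets taken from the comment-stripped text.

```r
# Toy "C file": line 2 carries the exclusion, line 4 is a translated message.
original <- paste(
  '/* a long header comment that preprocessing strips */',
  'DTPRINT(" =====\\n"); // # notranslate',
  'int x = 1;',
  'DTPRINT(_("sep="));',
  sep = "\n"
)

# Simplified stand-in for preprocessing: drop block and line comments.
stripped <- gsub("/\\*.*?\\*/|//[^\n]*", "", original, perl = TRUE)

# Hypothetical helper: offset 0 plus the offset of every newline.
newline_offsets <- function(txt) {
  c(0L, as.integer(gregexpr("\n", txt, fixed = TRUE)[[1L]]))
}

# The exclusion is located in the ORIGINAL text...
exclusion_pos <- as.integer(regexpr("# notranslate", original, fixed = TRUE))

# ...so only newline offsets from the original text give the right line.
right_line <- findInterval(exclusion_pos, newline_offsets(original))
wrong_line <- findInterval(exclusion_pos, newline_offsets(stripped))

print(c(right = right_line, wrong = wrong_line))
```

Here the lookup against the original newlines gives line 2, while the lookup against the stripped text gives line 4, which is the line holding the translated message. That is the same kind of shift as 2113 turning into 2775 above.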
Computing the newline offsets from the original file as well, so that both sides of the lookup refer to the same text, would have given the correct line number:
Browse[1]> newlines_loc2 = c(0L, as.integer(gregexpr("\n", readChar(file, file.size(file)), fixed = TRUE)[[1L]]))
Browse[1]> data.table(
    file = file,
    line1 = findInterval(as.integer(exclusion_pos), newlines_loc2),
    capture_lengths = attr(exclusion_pos, "capture.length")[ , 1L]
  )[8]
file line1 capture_lengths
<char> <int> <int>
1: src/fread.c 2113 0
Browse[1]> readLines(file)[2113]
[1] " DTPRINT(\" =====\\n\"); // # notranslate"
Browse[1]>
...but there must be a better solution, one that is compatible with preprocessing.
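One possible direction, offered only as a sketch: make the preprocessing offset-preserving, so that offsets found in the original text remain valid in the preprocessed text. The example below assumes the relevant preprocessing is comment removal. `blank_comments()` and `newline_offsets()` are hypothetical helpers, not potools functions, and the regex is deliberately naive (it would misfire on `//` or `/*` inside a string literal).

```r
# Toy "C file" with a two-line block comment and a trailing exclusion comment.
original <- paste(
  '/* a block comment',
  '   spanning two lines */',
  'DTPRINT(" =====\\n"); // # notranslate',
  'DTPRINT(_("sep="));',
  sep = "\n"
)

# Hypothetical helper: overwrite comments with spaces instead of deleting
# them, keeping any newlines inside block comments in place.
blank_comments <- function(txt) {
  m <- gregexpr("/\\*(?s:.*?)\\*/|//[^\n]*", txt, perl = TRUE)
  regmatches(txt, m) <- lapply(
    regmatches(txt, m),
    function(x) gsub("[^\n]", " ", x)
  )
  txt
}

# Hypothetical helper: offset 0 plus the offset of every newline.
newline_offsets <- function(txt) {
  c(0L, as.integer(gregexpr("\n", txt, fixed = TRUE)[[1L]]))
}

blanked <- blank_comments(original)

# The exclusion is still located in the original text, but the newline
# offsets of the blanked text now agree with those of the original.
exclusion_pos <- as.integer(regexpr("# notranslate", original, fixed = TRUE))
line <- findInterval(exclusion_pos, newline_offsets(blanked))
print(line)
```

With this approach the text length and every newline offset are unchanged, so the exclusion resolves to line 3 of the toy input whichever copy supplies the newlines. Whether this fits the rest of the preprocessing in `get_src_messages.R` is something I have not checked.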