python-docx-template
Render silently fails when encountering "<NA>" from string[pyarrow] dtype
Describe the bug
I was getting truncated output when rendering a large table, and found out it was caused by missing values in a column with dtype string[pyarrow].
The problem doesn't happen with other string representations.
As a workaround, I'm detecting missing values with pd.isna and replacing them before rendering:
if pd.isna(record['key']):
    record['key'] = 'placeholder'
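A minimal sketch of that workaround applied to a whole list of records, assuming every value is a scalar (pd.isna returns an array for list-like values). The helper name fill_missing and the hand-built records list are illustrative only; the list stands in for what df.to_dict(orient='records') would return, with pd.NA written directly so the sketch does not need pyarrow.

```python
import pandas as pd


def fill_missing(records, placeholder="placeholder"):
    """Return new records with every missing scalar replaced by `placeholder`.

    Hypothetical helper: assumes each value is a scalar, so pd.isna()
    returns a plain bool.
    """
    cleaned = []
    for record in records:
        cleaned.append({
            key: placeholder if pd.isna(value) else value
            for key, value in record.items()
        })
    return cleaned


# Stand-in for df.to_dict(orient="records") on a string[pyarrow] column.
records = [{"missing": "1"}, {"missing": pd.NA}, {"missing": "3"}]
print(fill_missing(records))
# [{'missing': '1'}, {'missing': 'placeholder'}, {'missing': '3'}]
```

The cleaned list can then be passed as context['contents'] in place of the raw records.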
To Reproduce
from docxtpl import DocxTemplate
import numpy as np
import pandas as pd
df = pd.DataFrame({
    "missing": pd.Series(["1", np.nan, "3"], dtype="string[pyarrow]")
})
# Content of experimental_template.docx:
#
# {%p for item in contents %}
# {{ item['missing'] }}
# {%p endfor %}
template_path = "experimental_template.docx"
doc = DocxTemplate(template_path)
context = { 'contents': df.to_dict(orient='records')}
doc.render(context)
doc.save("experimental_result.docx")
Expected behavior
1
nan
3
Actual output
1
I think this is more of a Jinja2 problem; you should open a bug with them.
More precisely, Jinja2 will render your string[pyarrow] np.nan not as 'nan' but probably as something that is not an XML-compatible string (i.e. <object NaN> or something like that).
So the workaround would be to convert your string[pyarrow] values into plain strings.
Exactly. It outputs <NA> as the string representation of nan.
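To illustrate why a literal <NA> can break the document, here is a small standard-library sketch. It does not use docxtpl or Jinja2: fake_render is a hypothetical stand-in for unescaped template substitution, and the <p><t> markup is a simplified placeholder for the real Word XML.

```python
import xml.etree.ElementTree as ET


def fake_render(value):
    # Hypothetical stand-in for substituting a value into XML without escaping.
    return "<p><t>{}</t></p>".format(value)


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(fake_render("nan")))   # True
print(is_well_formed(fake_render("<NA>")))  # False
```

The unescaped <NA> is read as an opening tag that is never closed, so the surrounding XML is no longer well-formed.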
Did you try rendering with autoescape=True?
On Sat, 23 Apr 2022 at 10:00, arkanoid87 wrote:
exactly. It outputs <NA> as string representation of nan
I ran into the same problem in my project, and the author's suggestion solves it: render with autoescape=True. Thanks a lot.
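For reference, a rough standard-library sketch of what escaping does to the <NA> value. render_cell and its autoescape flag are hypothetical and only mimic the idea; with docxtpl itself, the fix suggested in this thread is to call doc.render(context, autoescape=True).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def render_cell(value, autoescape=False):
    # Hypothetical stand-in for template substitution into simplified XML.
    text = str(value)
    if autoescape:
        # Replace &, < and > with XML entities so the value stays plain text.
        text = escape(text)
    return "<p><t>{}</t></p>".format(text)


escaped = render_cell("<NA>", autoescape=True)
print(escaped)                                # <p><t>&lt;NA&gt;</t></p>
print(ET.fromstring(escaped).find("t").text)  # <NA>
```

With escaping, the markup stays well-formed and the reader of the document still sees the literal text <NA>.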