Google doc dates returned as unicode (e.g., \ue907)
Example code:
import os
from google.oauth2 import service_account
from googleapiclient.discovery import build
from google.auth import default

def get_document_dates(doc_id, creds_file=None):
    scopes = ['https://www.googleapis.com/auth/documents.readonly']
    if creds_file and os.path.exists(creds_file):
        # from_service_account_file is defined on google.oauth2.service_account.Credentials,
        # not on google.oauth2.credentials.Credentials (that class is for user OAuth tokens)
        creds = service_account.Credentials.from_service_account_file(creds_file, scopes=scopes)
    else:
        # Fall back to Application Default Credentials
        creds, project = default(scopes=scopes)
    # Build the Docs API service
    service = build('docs', 'v1', credentials=creds)
    # Get the document, requesting only the body
    document = service.documents().get(
        documentId=doc_id,
        fields='body'
    ).execute()
    # Access the document's content
    content = document.get('body').get('content')
    # Print every paragraph element so its raw structure can be inspected
    for element in content:
        if 'paragraph' in element:
            paragraph = element.get('paragraph')
            elements = paragraph.get('elements', [])
            for elem in elements:
                print(elem)
The first section of the doc begins with a date element showing Jan 13, 2025; that is the date I want to parse via the Python API.
The first few elements printed:
{'startIndex': 1, 'endIndex': 5, 'textRun': {'content': '\ue907 | ', 'textStyle': {}}}
{'startIndex': 5, 'endIndex': 6, 'richLink': {'richLinkId': 'kix.p3Xj3hkh7bXl', 'textStyle': {}, 'richLinkProperties': {'title': 'Asana Board New NGS Submissions', 'uri': 'https://www.google.com/calendar/event?eid=XXX'}}}
{'startIndex': 6, 'endIndex': 7, 'textRun': {'content': '\n', 'textStyle': {}}}
{'startIndex': 7, 'endIndex': 18, 'textRun': {'content': 'Attendees: ', 'textStyle': {}}}
The date is returned in the first element as \ue907. How can that be converted to a date?
Note: there is a richLinkId in the second element, but that is for a separate calendar element, and not the Jan 13, 2025 date element.
More generally, why are date elements returned as an opaque Unicode character instead of something easier to work with?
I believe (though I cannot find it documented anywhere) that Docs uses Private Use Area Unicode characters to represent special elements such as smart chips and code blocks.
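If that is the case, the placeholder carries no date information by itself, but you can at least locate where such placeholders occur. The sketch below is a hypothetical helper (`find_private_use_chars` is not part of any Google library) that scans the `body.content` list returned by `documents().get` and reports every character whose Unicode general category is `Co` ("Other, private use"). It assumes, without official documentation, that chips surface as such characters inside `textRun.content`. Docs API indices count UTF-16 code units, so the offset is advanced by 2 for characters outside the Basic Multilingual Plane.

```python
import unicodedata

def find_private_use_chars(content):
    """Return (docs_index, 'U+XXXX') for each private-use character found in
    the textRun content of paragraph elements in a Docs API body.content list."""
    hits = []
    for element in content:
        paragraph = element.get('paragraph')
        if not paragraph:
            continue  # skip sectionBreak, table, etc.
        for elem in paragraph.get('elements', []):
            text = elem.get('textRun', {}).get('content', '')
            index = elem.get('startIndex', 0)
            for ch in text:
                # General category 'Co' covers U+E000-U+F8FF and the supplementary PUA planes
                if unicodedata.category(ch) == 'Co':
                    hits.append((index, f'U+{ord(ch):04X}'))
                # Docs indices are UTF-16 code units: astral characters take two
                index += 2 if ord(ch) > 0xFFFF else 1
    return hits
```

Fed the first paragraph printed above, this would report a single placeholder, `(1, 'U+E907')`, at the position where the Jan 13, 2025 date is displayed.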
While this issue is about Docs, it looks like, as of this writing, reading smart chip details through the API is not available in Sheets either: https://stackoverflow.com/questions/79331123/how-to-extract-both-name-and-link-from-google-sheets-smart-chip-place-using-ap
Thanks for reporting this issue! This sounds very much like an API endpoint issue rather than a client library issue; you would probably get the same response if you issued a curl command directly from the terminal.
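One way to confirm that without the client library is to call the `documents.get` REST endpoint directly. The sketch below uses only the standard library; the function names are hypothetical, and it assumes you already have an OAuth access token with the `documents.readonly` scope (for example, `creds.token` after calling `creds.refresh(google.auth.transport.requests.Request())`). If the raw JSON still contains `\ue907`, the placeholder comes from the service, not from `google-api-python-client`.

```python
import json
import urllib.parse
import urllib.request

DOCS_ENDPOINT = 'https://docs.googleapis.com/v1/documents/'

def build_docs_url(doc_id, fields='body'):
    """Build a documents.get URL that asks only for the given fields."""
    query = urllib.parse.urlencode({'fields': fields})
    return f"{DOCS_ENDPOINT}{urllib.parse.quote(doc_id, safe='')}?{query}"

def get_document_raw(doc_id, access_token, fields='body'):
    """Fetch the document over plain HTTPS, bypassing the Python client library."""
    request = urllib.request.Request(
        build_docs_url(doc_id, fields),
        headers={'Authorization': f'Bearer {access_token}'},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

This is the same request a `curl` call with an `Authorization: Bearer` header would make, so it isolates the service's behavior from the library's.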
I suggest checking the Docs support page to see whether this issue has been reported before, and filing an issue with the service team if needed.
Thanks!