Timetable export with Weasyprint
Context
Currently we use a sub-optimal library for exporting PDFs for the timetable. @tomasr8 has a PoC of the transition to a weasyprint-based approach.
Task
- [ ] Check integrity of all features (Everything still there)
- [ ] Re-evaluate which features are necessary (Ask for usage stats)
- [ ] Test refactored features
- [ ] Extend if necessary
Branches
Original branch: https://github.com/tomasr8/indico/tree/pdf-timetable
Original issue here: https://github.com/indico/indico/issues/6135
And here are some other issues I found that are related to the timetable (not saying we should implement them, but since we're changing the library anyway it might be worth having a look):
- https://github.com/indico/indico/issues/4083
- https://github.com/indico/indico/issues/1285 (This one's quite old, not sure if it's still relevant)
Checklist of features and whether they work properly.
ADVANCED
- [x] Include cover page
- [x] Include table of contents
- [x] Show list of sessions in the table of contents
- [x] :warning: Print the ID of each contribution
- [ ] Print abstract content of all contributions
- [x] Include length of the contributions
- [x] Print each session on a separate page
- [x] :warning: Use session color codes (Seems to be default..?)
- [x] Include session description
- [x] :warning: Print the start date close to session title
- [x] Include top-level contributions
- [x] Include top-level breaks
- [x] :warning: Show speaker title
- [x] Show speaker affiliation
Printing the contribution does not work as intended (No ID, dummy data for contributor, etc)
Stats for features. I agree with @tomasr8 that we should maybe get rid of the 'startpage' feature.
Update 04/10/2024
The design is fully updated for the cover page, ToC and session blocks. We also removed the following features due to their low popularity:
- :x: Showing contribution ID (The ID adds no context)
- :x: Changing the start page number (almost always 1)
- :x: Session color codes (The usage of these colors has changed and we always want the session colors)
The following still needs to be done:
- [ ] :tada: Poster features
- [ ] :tada: Poster abstract features
- [ ] :wrench: Fixing the page count in the ToC
- [ ] :wrench: Getting the top-level contribution block colors (Using dummy color now)
- [ ] :test_tube: Test print to see if colors are problematic
- [ ] :soap: General code clean-up (including todo comments etc. and making some things more DRY)
Additional notes
- Discussed with @tomasr8 . For the poster sessions it will likely simply be a case of removing the time span for each nested contribution (as they're parallel) and keeping the blocks underneath each other. Likely removing the line below each nested contrib as that might be confusing. For Mondays' Ajobs info; long story short, a poster session is when you change the type of a session to poster and it indicates that all nested contributions run in parallel.
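The poster rule described above could look roughly like the sketch below. It is only an illustration: `Contribution` and `contribution_heading` are hypothetical names, not Indico's actual models or template helpers, and the real change will likely live in the PDF template rather than in Python.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Contribution:
    # Hypothetical stand-in for a nested contribution, not Indico's model
    title: str
    start: datetime
    duration: timedelta


def contribution_heading(contrib: Contribution, *, is_poster: bool) -> str:
    """Return the heading text for a nested contribution block."""
    if is_poster:
        # Poster contributions run in parallel, so the time span is dropped
        return contrib.title
    end = contrib.start + contrib.duration
    return f'{contrib.start:%H:%M} - {end:%H:%M}  {contrib.title}'
```

In a regular session the heading keeps its time span; in a poster session only the title remains and the blocks are simply stacked.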
:warning: As discussed with @ThiefMaster , I am quite stuck for multiple days on a timeline CSS issue. :warning:
The timeline on the left of the session blocks displays unexpected behavior with long session blocks. Essentially, even though the session block breaks over multiple PDF pages (the full div), the timeline itself gets cut off on its page and does not continue onto the next page.
Even stranger is that on the next page, the div behaves as if the timeline is not there at all (takes full width of page). Will provide screenshot.
Status
| Date | State |
|---|---|
| 11/10/2024 | Lots of trial and error again, many lines written and deleted. Currently making a minimized mock-up to reproduce the problem and share it with the WeasyPrint and styling communities, probably on Stack Overflow. Meanwhile looking for other tasks to take up, so that I can come back to this one 'fresh' and hopefully with answers. |
Below you will find an exact (minimal) PDF example of the issue I described in my previous comment.
Broken PDF: broken_example.pdf
Modified dummy code can be found on my branch: broken-pdf-timetable.
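For readers without access to the branch, a reproduction of this general shape might look like the sketch below. Everything in it is an assumption: the flex layout, the class names and the row count are made up for illustration and are not taken from the branch, and it has not been verified that this exact markup triggers the behavior described. Only `HTML(string=...)` and `write_pdf()` are actual WeasyPrint calls.

```python
def build_repro_html(n_rows: int = 120) -> str:
    """Build one session block long enough to span several PDF pages."""
    rows = '\n'.join(f'<p>Contribution {i}</p>' for i in range(1, n_rows + 1))
    return f"""<!doctype html>
<html>
<head>
<style>
  @page {{ size: A4; margin: 2cm; }}
  /* Assumed layout: a narrow timeline column to the left of the content */
  .session {{ display: flex; }}
  .timeline {{ width: 4px; margin-right: 1em; background: #888; }}
  .content {{ flex: 1; }}
</style>
</head>
<body>
<div class="session">
  <div class="timeline"></div>
  <div class="content">
{rows}
  </div>
</div>
</body>
</html>"""


def render(path: str = 'repro.pdf') -> None:
    """Write the mock-up to a PDF file using WeasyPrint."""
    # Imported lazily so the HTML builder works without WeasyPrint installed
    from weasyprint import HTML
    HTML(string=build_repro_html()).write_pdf(path)
```

Calling `render()` writes `repro.pdf`; the thing to check is whether the timeline column continues on the second page.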
Confirmed with @tomasr8 that this is likely a bug. I asked about it in a WeasyPrint issue: https://github.com/Kozea/WeasyPrint/issues/2274
Due to poor efficiency for large events, it might be interesting to look into a C-based library at some point. This is however not in the scope for now and needs to be researched.
I don't think this is feasible, unless there is a library like WeasyPrint (same functionality) that's written in C (or Rust) AND has the same quality of Python bindings. Also, doesn't WeasyPrint already use C-based libraries under the hood for various things?
Anyway, considering that this is not something which is done very frequently, I'd rather do what we talked about last week, i.e. offloading the PDF generation to a Celery task and polling for it to finish when the event is above a certain size.
That's fine. I did not look under the hood of weasyprint yet, but we can stick to the celery-based approach.
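To make the offload-and-poll idea concrete, here is a minimal sketch. It deliberately uses a standard-library `ThreadPoolExecutor` as a stand-in for a Celery worker so that it runs on its own; `LARGE_EVENT_THRESHOLD`, `generate_pdf`, `export_timetable` and `poll` are hypothetical names, and the threshold value is invented since the cutoff has not been decided.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Union

# Hypothetical cutoff; the discussion only says "above a certain size"
LARGE_EVENT_THRESHOLD = 500

# Stand-in for a Celery worker, so the sketch needs no broker
_executor = ThreadPoolExecutor(max_workers=1)


def generate_pdf(n_entries: int) -> bytes:
    """Placeholder for the actual WeasyPrint rendering step."""
    return f'PDF with {n_entries} entries'.encode()


def export_timetable(n_entries: int) -> Union[bytes, Future]:
    """Render small events inline; hand large ones to the background worker."""
    if n_entries <= LARGE_EVENT_THRESHOLD:
        return generate_pdf(n_entries)
    return _executor.submit(generate_pdf, n_entries)


def poll(job: Future, interval: float = 0.1, timeout: float = 10.0) -> bytes:
    """Check the job at fixed intervals until it finishes or times out."""
    deadline = time.monotonic() + timeout
    while not job.done():
        if time.monotonic() > deadline:
            raise TimeoutError('PDF generation did not finish in time')
        time.sleep(interval)
    return job.result()
```

With Celery, the `submit` call would presumably become `task.delay(...)` and the `done()` check would become `AsyncResult.ready()`, with the polling driven by the client rather than a server-side loop.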
Hi!
Sorry to get into your discussion. I’m Guillaume from CourtBouillon, WeasyPrint maintainers, and I may have some answers. :smile:
> I don't think this is feasible, unless there is a library like weasyprint (same functionality) that's written in C (or Rust) AND has the same quality of Python bindings.
There are other open source solutions, based on browsers’ engines, such as PagedJS or Vivliostyle, but they’re JS-based. Other well-known solutions are proprietary.
> Also, doesn't Weasyprint already use C-based libraries under the hood for various things?

WeasyPrint uses Pango and some of its dependencies to handle text. Most of the rest is pure Python or in the standard Python library.
WeasyPrint is known to be "slow", but there are solutions to make it faster.
Don’t hesitate to get in touch if you have more questions about this topic, templating for print, or anything else!
Hey @liZe !
Thanks for commenting and providing some tips. For now we will stick with WeasyPrint and I'll look into the link on common use cases that you provided.
Cheers!