Sefaria-Project
Sefaria-Project copied to clipboard
Siddur_Ashkenaz,_Weekday,_Shacharit,_Concluding_Prayers,_(Israel:)_Korbanot ref is not understood by API
https://www.sefaria.org/api/texts/Siddur_Ashkenaz,_Weekday,_Shacharit,Concluding_Prayers,(Israel:)_Korbanot returns an error:
{
"error": "Partial reference match for 'Siddur Ashkenaz, Weekday, Shacharit, Concluding Prayers, (Israel.) Korbanot' - failed to find continuation for 'Siddur Ashkenaz, Weekday, Shacharit, Concluding Prayers'.\nValid continuations are:\nAshrei,\nLamenatze'ach,\nPsalm 20,\nUva Letzion,\nKaddish Shalem,\nAlenu,\nKaddish Yatom,\nShir shel Yom,\nBarchi Nafshi,\nL'David,\n(Israel:) Korbanot"
}
But the continuation suggested is the one I'm using (and got directly from /api/index/Siddur_Ashkenaz). My guess is that an exception is being thrown somewhere within full_regex which is getting confused with the parentheses, and that they're being interpreted as part of a regex and not as literal characters. Or, seemingly the colon is being translated into a period for some reason, and that's failing the match.
After digging into this, it seems that the tndict does contain the 'Siddur Ashkenaz, Weekday, Shacharit, Concluding Prayers, (Israel:) Korbanot' as it should.
However, as you suspected, the : is being converted into a . by self.tref = self.tref.replace(":", ".") in __clean_tref (model/text.py:2443)
I can think of two possible solutions:
A. Save all nodes with : as . in the title node dicts upon initialization, so when we come to search for Siddur Ashkenaz, Weekday, Shacharit, Concluding Prayers, (Israel.) Korbanot we will find it. Considering that currently nodes with : will never be found, this can only be considered an improvement - I think this is the right solution.
B. Limit the :-to-. conversion in some way
I'd like someone better versed in the code to say their thoughts before I attempt to make any changes since I'm new here :)
What about making colon a banned character in a title ref? Then there's never any ambiguity
Also a possibility, but once that requires more constant supervision looking forward. That is, what's to stop someone 5 years from now adding a title with a colon and causing the same problem?
This too is something that requires input from the repo owners, since I don't know what the process of adding new titles is :)
My intuition is that replacing : with . is only an issue within the numerical part of the Ref. E.g in the Ref Genesis 1:1, the colon being replaced comes after a 1. So we can likely solve this by making the replacement logic a bit smarter.
I don't particularly like the idea of banning the colon character. Banning characters makes sense in certain situations, but ultimately colons are used in titles. Often. So that's probably not the right way to go.
What about modern books with chapters that have titles?
Regarding supervision - you could add logic to the code that saves models and validates that there are no incorrect characters. That part shouldn't be difficult.
Good hunting!
We have banned : in titles, but this one snuck in.
We have validation code, but I see that the primary Index title bans these, while the titles for the deeper parts of the book (SchemaNodes) only validate that the text is ascii. That allowed this to sneak in.
In the short term, I think we're just going to adjust this title.
But the conversation does have me thinking. It would be nice if we could allow : in titles. There are a number of cases where the title comes out looking pretty poor because we've substituted a ; for the :.
My sense is that we can probably make this change in the server / website code reasonably easily. @nsantacruz what's your sense of the apps? Would they be able to digest : in title without too much trouble?
@EliezerIsrael a quick look at mobile code shows that we heavily rely on splitting on : in refs. Since the apps don't know how to distinguish between Schema Nodes and numeric sections, it would take a little thought on how to refactor it to support this. Naively, I think we could just assume any colon before the final space in the ref is not part of the sections. I think that may work, although that would fail if we ever deal with refs that end in a schemaNode and don't have sections.
There certainly are Refs in the wild that end in a schema node, and don't have sections - Pesach Haggadah, Karpas, and the like. @blockspeiser this is the same problem we have with frontend code, yeah? Where we're forced to carry the whole list of books to deal with a few edge cases where names could be confused with sections.
Remind me - did we ever hit on a way forward with those issues?
@EliezerIsrael I think this related to problem I was looking at but not exactly the same. The problem I was looking at had to do with schema node titles that end in numerals, so that without a list of schema node titles there's no way to know if Sefer Ploni , Almoni 1 refers to the first section of a node called Almoni or to the entirety of a node called Almoni 1. I still don't have any solution for this, other than the fact that the frontend load could be lightened a bit by only including a list of schema nodes that end in numerals, since those are the only ones that are ambiguous.
For this problem I think what @nsantacruz is saying is right -- so long as we never end up with a schema node title that includes a : after the last space. We don't have any now, so it's safe to assume a : after the last space is part of the section string, but if we did allow : in schema node titles we'd have to make sure that e.g. Almoni 1:2 couldn't be a title itself, or this assumption would break.
From a product point of view I am all for allowing : in titles though! That ; in "Moses; A Human Life" is a real bummer.