link-grammar
link-grammar copied to clipboard
Move website to github
See for example #303. Many other things in it are also out of date.
Will it be a good idea to maintain the link-grammar library documentation here, along with the library sources, so we can update it as needed?
Currently, the website is operated from an SVN repo under the abiword organization -- I have write access, and so do the various abiword admins.
We could create a "linkgrammar-website" git repo, but I'm not sure how that would be synced with svn.
I have thought about getting a link-grammar.org domain for the website, but have never gotten around to it. Sysadmining websites is not hard, but it is time-consuming and tedious.
We can host the documentation site on GitHub itself, using a custom domain for access. Just make a push and the site is updated.
Same deal with the current svn access -- svn commit and the site is updated. So it's zero extra work, for me, at least, since I've already got svn access :-)
To add to the issue of out-of-date documentation, is there a working API for parsing the constituents (noun phrase/verb phrase etc.) in a sentence? The online demo is able to show the constituent tree. I find it extremely helpful to be able to parse sentences in terms of traditional grammar.
Several remarks:
- Yes, there is an API for obtaining the constituents, I believe the docs for that are still correct and up-to-date. The API itself is extremely simple, it just returns the S-expressions.
- The online demo uses an extremely out-of-date version of the parser.
- Although the constituent parse is easy to understand, it loses a large amount of syntactic information about a parse. It's like looking at the syntax through a foggy window; you can get a general impression, but miss all the details.
Do you mean the API example here? I don't seem to find the constituent.h header in the installation directories (I assume the CNode type and linkage_constituent_tree() function are in that header).
Ah, yes, I just updated that page. The structures being illustrated there were removed. They were very tiny pieces of bullshit code that should never have been in the API. You can recreate equivalent functionality, and even superior functionality, in about 10 or 20 lines of code, simply by working with the s-expr.
Thanks, I wonder why I haven't noticed the linkage_print_constituent_tree() API before. I was pretty sure I had looked through the link-includes.h header!
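For anyone landing on this thread later, roughly this is all it takes (a minimal sketch, assuming the usual link-includes.h entry points; the test sentence, the "en" dictionary, and the SINGLE_LINE display style are my own choices and may need adjusting for your installed version):

```c
/* Minimal sketch: obtain the constituent parse as an S-expression string
 * via the public API. The SINGLE_LINE style constant and the "en"
 * dictionary name are assumptions -- adjust to your installed version. */
#include <stdio.h>
#include <link-grammar/link-includes.h>

int main(void)
{
    Dictionary dict = dictionary_create_lang("en");
    Parse_Options opts = parse_options_create();
    Sentence sent = sentence_create("The quick brown fox jumped over the lazy dog.", dict);

    if (sentence_parse(sent, opts) > 0)
    {
        Linkage linkage = linkage_create(0, sent, opts);

        /* Returns the constituent parse as an S-expression string,
         * something like: (S (NP the quick brown fox) (VP jumped ...)) */
        char *tree = linkage_print_constituent_tree(linkage, SINGLE_LINE);
        printf("%s\n", tree);

        linkage_free_constituent_tree_str(tree);
        linkage_delete(linkage);
    }

    sentence_delete(sent);
    parse_options_delete(opts);
    dictionary_delete(dict);
    return 0;
}
```

From that one string you can rebuild a CNode-style tree yourself, which is the 10-or-20-line exercise mentioned above.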
I took the liberty of putting together a design proposal for the new website. Tell me what you think.
It's a simple guile script that does a markdown → html transformation recursively on the file system.
Wow! Well, I've day-dreamed for a while about getting a domain name for link-grammar, but this is .. another way of moving in that direction. Some comments:
- it's called Link Grammar, not Grammar Link
- the "by" line should be removed, or should be expanded to include all authors. Temperley and Sleator came up with the core idea, but the thing today is quite sharply different from the original.
I'm unclear about the mechanics: the website also hosts some large binary blobs, e.g. the tar-files for Arabic and Persian, and other miscellany, I forget what. Is there enough storage space available to host these blobs, as well?
I'm also nervous about hosting everything on github. This elevates github into a single point of failure: if it goes down, goes bankrupt, is purchased by Microsoft, etc. then there is no backup or alternative -- this is one reason why I've been sticking with the abiword site.
Wow!
I am happy you like it :artificial_satellite:!
it's called Link Grammar, not Grammar Link
Sorry :cactus: .
the "by" line should be removed, or should be expanded to include all authors.
I will remove it for the time being.
Is there enough storage space available to host these blobs, as well?
This can be an issue. I know of no project that hosts its tarballs on GitHub Pages.
I'm also nervous about hosting everything on github.
It can be "hidden" by a domain or maybe with something like a transparent redirect...
Anyway, I will continue to tidy things up.
I just looked -- currently there are 300 MBytes of tarballs -- each new release takes about 3.7 MBytes, and I expect future datasets in the 20 MBytes to 100 MBytes range, so getting a hosting provider with maybe a GB of capacity should be plenty.
My nervousness is not about hiding the domain, it is about "putting all one's eggs in one basket": what would happen if github went out of business, or was hacked, or was purchased by Microsoft -- it would be better to have a hosting provider other than github, someone else that can provide the needed space.
I mean -- I already host several domains on some servers about one meter over to my left, and I could host link-grammar there as well -- but this is my backup plan -- if abiword goes down, I can rehost here. I guess, if github went down, I could rehost here, too ... Hmm. Requires more work on my part ...
probably better to just modernize the README to incorporate more of the website info.
I need to start making plans on migrating the link-grammar website off of abiword.com ... The long-term stability of abiword.com is not certain. Unfortunately, the long-term stability of any site I might host is also not certain. What is the best way to host the site and guarantee long-term stability?
What is the best way to host the site and guarantee long-term stability?
Maybe this: https://pages.github.com/
Heh. Thanks. But ...
- I don't trust github sufficiently. I'm thinking that maybe FSF or the gnome foundation or some other open source foundation would be a better choice. It requires some research.
- The website includes several gigs of old tarballs and other ancillary stuff that needs to be served.
Here's the meta-problem: the abiword site is hosted by the University of Twente -- utwente.nl -- they've hosted many (hundreds? thousands?) open source websites, and site mirrors, for decades. For example, they used to provide a European copy of gnucash.org, because they had an rsync to the website. Worked great for a decade, and then something broke, and they didn't fix it, and I forgot who to contact to ask them to fix it. So, in that sense, they are a "stable provider". However, the main contact point in the abiword organization, the one who worked with them and set up the abiword server, has disappeared. At least one other person in the abiword organization does have ssh access, so some kinds of repairs can be made ... however, if that person disappears, loses interest, ... if the server dies for some reason ... it is not clear what the backup plan is.
Now, I could host a link-grammar website, right here, on my server. But if I die in a car crash on a snowy winter's day ... keeping that server up would fall to the hands of my kids, one of whom is computer-literate, but has not been groomed to take over this job. There would be fumbling.
I don't trust github because I don't trust Microsoft. They could decide that they are bored with hosting websites, and just turn everything off. Google does this all the time: they set things up, then they kill them. Once burnt, twice shy. Twice burnt, infinitely shy.
Thus, a non-profit foundation seems the most promising, and FSF/gnome/OSI-type places have sufficient broad-based support to continue forwards indefinitely.
Regarding FSF, they provide free repository + bug tracking + website space: https://savannah.nongnu.org/register/requirements.php
They already provide resources for a few projects: https://savannah.nongnu.org/search/index.php?type_of_search=soft&words=%%% Most of them seem not to have a homepage, but some do, e.g.: https://www.gnu.org/software/gift/ All of them seem to use an FSF git repository. Hopefully using GitHub instead is allowed.
look into https://fosshost.org/about
What about hosting the source files of the website on GitHub, and on an FSF/FOSSHOST website just use "git pull" periodically?
I'll start moving the html pages to github when I get a "rainy day" with nothing else to do. However, the website also has hundreds of large tarballs, and they'd exceed the allowed maximum for github.
Maybe we can overcome or bypass the problem of the tarballs provided by GitHub. What was the exact problem with them?
The website provides tarballs of things that aren't just link-grammar -- for example, some Persian and Arabic tools, some of the tools I used for creating the Russian dicts, mirrors of other assorted/related junk ... and early versions of link-grammar that are not in git (versions 1.0, 2.0, 3.0 and 4.0 -- git starts with version 4.1b).
The problem with the "tarballs provided by Github" is that they don't include configure
and they don't include the Makefile.in
's That's the biggest problem. There are annoyances, some of which are that early LG tarballs might not have matching git-tags, or that there are git tags for things that should never be tarballs (experimental branches, etc.)
The website provides mirrors of dozens(?) of papers/pdfs about LG. Some of these are no longer available from the original websites.
The problem with the "tarballs provided by Github" is that they don't include
configure
and they don't include theMakefile.in
's
This Stackoverflow answer describes in detail how you can upload arbitrary tar files to GitHub's Release page. The author of this answer recommends uploading the result of make dist.
https://stackoverflow.com/a/41360302/6485214
It could be that you can also "release" a PDF docs tarball this way (and also one for tools, etc.).
After moving the website, it might be useful to think of converting the format from HTML to reStructuredText https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html ... maybe!?
After moving the website
Where is it?
to "restructured Text"
I'm for it. There is also a need to make some updates. This can be done while converting.
Where is it?
It's currently kept in a subversion repo that the abiword people run. When I edit files and commit, it auto-publishes to the web.
I've been too lazy to dump the svn into git ... in part because it would be useless. I've thought of getting linkgrammar.org but never got around to it. I don't really feel like administering one more webserver.