
Import recipes with a dedicated JSON graph

Open christianlupus opened this issue 2 years ago • 14 comments

In general, a page can use a @graph entry in its JSON-LD to define several schema.org objects that reference each other, forming more complex structures.

Our current importer cannot cope with this situation: simply put, no recipe is found. This should be solved by creating a more stable and robust import parser.

There are several issues on the tracker that all boil down to this root cause. This one should be considered the central issue for tackling all of these pages.
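To illustrate the @graph structure described above, here is a minimal Python sketch (not the cookbook's PHP importer) that finds the Recipe node in a @graph and inlines nodes it references by @id, such as an author stored as a separate Person node. The function names and sample data are hypothetical, and the sketch assumes every referenced node sits directly in the top-level @graph array.

```python
import json
from typing import Any, Dict, Optional


def index_graph(doc: dict) -> Dict[str, dict]:
    """Map each @id to its node for every entry in the top-level @graph."""
    return {
        node["@id"]: node
        for node in doc.get("@graph", [])
        if isinstance(node, dict) and "@id" in node
    }


def resolve(value: Any, index: Dict[str, dict], seen: frozenset) -> Any:
    """Replace bare {"@id": ...} references with the referenced node."""
    if isinstance(value, list):
        return [resolve(v, index, seen) for v in value]
    if isinstance(value, dict):
        ref = value.get("@id")
        # A bare reference has only an @id key; inline it unless that would loop.
        if len(value) == 1 and ref in index and ref not in seen:
            return resolve(index[ref], index, seen | {ref})
        return {k: resolve(v, index, seen) for k, v in value.items()}
    return value


def recipe_from_graph(doc: dict) -> Optional[dict]:
    """Return the first Recipe node in @graph with its references inlined."""
    index = index_graph(doc)
    for node in doc.get("@graph", []):
        if not isinstance(node, dict):
            continue
        types = node.get("@type", [])
        types = [types] if isinstance(types, str) else types
        if "Recipe" in types:
            return resolve(node, index, frozenset({node.get("@id")}))
    return None


# Hypothetical page data in the style described in this issue.
example = json.loads("""{
  "@context": "https://schema.org",
  "@graph": [
    {"@type": "Person", "@id": "#author", "name": "Jane Doe"},
    {"@type": "Recipe", "@id": "#recipe", "name": "Example Recipe",
     "author": {"@id": "#author"}}
  ]
}""")

print(recipe_from_graph(example)["author"]["name"])  # Jane Doe
```

The `seen` set keeps the resolver from looping forever when two nodes point at each other (for example a Recipe that is part of a WebPage whose mainEntity is that Recipe); the repeated reference is simply left as an @id stub.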

christianlupus • May 16 '23

All these issues can be used as sources for test cases during development.

christianlupus • May 16 '23

I ran into this issue today. It seems that recipes on any WordPress installation running Yoast (possibly other search optimization plugins, too, but this applies to all the recipes I checked, as well as the links above) cannot be imported, because they use the newer, more advanced schema format.

This does not bring us closer to a solution, since it is already established that the importer needs to be modified, but I wanted to share what I found.

vengefulpunk • Oct 12 '23

Yoast is likely to be a common offender. In every case I have investigated, the pages with @graph use script type="application/ld+json" class="yoast-schema-graph", and sadly Yoast is very widely used. So far, I have found only one recipe in my recipe bookmarks where the current parser works.

For example, this Wafu Dressing recipe validates perfectly, and it would be wonderful to be able to import it.

I don't know much about Nextcloud apps yet, but I do know some RDF, JSON-LD and schema.org. I'm guessing the parser is JsonService?
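The yoast-schema-graph observation suggests a first step for any parser: collect every script block of type application/ld+json from the page before looking for a recipe. Below is a minimal Python sketch using the standard library's html.parser; the class and function names are hypothetical, and skipping malformed blocks is a design choice for this example, not necessarily how the cookbook behaves.

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class LdJsonCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._buffer: List[str] = []
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._buffer))
            self._inside = False


def extract_json_ld(html: str) -> List[Any]:
    """Parse each JSON-LD block, skipping blocks that are not valid JSON."""
    collector = LdJsonCollector()
    collector.feed(html)
    collector.close()
    documents = []
    for raw in collector.blocks:
        try:
            documents.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # one broken block should not abort the whole import
    return documents


sample = """<html><head>
<script type="application/ld+json" class="yoast-schema-graph">
{"@context": "https://schema.org", "@graph": [{"@type": "Recipe", "name": "Example Recipe"}]}
</script>
</head><body></body></html>"""

print(extract_json_ld(sample)[0]["@graph"][0]["name"])  # Example Recipe
```

The class attribute is ignored on purpose: matching only on the script type means pages from other SEO plugins are collected too.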

teledyn • Nov 25 '23

@teledyn The parsing of websites containing recipe information is done in the backend. I think the parsing logic is located in lib/Helper/HTMLParser/HttpJsonLdParser.php, which extends an abstract parser class. The idea was (and is) to support more parsers in the future, but they would need to be written first ;)

seyfeb • Nov 25 '23

It does look like the code already attempts to handle @graph:

		// Look through @graph field for recipe
		$this->mapGraphField($json);

		// Look for an array of recipes
		$this->mapArray($json);
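For readers who do not read PHP, the two calls above can be pictured roughly as follows. This is a hedged Python sketch, not a port of the cookbook's mapGraphField or mapArray; it additionally assumes that @type may be a list (for example ["Recipe", "NewsArticle"]) or a full schema.org URL, which any lookup comparing @type to the plain string "Recipe" would miss. All names are hypothetical.

```python
from typing import Any, Iterator, Optional


def iter_nodes(data: Any) -> Iterator[dict]:
    """Yield candidate nodes from a single object, a top-level array, or @graph."""
    if isinstance(data, list):
        # Roughly the idea behind mapArray: look at every element of an array.
        for item in data:
            yield from iter_nodes(item)
    elif isinstance(data, dict):
        yield data
        # Roughly the idea behind mapGraphField: descend into the @graph entries.
        if "@graph" in data:
            yield from iter_nodes(data["@graph"])


def has_type(node: dict, wanted: str) -> bool:
    """Accept @type as a string or a list, with or without a schema.org URL prefix."""
    types = node.get("@type", [])
    if not isinstance(types, list):
        types = [types]
    return any(isinstance(t, str) and t.rsplit("/", 1)[-1] == wanted for t in types)


def find_recipe(data: Any) -> Optional[dict]:
    """Return the first node typed as Recipe, or None if there is none."""
    return next((n for n in iter_nodes(data) if has_type(n, "Recipe")), None)


page = [
    {"@type": "WebSite", "name": "Example site"},
    {"@graph": [{"@type": ["Recipe", "NewsArticle"], "name": "Example Recipe"}]},
]
print(find_recipe(page)["name"])  # Example Recipe
```

Because the wrapper object is yielded before its @graph entries, a page whose top-level object is itself the Recipe still matches.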

teledyn • Apr 28 '24

Just to add my voice to the chorus: as much as I like using the app, probably only about 1 in 10 recipes that I try to import succeed. The rest fail with the parser error. Adding recipes manually, on the other hand, is laborious enough that I rarely take the time anymore. It involves lots of switching back and forth to copy and paste line by line, downloading images and uploading them to Nextcloud, remembering URLs, etc. If a more robust parser is too large an undertaking, how about improving the add-recipe user interface?

a575606 • May 11 '24

Perhaps if a more robust parser is too large an undertaking, how about improving the add recipe user interface?

What exactly do you have in mind? There is a major UI rework currently under way. If there is a good suggestion, it might be added (I cannot guarantee implementation, though). Maybe you could open a discussion (or a new issue) for this, to avoid cluttering this issue?


For the original problem: we hear you. The problem is that the schema.org standard allows a huge number of variants in how metadata (like the recipes on the imported pages) can be represented. Also, not all pages conform to the standard. We have to handle all of this and write a generic parser.

However, we want to write it so that it can be extended and augmented as the need arises. The current implementation was not really built with these constraints in mind. Thus, the parser needs a complete restructuring.

There is already a prototype under way to test whether the architectural assumptions hold and lead to a good design. We then need to implement this in the cookbook itself. I expect we will release version 0.11.1 before that, as there are some urgent things to handle with the release of NC29.
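One common way to get the extensibility described above is a chain of independent parsing strategies that are tried in order until one returns a recipe. The Python sketch below shows only that general pattern; RecipeParser, ParserChain, NeverMatches and TitleFallback are hypothetical names and do not describe the prototype mentioned above.

```python
import re
from abc import ABC, abstractmethod
from typing import List, Optional


class RecipeParser(ABC):
    """Hypothetical interface: one strategy per way a page can encode a recipe."""

    @abstractmethod
    def parse(self, html: str) -> Optional[dict]:
        """Return a recipe dict, or None if this strategy does not apply."""


class ParserChain:
    """Try registered strategies in order and return the first recipe found."""

    def __init__(self) -> None:
        self._parsers: List[RecipeParser] = []

    def register(self, parser: RecipeParser) -> "ParserChain":
        self._parsers.append(parser)
        return self  # allow chained register() calls

    def parse(self, html: str) -> Optional[dict]:
        for parser in self._parsers:
            recipe = parser.parse(html)
            if recipe is not None:
                return recipe
        return None


class NeverMatches(RecipeParser):
    """Stand-in for a strategy (e.g. a JSON-LD parser) that finds nothing."""

    def parse(self, html: str) -> Optional[dict]:
        return None


class TitleFallback(RecipeParser):
    """Hypothetical last resort: use the page <title> as the recipe name."""

    def parse(self, html: str) -> Optional[dict]:
        match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
        return {"name": match.group(1).strip()} if match else None


chain = ParserChain().register(NeverMatches()).register(TitleFallback())
print(chain.parse("<html><title> Example Recipe </title></html>"))  # {'name': 'Example Recipe'}
```

With this shape, supporting a new page format means adding one strategy instead of editing a single monolithic parser, and registration order decides precedence.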

christianlupus • May 14 '24

I'd love to take a crack at the parser issue, but I'm finding it quite difficult to set up testing locally. I created a sample html and json file in tests/Unit/Helper/HTMLParser/res_JsonLd/ and added the test case to tests/Unit/Helper/HTMLParser/HttpJsonLdParserTest.php, but how do I debug the test? I tried following the instructions in the quickstart guide but the command seems to fail to build a test fixture.

python3 ./run-locally.py --create-fixture stable25 stable25 --activate-fixture stable25

I'm on a Mac if that matters. There's a lot of these "permission denied" lines in chown, and then it fails after.

chown: changing ownership of '/var/www/html/SECURITY.md': Permission denied
chown: changing ownership of '/var/www/html/psalm.xml': Permission denied
chown: changing ownership of '/var/www/html/.htaccess': Permission denied
chown: changing ownership of '/var/www/html/.idea/codeStyleSettings.xml': Permission denied
chown: changing ownership of '/var/www/html/.idea': Permission denied
Running the main script as user runner
Cannot write into "config" directory!
This can usually be fixed by giving the web server write access to the config directory.

But, if you prefer to keep config.php file read only, set the option "config_is_read_only" to true in it.
See https://docs.nextcloud.com/server/25/go.php?to=admin-config
Elapsed time (Server installation): 19.896853178999997
[T] Running auxiliary post-install scripts
[T] Installation of NC server is finished.
Elapsed time (Server installed): 59.537169513
Elapsed time (Environment preparation): 61.698601778000004
Elapsed time (Installation of plain server): 61.698678821
[D] Creating sub-fixture in volumes/dumps/fixtures/stable25/plain
[T] Save the data files
rsync: --delete-delay: unknown option
rsync error: syntax or usage error (code 1) at /AppleInternal/Library/BuildRoots/f84c363d-9006-11ee-8578-1ae9d66b0597/Library/Caches/com.apple.xbs/Sources/rsync/rsync/main.c(1337) [client=2.6.9]
Traceback (most recent call last):
  File "/Users/ben/Nextcloud/Projects/cookbook/.github/actions/run-tests/./run-locally.py", line 79, in <module>
    subfixture.create(fixturePath, name='plain', db=db, sql_type=sql_type)
  File "/Users/ben/Nextcloud/Projects/cookbook/.github/actions/run-tests/test_runner/dumps.py", line 197, in create
    self.__cloneFiles(subFixturePath)
  File "/Users/ben/Nextcloud/Projects/cookbook/.github/actions/run-tests/test_runner/dumps.py", line 26, in __cloneFiles
    p.pr.run(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/subprocess.py", line 460, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['rsync', 'volumes/data/', 'volumes/dumps/fixtures/stable25/plain/data', '--delete', '--delete-delay', '--delete-excluded', '--archive']' returned non-zero exit status 1.
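The rsync error itself points to the cause: the client reports version 2.6.9, the rsync that ships with macOS, and it rejects --delete-delay, an option that, as far as I know, first appeared in rsync 3.0.0. Installing a newer rsync (for example through Homebrew) and putting it first on PATH is one possible workaround. Alternatively, the test runner could adapt its arguments; the Python sketch below is hypothetical and is not the actual code of dumps.py.

```python
import re
from typing import List, Optional, Tuple


def parse_rsync_version(version_output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from the output of `rsync --version`."""
    match = re.search(r"version\s+v?(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def clone_args(src: str, dst: str, version: Optional[Tuple[int, int, int]]) -> List[str]:
    """Build the rsync call from the traceback, dropping --delete-delay for old clients."""
    args = ["rsync", src, dst, "--delete", "--delete-excluded", "--archive"]
    # Assumption: --delete-delay requires rsync 3.x; macOS's 2.6.9 rejects it.
    # Unknown versions are treated as old, which is the conservative choice.
    if version is not None and version >= (3, 0, 0):
        args.insert(4, "--delete-delay")
    return args


# In practice the version text would come from something like
# subprocess.run(["rsync", "--version"], capture_output=True, text=True).stdout
old = parse_rsync_version("rsync  version 2.6.9  protocol version 29")
print(clone_args("volumes/data/", "volumes/dumps/fixtures/stable25/plain/data", old))
```

Without --delete-delay, rsync still removes files that no longer exist in the source; it only does so at a different point in the transfer.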

wenbenz • Jul 26 '24

This is only a guess, but did you remember to run that command with sudo -u www-data?


-- Gary Lawrence Murphy - Toronto CA - Tumblr https://teledyn.tumblr.com - Blog https://blog.teledyn.com - Home https://www.teledyn.com/

teledyn • Jul 28 '24

Which sudo -u www-data command? I don't see that in the guide.

wenbenz • Jul 30 '24

When you execute any of the Nextcloud occ commands, you need to run them as the same Unix account that owns the files. Since web files are normally installed under the system user www-data, which you cannot log in as, the "sudo -u www-data" prefix switches to that account to run the command.

I'm assuming your instructions were to install Nextcloud under a dedicated Unix account. If it is installed under your own account (and so is your web server), then it isn't needed.
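The rule described above (run occ as the account that owns the Nextcloud files) can be expressed as a small helper. This is a hypothetical Python sketch; the function names are made up for illustration, and it only builds the command list rather than running it.

```python
import os
import pwd
from typing import List


def file_owner(path: str) -> str:
    """Return the Unix account that owns `path`, e.g. Nextcloud's config/config.php."""
    return pwd.getpwuid(os.stat(path).st_uid).pw_name


def occ_command(occ_args: List[str], owner: str, current_user: str) -> List[str]:
    """Build `php occ ...`, prefixed with `sudo -u <owner>` only if the accounts differ."""
    cmd = ["php", "occ", *occ_args]
    if owner != current_user:
        cmd = ["sudo", "-u", owner, *cmd]
    return cmd


print(occ_command(["status"], owner="www-data", current_user="ben"))
# ['sudo', '-u', 'www-data', 'php', 'occ', 'status']
```

On a real installation, owner would come from file_owner("config/config.php") run inside the Nextcloud directory, and current_user from getpass.getuser().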


teledyn • Jul 31 '24

No, the instructions were to create a test fixture as follows:

Create and activate a testing fixture:

./run-locally.py --create-fixture stable25 stable25 --activate-fixture stable25. The default uses MariaDB and you might consider adding --use-db-dump.

wenbenz • Aug 01 '24