
Tutorial for web page source

Open pekikz opened this issue 9 months ago • 7 comments

Hi

Is there any tutorial for the web page source for external database?

What should I put in these fields (URL, XPath source and XPath quote) to make it work?

Note : I have my own website and want to sync it with this app

Thank you

pekikz avatar May 04 '24 09:05 pekikz

Hi - thanks for taking the time to raise this issue :-)

There's no tutorial. But one might well come out of this issue!

The following two files contain example values - that might help you:

  • https://github.com/jameshnsears/QuoteUnquote/blob/main/app/src/main/java/com/github/jameshnsears/quoteunquote/utils/scraper/Scraper.kt
  • https://github.com/jameshnsears/QuoteUnquote/blob/main/app/src/test/java/com/github/jameshnsears/quoteunquote/utils/scraper/ScraperTest.kt

If these files aren't clear enough, let me know the address of the site and where the quote and source are on the page, and I'll get back to you with something more specific. I'll also test that the code works with your site :-)

jameshnsears avatar May 06 '24 15:05 jameshnsears

Thank you for the response

I'm still testing the page and trying to connect it to the app. Is it okay to use .xml?

https://buatmudah.com/quote.xml

pekikz avatar May 07 '24 03:05 pekikz

Hi - thanks for updating the issue.

The app uses a library called jsoup - from https://jsoup.org/ - it expects your response to be in HTML 5; jsoup "converts" XML responses into HTML 5 prior to applying the XPath query.

This means that your response of:

<?xml version="1.0" encoding="UTF-8"?>

<quote>
  <text>Isi quote</text>
  <source>sumber</source>
</quote>

gets turned into:

<!--?xml version="1.0" encoding="UTF-8"?-->
<html>
 <head></head>
 <body>
  <quote>
   <text>
    Isi quote
   </text>
   <source>sumber
  </quote>
 </body>
</html>

This isn't normally an issue as "//" at the start of your XPath is often a viable option.
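As a rough illustration of why a leading "//" copes with the extra html and body wrappers, here is a small sketch. It uses Python's standard xml.etree.ElementTree rather than jsoup (an assumption, so that it runs without extra dependencies), and ElementTree spells the descendant search ".//" instead of "//". The `holder` element and the `find_text` helper are hypothetical names used only for this demo.

```python
import xml.etree.ElementTree as ET

# The response as served, and the same content after being wrapped in html/body.
ORIGINAL = "<quote><text>Isi quote</text></quote>"
WRAPPED = (
    "<html><head></head><body>"
    "<quote><text>Isi quote</text></quote>"
    "</body></html>"
)


def find_text(markup: str, path: str) -> list[str]:
    """Return the text of every element that `path` selects in `markup`."""
    root = ET.fromstring(markup)
    # Hypothetical wrapper so the document root itself can be matched by the path.
    holder = ET.Element("holder")
    holder.append(root)
    return [(el.text or "").strip() for el in holder.findall(path)]


# A descendant search finds the quote at any depth, wrapped or not.
print(find_text(ORIGINAL, ".//quote/text"))
print(find_text(WRAPPED, ".//quote/text"))
# A path rooted at the top level no longer matches once the wrappers are added.
print(find_text(WRAPPED, "./quote/text"))
```

jsoup's own XPath handling may differ in detail; the point is only that a descendant search does not care how deeply the quote element ends up nested.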

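What is an issue is the use of the tag source. This has a special meaning in HTML 5: source is a void element, so it cannot contain text or have a closing tag, which is why "sumber" ends up outside it in the converted output.

The answer is to not use source but, say, something like s: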

  • https://buatmudah.com/quote.xml
  • //quote/text
  • //quote/s
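If it helps, a quick pre-flight check along these lines can flag tag names that are likely to clash. This is only a sketch: it assumes the file is well-formed XML, it uses Python's standard library rather than jsoup, and `void_tag_names` is a hypothetical helper. The set below is the list of void elements from the HTML standard; other HTML-specific names (title, table and so on) may also be treated specially and are not covered here.

```python
import xml.etree.ElementTree as ET

# Void elements per the HTML standard: no content and no closing tag allowed.
HTML_VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


def void_tag_names(xml_text: str) -> list[str]:
    """Return the tag names in xml_text that are HTML void elements."""
    root = ET.fromstring(xml_text)
    clashes = {el.tag for el in root.iter() if el.tag.lower() in HTML_VOID_ELEMENTS}
    return sorted(clashes)


BAD = "<quote><text>Isi quote</text><source>sumber</source></quote>"
GOOD = "<quote><text>Isi quote</text><s>sumber</s></quote>"

print(void_tag_names(BAD))   # the <source> tag is reported
print(void_tag_names(GOOD))  # nothing is reported
```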

Try that and let me know if you still have issues. It's no problem for me to help more.

I'll leave this issue open for a while.

jameshnsears avatar May 07 '24 15:05 jameshnsears

Thank you, it worked

I changed "source" to "sumber". Now, if I want to have more than one quote, do I need to structure it like this?

<?xml version="1.0" encoding="UTF-8"?>


<isi>
 <quote>
  <text>quote's content 1</text>
  <sumber>quote's source 1</sumber>
 </quote>
 <quote>
  <text>quote's content 2</text>
  <sumber>quote's source 2</sumber>
 </quote>
</isi>

  • https://buatmudah.com/quote.xml
  • //isi/quote/text
  • //isi/quote/sumber

pekikz avatar May 09 '24 10:05 pekikz

Hi - the app is coded to work in a generic way. What this means is that your site would need to change the content of the quote.xml file on a regular basis.

The quote.xml file would only ever contain a single text & sumber entry; put another way, quote.xml only ever contains 1 quote.

Your website would - say, every 24 hours - then update the quote.xml file with a new text & sumber entry.
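One possible way to do that on the server side is sketched below. Everything in it is an assumption rather than something the app requires: the `QUOTES` list, the function names, and the idea of picking a quote by calendar day are all hypothetical, and you would run it from whatever daily scheduled job your host offers.

```python
import datetime
from xml.sax.saxutils import escape

# Hypothetical quote store: (text, sumber) pairs. Replace with your own data.
QUOTES = [
    ("quote's content 1", "quote's source 1"),
    ("quote's content 2", "quote's source 2"),
]


def quote_xml_for(day: datetime.date) -> str:
    """Build a quote.xml body holding exactly one quote, chosen by date."""
    text, sumber = QUOTES[day.toordinal() % len(QUOTES)]
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<quote>\n"
        f"  <text>{escape(text)}</text>\n"
        f"  <sumber>{escape(sumber)}</sumber>\n"
        "</quote>\n"
    )


def write_quote_file(path: str, day: datetime.date) -> None:
    """Overwrite the file at `path` with the quote for `day`."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(quote_xml_for(day))


if __name__ == "__main__":
    # Example: regenerate the file for today; the path is a placeholder.
    write_quote_file("quote.xml", datetime.date.today())
```

The XPath values in the app then stay as //quote/text and //quote/sumber, because the file never holds more than one quote.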

Let me know if this isn't clear or if I can be more help.

jameshnsears avatar May 09 '24 17:05 jameshnsears

Got it. So only one quote can be imported by the app. For now, I'm still using a CSV file, and it's working great.

Thank you for your help 😊

pekikz avatar May 09 '24 21:05 pekikz

OK - thanks for the update. I'll leave this issue open a little longer just in case anything else crops up & I can help more.

jameshnsears avatar May 10 '24 06:05 jameshnsears

Next release will include some better instructions inside the app.

jameshnsears avatar May 14 '24 09:05 jameshnsears

Hi - I'm closing this issue now. I released a new version of the app, 4.39.0, which includes improved instructions.

Thanks for raising this issue!

jameshnsears avatar May 16 '24 06:05 jameshnsears