[suggestion] Display dialog to set title + author(s)
Hello,
If an extension can display a dialog box with input fields, I suggest grabbing the HTML page's title (and author, if available), letting the user edit them, and then saving them as metadata in the EPUB file. Otherwise, the reader might display garbage in those fields. I currently run a Python script for that purpose.
<title>My title</title>
<meta name="author" content="My author">
Cheers,
This is not an insubstantial amount of work, but I'll look to see how difficult it is to do.
Right now the way the extension works is it grabs the whole page and then sends it off to be processed and converted to an epub, before returning that epub. It'll be pretty difficult to get the title and author automatically, then ask confirmation, then bundle the epub, as it'll require another step of communication.
What'd be more feasible is an option that just prompts at the beginning without scanning the page. Would that be sufficient?
Also, re: your Python script. I remember you saying that you ran it to add this metadata, but did you mean that it modifies existing metadata or that it adds it? Currently the author is written only to the OPF file, but your example looks like meta tags in the XHTML itself. Is that what your e-readers read?
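For reference, EPUB readers generally take the title and author from the Dublin Core elements (dc:title, dc:creator) in the package (OPF) file rather than from tags in the XHTML content. A minimal sketch of reading them with the standard library follows; the sample OPF snippet and the `read_opf_metadata` helper are made up for illustration and are not taken from rePub.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used for <dc:title> and <dc:creator> in OPF files.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative OPF content only; a real file also has a manifest and a spine.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>My title</dc:title>
    <dc:creator>My author</dc:creator>
    <dc:language>en</dc:language>
  </metadata>
</package>"""


def read_opf_metadata(opf_xml):
    """Return (title, [authors]) found in an OPF document given as a string."""
    root = ET.fromstring(opf_xml)
    title_el = root.find(f".//{{{DC_NS}}}title")
    title = title_el.text.strip() if title_el is not None and title_el.text else ""
    authors = [
        el.text.strip()
        for el in root.findall(f".//{{{DC_NS}}}creator")
        if el.text
    ]
    return title, authors


if __name__ == "__main__":
    print(read_opf_metadata(SAMPLE_OPF))
```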
Whatever works, you're the expert :-)
Here's the code I use to modify the title + author(s) in an EPUB file:
import pyperclip
import os
import ebookmeta
#======== grab input filename from clipboard
INPUTFILE = pyperclip.paste()
#! could be anything! Use OK/Cancel dialog
#if not s:
if not INPUTFILE or ".epub" not in INPUTFILE:
    print("Must pass an EPUB filename")
    exit()
print(f"Handling {INPUTFILE}")
#======== grab author + title from filename
#author#title.epub
x = [x.strip() for x in INPUTFILE.split('#')]
file_author=x[0]
#ignore file extension
file_title, _ = os.path.splitext(x[1])
print(f"Author:{file_author}, Title={file_title}")
#======== edit metadata: set author + title
meta = ebookmeta.get_metadata(INPUTFILE)
meta.set_author_list_from_string(file_author)
meta.title = file_title
ebookmeta.set_metadata(INPUTFILE, meta) # Set epub metadata from Metadata class
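The filename convention used above (author#title.epub) can be checked before any metadata is touched. Here is a small sketch of a stricter parser; `split_author_title` is a hypothetical helper, and taking the basename first is a choice made for this sketch, since splitting a full path on '#' would leave the directory prefix in the author.

```python
import os


def split_author_title(path):
    """Split 'author#title.epub' into (author, title); raise ValueError otherwise."""
    name = os.path.basename(path)
    stem, ext = os.path.splitext(name)
    if ext.lower() != ".epub":
        raise ValueError(f"not an EPUB filename: {path!r}")
    # Split on the first '#': everything before it is the author.
    author, sep, title = stem.partition("#")
    author, title = author.strip(), title.strip()
    if not sep or not author or not title:
        raise ValueError(f"expected 'author#title.epub', got: {name!r}")
    return author, title


if __name__ == "__main__":
    print(split_author_title("books/Jane Doe#My Book.epub"))
```

The returned pair could then be passed to the ebookmeta calls shown in the script above.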
Okay, I've added an option that allows doing this, at least in my testing:
Once I release a new version, you should be able to test it / see if it works for you, or refine it. Google is currently reviewing the extension, so I'm not sure when I'll be able to publish an update.
@Shohreh 5.2 is out and should have this setting in it
Yup, 5.2.0 works after enabling "Prompt for Title and Author" in the extension's options :-)
If available, would it be possible to pre-fill the fields with what's in
Unfortunately, that is fairly difficult to do. The way Chrome extensions are set up, clicking on the icon can either open the pop-up or immediately trigger the app. There may be a way to open it programmatically, but then there are also issues with synchronization between tabs, timing out, etc.
Additionally, the website needs to be sent to another process to render it and extract the author and title. Interceding in that would be even more effort.
All of which is to say that it may be possible. Unfortunately, given the complexity and the general availability of other tools to modify these fields, I'm probably not going to allocate time to it.
I'll leave this open in case anyone wants to add that functionality.
I understand. In the meantime, I'll write an AutoIt script to open the page in view-source mode, copy its HTML, grab the title and possibly the meta line, and try to paste them into rePub's fields.
There's even a way to assign a keyboard shortcut to an extension, no need to rely on AutoIt's MouseClick():
-- Edit: Work in progress
#include <MsgBoxConstants.au3>
#include <StringConstants.au3>
Func _ReadTitleAuthor($sInput, $sPattern)
Local $aResult = StringRegExp($sInput, $sPattern, $STR_REGEXPARRAYMATCH)
return (Not @error) ? ($aResult[0]) : ("")
EndFunc
Func _WinWaitActivate($title,$text,$timeout=5)
Local $result = False
$result = WinWait($title,$text,$timeout)
if $result = False Then Exit MsgBox($MB_ICONERROR,"Bad...","Window not displayed")
If Not WinActive($title,$text) Then WinActivate($title,$text)
WinWaitActive($title,$text,$timeout)
EndFunc
Func _HTTP_ResponseText($URL)
$oHTTP = ObjCreate("winhttp.winhttprequest.5.1")
$oHTTP.Open("GET", $URL)
$oHTTP.SetRequestHeader("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 (.NET CLR 4.0.20506)")
$oHTTP.Send()
return $oHTTP.ResponseText
EndFunc
Opt("WinTitleMatchMode", 2) ;1=start (default), 2=subStr, 3=exact, 4=advanced (deprecated), -1 to -4=Nocase
_WinWaitActivate("[TITLE: - Google Chrome;CLASS:Chrome_WidgetWin_1]","")
Send("{CTRLDOWN}l{CTRLUP}{CTRLDOWN}c{CTRLUP}{ESC}")
Local $sURL = ClipGet()
ConsoleWrite($sURL & @CRLF)
$HTML = _HTTP_ResponseText($sURL)
$sTitle = _ReadTitleAuthor($HTML , "<title>(.+)</title>")
$sAuthor = _ReadTitleAuthor($HTML , '<meta name="author" content="(.+?)" />')
Local $sTitleAuthor = StringFormat("%s#%s", $sTitle, $sAuthor)
ConsoleWrite($sTitleAuthor & @CRLF)
;Title#Author; Use keyboard shortcuts for faster use: End, CTRL+SHIFT+left, CTRL+X, CTRL+V
ClipPut($sTitleAuthor)
Looks cool! Yeah, I'm also hesitant about what permissions to give. Currently the extension does some "fancy" things to avoid unnecessary permissions, e.g. it never modifies the current web page, just "saves" it and parses it, so there's no risk of script injection on user sites. The downside is extensions don't have dom parsers or anything else, which is where the complexity comes from.
I hope your solution works for you! Since you seem willing to code some things, here's what it would take to bake this in, if you want to dig into it. The flow would be something like:
- Clicking the button triggers an "action", e.g. a callback. Currently this saves the page, then sends it to a worker. The current solution uses setPopup to enable the popup when you click. Instead, I think that after it triggers the normal action, it should try to parse the title and author from the page (I have a few ideas about this, but a PR could just have these be dummies).
- Then, after triggering the action, first call setPopup, then call openPopup to open the popup now that you have the title and author.
- Then use something like sendMessage to send the title and author, and wait for a confirmation. This is the step I gave up on. You have to handle edge cases like timeouts, different windows, messaging states, etc. Plus, the messaging API is already used for other purposes, so you have to make sure that connections and messages go only to the right place or are otherwise ignored. You also have to handle race conditions around when the popup opens, whether it closes early, multiple popups / multiple pages, etc. If you can come up with a clever solution to this, I think we should be good.
- Now you have the title and author, and the rest of the flow should work as expected.
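The third step is mostly a matter of correlating replies with requests and giving up after a timeout. What follows is only an illustration of that pattern in Python with asyncio, not Chrome extension code: `ConfirmationBroker`, `request`, and `on_message` are hypothetical names, and the `send` callback merely stands in for whatever messaging call the extension would use.

```python
import asyncio
import itertools


class ConfirmationBroker:
    """Matches replies to requests by id; unknown or late replies are ignored."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}

    async def request(self, send, payload, timeout):
        """Send payload with a fresh id; return the reply, or None on timeout."""
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        try:
            send({"id": request_id, **payload})
            return await asyncio.wait_for(future, timeout)
        except asyncio.TimeoutError:
            return None  # e.g. the popup was closed without confirming
        finally:
            self._pending.pop(request_id, None)

    def on_message(self, message):
        """Deliver a reply; return False if nobody is waiting for its id."""
        future = self._pending.get(message.get("id"))
        if future is None or future.done():
            return False
        future.set_result(message)
        return True


async def demo():
    broker = ConfirmationBroker()
    loop = asyncio.get_running_loop()

    def popup_confirms(msg):
        # Simulated popup: the user edits the author and confirms.
        reply = {"id": msg["id"], "title": msg["title"], "author": "Edited author"}
        loop.call_soon(broker.on_message, reply)

    def popup_closed(msg):
        # Simulated popup that never answers.
        pass

    fields = {"title": "My title", "author": ""}
    confirmed = await broker.request(popup_confirms, fields, timeout=1.0)
    timed_out = await broker.request(popup_closed, fields, timeout=0.05)
    stale = broker.on_message({"id": 999, "title": "from another tab"})
    return confirmed, timed_out, stale


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Tagging every message with an id is one way to keep these confirmations apart from the messages already used for other purposes; a reply whose id is unknown or already resolved is simply dropped.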
Unfortunately, I know nothing about extensions (and JavaScript).
Using the AutoIt script above, I noticed that some web pages return nothing, not even the title. Maybe they don't like being fetched by _HTTP_ResponseText().
As a workaround, I split it into two scripts:
- The user copies the URL to the clipboard
- A Python script downloads the page, grabs the title and, if available, the author(s) from the meta tag, copies both to the clipboard as "title#authors", and runs an AutoIt script (compiled as an EXE)
- The AutoIt script simply puts the focus back on Chrome (whose active window should still be the page we're interested in), opens the rePub extension through a user-assigned keyboard shortcut, reads the clipboard and splits it in half using # as the separator, hits TAB to move to the Title field in rePub's mini-dialog, pastes the title, hits TAB to move to Author(s), pastes the author(s), and finally clicks on its Capture button.
Python:
import re
import subprocess
import urllib.request

import pyperclip
from bs4 import BeautifulSoup
#pip install tldextract
import tldextract

AUTOIT = r"c:\AutoIT\rePub.fill.fields.exe"
URL = pyperclip.paste()
#to solve occasional "urllib.error.HTTPError: HTTP Error 429: Too Many Requests"
req = urllib.request.Request(
    URL,
    data=None,
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)
#read the body once: the response stream can't be read a second time
raw = urllib.request.urlopen(req).read()
soup = BeautifulSoup(raw, 'lxml')
title = soup.title.text.strip()
print("Title: ", title)
#<meta name="author" content="My author" />
meta = soup.head.find("meta", {"name": "author"})
if meta:
    authors = meta["content"]
else:
    #second try: look for a JSON-LD Person entry in the raw HTML
    regex = '{"@type":"Person","name":"(.+?)"}'
    match = re.search(regex, raw.decode('utf-8', errors='replace'))
    if match is not None:
        authors = match.group(1)
    else:
        #last resort: use the domain name as author
        extracted_info = tldextract.extract(URL)
        print("Domain name is:", extracted_info.domain)
        authors = extracted_info.domain
print("Author(s): ", authors)
output = f"{title}#{authors}"
print(output)
pyperclip.copy(output)
result = subprocess.run(AUTOIT, capture_output=True, text=True)
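The fallback chain in the script above (meta tag, then a JSON-LD Person entry, then the site itself) can also be exercised offline on plain HTML strings. The following is a sketch using only the standard library; `extract_title_author` is a hypothetical helper, and it falls back to the full host name from `urllib.parse` as a stand-in for the tldextract domain used above.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

JSON_LD_PERSON = re.compile(r'\{"@type":"Person","name":"(.+?)"\}')


class _HeadInfo(HTMLParser):
    """Collects the <title> text and the first <meta name="author"> content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.author = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and not self.author:
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "author":
                self.author = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title_author(html, url):
    """Return (title, author), trying meta tag, JSON-LD, then the host name."""
    parser = _HeadInfo()
    parser.feed(html)
    parser.close()
    author = parser.author
    if not author:
        match = JSON_LD_PERSON.search(html)
        if match:
            author = match.group(1)
    if not author:
        author = urlsplit(url).hostname or ""
    return parser.title.strip(), author


if __name__ == "__main__":
    page = '<head><title> My title </title><meta name="author" content="My author"></head>'
    print(extract_title_author(page, "https://www.example.com/post"))
```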
AutoIt:
#include <MsgBoxConstants.au3>
#include <StringConstants.au3>
Func _WinWaitActivate($title,$text,$timeout=5)
Local $result = False
$result = WinWait($title,$text,$timeout)
if $result = False Then Exit MsgBox($MB_ICONERROR,"Bad...","Window not displayed")
If Not WinActive($title,$text) Then WinActivate($title,$text)
WinWaitActive($title,$text,$timeout)
EndFunc
Func _ReadTitleAuthor($sInput, $sPattern)
Local $aResult = StringRegExp($sInput, $sPattern, $STR_REGEXPARRAYMATCH)
return (Not @error) ? ($aResult[0]) : ("<NOT_FOUND>")
EndFunc
;read title#authors from clipboard
Local $sTitleAuthors = ClipGet()
Local $sTitle = _ReadTitleAuthor($sTitleAuthors , "^(.+)#")
Local $sAuthor = _ReadTitleAuthor($sTitleAuthors , "#(.+)$")
ConsoleWrite(StringFormat("%s#%s", $sTitle, $sAuthor) & @CRLF)
;fill rePub fields
;chrome://extensions/shortcuts
;click on extension and fill: Assigned CTRL+SHIFT+P to rePub extension
Opt("WinTitleMatchMode", 2) ;1=start (default), 2=subStr, 3=exact, 4=advanced (deprecated), -1 to -4=Nocase
_WinWaitActivate("[TITLE: - Google Chrome;CLASS:Chrome_WidgetWin_1]","")
Send("{CTRLDOWN}{SHIFTDOWN}p{CTRLUP}{SHIFTUP}")
Sleep(500)
Send("{TAB}" & $sTitle & "{TAB}" & $sAuthor & "{TAB}{ENTER}")
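One caveat with the two patterns above: if the title itself contains a '#', "^(.+)#" captures everything up to the last '#' while "#(.+)$" captures everything after the first one, so the two halves overlap. The same behaviour can be reproduced with Python's `re` module, as sketched below. Splitting once on the last '#' avoids it; `split_title_authors` is a hypothetical helper, and it assumes the author part never contains '#'.

```python
import re


def split_title_authors(clip):
    """Split 'title#authors' on the LAST '#'; authors is empty if there is none."""
    title, sep, authors = clip.rpartition("#")
    if not sep:
        return clip.strip(), ""
    return title.strip(), authors.strip()


if __name__ == "__main__":
    clip = "C# in Depth#Jon Skeet"
    # The two independent regexes disagree about where the separator is:
    print(re.search(r"^(.+)#", clip).group(1))
    print(re.search(r"#(.+)$", clip).group(1))
    # A single split on the last '#' keeps the halves consistent:
    print(split_title_authors(clip))
```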
Simpler alternative: just use AutoIt to grab the page's title, and use the domain name from the URL as the author:
#include <MsgBoxConstants.au3>
#include <StringConstants.au3>
#include <Constants.au3>
#include <Clipboard.au3>
#include <File.au3>
#include <String.au3>
#include <INet.au3>
#include <IE.au3>
Func _WinWaitActivate($title,$text,$timeout=5)
Local $result = False
$result = WinWait($title,$text,$timeout)
if $result = False Then Exit MsgBox($MB_ICONERROR,"Bad...","Window not displayed")
If Not WinActive($title,$text) Then WinActivate($title,$text)
WinWaitActive($title,$text,$timeout)
EndFunc
Opt("WinTitleMatchMode", 2) ;1=start (default), 2=subStr, 3=exact, 4=advanced (deprecated), -1 to -4=Nocase
_WinWaitActivate("[TITLE: - Google Chrome;CLASS:Chrome_WidgetWin_1]","")
Sleep(1000)
Send("{CTRLDOWN}l{CTRLUP}{CTRLDOWN}c{CTRLUP}{ESC}{ESC}")
$sDomain = StringRegExp(ClipGet(), '^https?://(.+?)/', $STR_REGEXPARRAYMATCH)
$URL = @error ? "Bad URL" : $sDomain[0]
ConsoleWrite("URL:" & $URL & @CRLF)
Sleep(500)
;grab title
;ESC doesn't prevent URL from being added as bookmark: Must TAB and click on Remove
Send("{CTRLDOWN}d{CTRLUP}{CTRLDOWN}c{CTRLUP}{TAB 3}{ENTER}")
;might be very long and slow to paste: Only grab beginning
$sTitle= StringLeft(ClipGet(),64)
ConsoleWrite("Title:" & $sTitle & @CRLF)
Sleep(500)
;click on extension and fill: Assigned CTRL+SHIFT+P to rePub extension
Send("{CTRLDOWN}{SHIFTDOWN}p{CTRLUP}{SHIFTUP}")
Sleep(500)
Send("{TAB}" & $sTitle & "{TAB}" & $URL & "{TAB}{ENTER}")
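For comparison, the two string operations in this last script (taking the domain from the URL and keeping only the first 64 characters of the title) look like this in Python. This is a sketch with hypothetical helper names; unlike the '^https?://(.+?)/' pattern, `urlsplit` also handles a URL with no slash after the host and drops any port number.

```python
from urllib.parse import urlsplit


def domain_as_author(url):
    """Return the host name of an http(s) URL, or 'Bad URL' like the script above."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return "Bad URL"
    return parts.hostname


def short_title(title, limit=64):
    """Keep only the beginning of a long title, as StringLeft(..., 64) does."""
    return title.strip()[:limit]


if __name__ == "__main__":
    print(domain_as_author("https://example.com/some/article"))
    print(short_title("A very long page title " * 10))
```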