surgeon
Add ability to format the result
There has been a request to add a "formatting" ability like the one in the scrape-it library.
It's documented as:
convert (Function): An optional function to change the value.
Example:
{
    articles: {
        listItem: ".article"
      , data: {
            createdAt: {
                selector: ".date"
+             , convert: x => new Date(x)
            }
          , title: "a.article-title"
          , tags: {
                listItem: ".tags > span"
            }
          , content: {
                selector: ".article-content"
              , how: "html"
            }
        }
    }
}
Considerations:
- Need to consider how this integrates with validation (does formatting happen before or after validation?)
- What's the API?
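To make the ordering question concrete, here is a minimal sketch of both options. Everything in it is hypothetical: `applyField`, the `validate` option and the `validateFirst` flag are names made up for illustration and are not part of surgeon or scrape-it.

```javascript
// Hypothetical helper, not an existing surgeon or scrape-it API.
// Applies an optional `convert` and an optional `validate` to a raw value.
// The assumed `validateFirst` flag decides which of the two runs first.
const applyField = (raw, {convert = (x) => x, validate = () => true, validateFirst = true} = {}) => {
  if (validateFirst) {
    // Option A: validate the raw text, then format it.
    if (!validate(raw)) {
      throw new Error('Validation failed for raw value: ' + raw);
    }
    return convert(raw);
  }
  // Option B: format first, then validate the formatted value.
  const converted = convert(raw);
  if (!validate(converted)) {
    throw new Error('Validation failed for converted value: ' + String(converted));
  }
  return converted;
};

// Option A: the validator sees a string.
const isIsoDate = (x) => /^\d{4}-\d{2}-\d{2}$/.test(x);
const createdAtA = applyField('2017-01-15', {
  convert: (x) => new Date(x),
  validate: isIsoDate,
});

// Option B: the validator sees a Date instance.
const isValidDate = (d) => d instanceof Date && !Number.isNaN(d.getTime());
const createdAtB = applyField('2017-01-15', {
  convert: (x) => new Date(x),
  validate: isValidDate,
  validateFirst: false,
});
```

The trade-off the sketch shows: validating first keeps validators working on plain strings, while validating after lets them check the final type, at the cost of running `convert` on input that may be malformed.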
Re: API, I've toyed with the idea of passing arrays to indicate selector + transforms, à la createdAt: ['.date', x => new Date(x)]. IMO, it's easier to read than createdAt: { selector: ".date", convert: x => new Date(x) }, especially when you have many transforms in your schema.
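For what it's worth, the array form could be reduced to the same internal shape as the object form. A rough sketch, assuming a made-up internal representation of `{selector, transforms}`; `normalizeField` and `applyTransforms` are hypothetical names, not existing functions in either library:

```javascript
// Hypothetical normalizer: turns the string, array and object forms
// of a field definition into one assumed shape, {selector, transforms}.
const normalizeField = (field) => {
  if (typeof field === 'string') {
    // Bare selector, e.g. "a.article-title".
    return {selector: field, transforms: []};
  }
  if (Array.isArray(field)) {
    // Array shorthand: first item is the selector, the rest are transforms.
    const [selector, ...transforms] = field;
    return {selector, transforms};
  }
  // Object form with an optional single `convert` function.
  const {selector, convert} = field;
  return {selector, transforms: convert ? [convert] : []};
};

// Runs the transforms left to right on an already-extracted value.
const applyTransforms = (value, transforms) => transforms.reduce((acc, fn) => fn(acc), value);

const shorthand = normalizeField(['.date', (x) => x.trim(), (x) => new Date(x)]);
const longhand = normalizeField({selector: '.date', convert: (x) => new Date(x)});
```

With this shape, chaining several transforms in the array form needs no extra syntax, which is where the readability gain over a single `convert` property would show up.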
Let's say you select all links in a document and want to filter out duplicates.
sm a|ra href
Any user-defined subroutine is called once per item in the array, not on the array as a whole, right? (Nor is it called as a reducer?) So I cannot make a subroutine to sort and remove duplicates from the array. Or a subroutine to flatten the array.
It can be done if the subroutine combines select and read :)
sl: (subject, v, b) => selectSubroutine(subject, ['a', '{0,}'], b).map(match => readSubroutine(match, ['attribute', 'href'], b))
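Wow, powerful stuff:
function sortAndRemoveDups(arr) {
  // Sort a copy so the caller's array is not mutated.
  const sorted = [...arr].sort();
  const uniq = [];
  let prev = null;
  // After sorting, duplicates are adjacent, so compare each item to the previous one.
  for (let i = 0; i < sorted.length; i += 1) {
    if (sorted[i] !== prev) { uniq.push(sorted[i]); }
    prev = sorted[i];
  }
  return uniq;
}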
...
slb: (s, v, b) => sortAndRemoveDups(selectSubroutine(s, [v.concat('a:not([href^="#"])').join(' '), '{0,}'], b).map(m => readSubroutine(m, ['attribute', 'href'], b)))
...
allRealLinksUnderBody: slb body
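As a side note, the sortAndRemoveDups helper could probably be shortened with a `Set`. This is only a sketch of an alternative, using a hypothetical name `sortAndRemoveDupsWithSet` and made-up sample links; it should behave the same for arrays of strings such as href values.

```javascript
// Alternative sketch: a Set drops duplicates, then the copy is sorted.
// The input array is left untouched because the spread creates a new array.
const sortAndRemoveDupsWithSet = (arr) => [...new Set(arr)].sort();

// Made-up sample data standing in for the href values read from the document.
const links = ['/b', '/a', '/b', '/c', '/a'];
const uniqueLinks = sortAndRemoveDupsWithSet(links);
```

It would drop into the slb subroutine in place of sortAndRemoveDups without other changes.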