Restore the useLocalCrawling & maxDepth settings for indexed documents
Description
This re-introduces the useLocalCrawling & maxDepth configuration parameters for document indexing, which have been ignored since the JSON to YAML configuration migration.
Checklist
- [x] I've read the contributing guide
- [x] The relevant docs, if any, have been updated or created
- [x] The relevant tests, if any, have been updated or created
Tests
DocsService tests are currently skipped and commented out :point_right: https://github.com/continuedev/continue/blob/bbb81ff032608e03a2208be908c1394da228ad6a/core/indexing/docs/DocsService.skip.ts
Deploy Preview for continuedev ready!
| Name | Link |
|---|---|
| Latest commit | 16b80d9117acd481e3b6ad531bb8649611f1bb77 |
| Latest deploy log | https://app.netlify.com/projects/continuedev/deploys/68516b828b9d6c00083ce358 |
| Deploy Preview | https://deploy-preview-5958--continuedev.netlify.app |
All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.
I have read the CLA Document and I hereby sign the CLA
😱 Found 1 issue. Time to roll up your sleeves! 😱
All tests are green :tada:
So recurseml, is that enough sleeve rolling for you ? :smirk: :rofl:
Bump
Could someone do a quick review of this PR ? :innocent: It's very simple and is highly needed in my team :blush:
This PR is now one month old, anyone to check it and merge it ? :cry:
@vincentkelleher do you need the maxDepth param specifically? We want to merge this with useLocalCrawling but maybe deprecate maxDepth in favor of an allowList/blockList pattern
Apologies for the delays!
@RomneyDa I was just aware of maxDepth because it was there historically.
I imagine allowList/blockList would be a list of regex ?
Thanks for the feedback :blush:
Got it! So would it solve your issue if I merged and then removed maxDepth and kept useLocalCrawling?
(Or if you'd like to)
Yes, I think glob patterns for allow/block
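To make the allow/block idea concrete, here is a minimal sketch of glob-based filtering. Everything in it is an assumption for illustration: `globToRegExp`, `UrlFilter`, `shouldIndex`, and the rule that the block list wins over the allow list are hypothetical and not part of Continue's actual API.

```typescript
// Hypothetical helper: turns a simple glob into a RegExp.
// "*" matches within one path segment, "**" matches across segments.
function globToRegExp(glob: string): RegExp {
  const pattern = glob
    .replace(/[.+^${}()|[\]\\?]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "\u0000") // protect "**" before handling "*"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${pattern}$`);
}

// Hypothetical filter shape; names are assumptions, not a real config schema.
interface UrlFilter {
  allowList: string[];
  blockList: string[];
}

// Assumed rule: a blocked path is always skipped; an empty allow list allows everything.
function shouldIndex(path: string, filter: UrlFilter): boolean {
  const matches = (globs: string[]): boolean =>
    globs.some((glob) => globToRegExp(glob).test(path));
  if (matches(filter.blockList)) {
    return false;
  }
  return filter.allowList.length === 0 || matches(filter.allowList);
}

const exampleFilter: UrlFilter = {
  allowList: ["/docs/**"],
  blockList: ["/docs/changelog/*"],
};
```

With `exampleFilter`, `/docs/guide/setup` would be indexed, while `/docs/changelog/v1` and `/blog/post` would be skipped.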
@RomneyDa I have the feeling that maxDepth requires less thinking and is safer as you won't explicitly know how many pages will be indexed by each glob, don't you think ?
we're also thinking about adding a maxPages to give a more direct limit, but I think people generally want to index all docs that match a pattern (with a hard limit perhaps). maxDepth doesn't create any hard limit, it could yield tens of thousands of pages in some edge-case scenarios
would maxPages and useLocalCrawling be sufficient?
The other issue with maxDepth is it's not super clear how it works, i.e. as a dev I can't keep a 3-link-deep map of the docs pages I want in my head
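A small sketch of why the page count under maxDepth is hard to predict, assuming maxDepth counts link hops from the start page (the actual crawler semantics may differ). The `SiteLinks` type, the `crawlToDepth` function, and the tiny in-memory site are all hypothetical.

```typescript
// Hypothetical in-memory site: each page maps to the pages it links to.
type SiteLinks = Record<string, string[]>;

// Breadth-first walk that follows links up to maxDepth hops from the start page.
function crawlToDepth(site: SiteLinks, start: string, maxDepth: number): string[] {
  const visited = new Set<string>([start]);
  let frontier: string[] = [start];
  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const page of frontier) {
      for (const link of site[page] ?? []) {
        if (!visited.has(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return Array.from(visited);
}

const exampleSite: SiteLinks = {
  "/": ["/a", "/b"],
  "/a": ["/a1", "/a2"],
  "/b": ["/b1"],
  "/a1": ["/"],
  "/a2": [],
  "/b1": ["/deep"],
  "/deep": [],
};
```

On this toy site the result grows with each extra hop, and the growth depends entirely on how densely the pages link to each other, which is what makes the limit hard to reason about ahead of time.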
It's true that there are clearly two types of limits:
- hard limits with maxPages
- soft limits with maxDepth, allowList and blockList
Usually having a max depth of 1 seems like a reasonable case, as you want everything directly linked to the subject of the page. Going beyond 1 or 2 would, in most cases, end up indexing the whole website IMHO. I also think that having an allow or block list would be about the same, if not worse, than a max depth over 1, as you would have to know the sitemap well.
Having a maximum number of pages would be a good guard-rail to avoid using too many hardware resources, that seems like a good feature :+1:
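As a rough illustration of the guard-rail idea, the sketch below combines a depth limit with a hard page cap. It is a hypothetical, self-contained example: `CrawlLimits`, `crawlWithLimits`, and the generated hub site are assumptions, not the project's implementation.

```typescript
// Hypothetical limits; the field names mirror the options discussed in this thread.
interface CrawlLimits {
  maxDepth: number; // soft limit: link hops from the start page
  maxPages: number; // hard limit: total pages returned
}

type PageLinks = Record<string, string[]>;

// Breadth-first crawl that stops as soon as maxPages pages have been collected.
function crawlWithLimits(site: PageLinks, start: string, limits: CrawlLimits): string[] {
  const collected: string[] = [];
  const queued = new Set<string>([start]);
  const queue: Array<{ page: string; depth: number }> = [{ page: start, depth: 0 }];
  while (queue.length > 0 && collected.length < limits.maxPages) {
    const { page, depth } = queue.shift()!;
    collected.push(page);
    if (depth >= limits.maxDepth) {
      continue; // do not follow links beyond the depth limit
    }
    for (const link of site[page] ?? []) {
      if (!queued.has(link)) {
        queued.add(link);
        queue.push({ page: link, depth: depth + 1 });
      }
    }
  }
  return collected;
}

// A hub page that links directly to 50 other pages.
const hubSite: PageLinks = { "/": [] };
for (let i = 0; i < 50; i++) {
  hubSite["/"].push(`/page-${i}`);
  hubSite[`/page-${i}`] = [];
}
```

Even with a max depth of 1, the hub page alone yields 51 pages here; the page cap is what bounds the result regardless of how the site is linked.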
I would go for useLocalCrawling, maxDepth and maxPages :innocent:
@vincentkelleher appreciate the feedback! Do you currently have cases for which you set maxDepth > 1?
@RomneyDa I don't have any in mind right now :thinking:
After running it by the team, I opened a new PR to remove maxDepth for YAML and opened a ticket to add maxPages/allow/block list or similar to replace it. Will leave useLocalCrawling. Thanks for the contribution!
:tada: This PR is included in version 1.1.0 :tada:
Your semantic-release bot :package::rocket: