domain-list-community

Comment `category-scholar-!cn`

Open IceCodeNew opened this issue 4 years ago • 16 comments

The following line should not be included by default, as it could ruin the out-of-box experience for the people who most likely need this category.

Fix #674

IceCodeNew avatar Oct 20 '21 04:10 IceCodeNew

I don't think so. Most students actually use the Internet service from common ISPs "out of box". They most likely only switch to education networks on demand.

database64128 avatar Oct 20 '21 06:10 database64128

This PR also breaks the convention of categorizing domains using the host entity's location. Traditionally we reduce the inconvenience from this by using attributes like cn. But in this case, the user can simply use category-scholar-!cn.
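For readers unfamiliar with attributes, a minimal sketch of how an attribute filter is usually written in a v2ray routing rule. It assumes the `geosite:<list>@<attribute>` syntax is supported by the client in use, that comments are accepted in the config file, and that an outbound tagged `direct` exists. The tag name and the `google` list are illustrative assumptions, not taken from this thread.

```json
{
  "routing": {
    "rules": [
      {
        // Assumption: only entries of the "google" list carrying the
        // cn attribute are matched and sent out directly.
        "type": "field",
        "domain": ["geosite:google@cn"],
        "outboundTag": "direct"
      }
    ]
  }
}
```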

database64128 avatar Oct 20 '21 06:10 database64128

> I don't think so. Most students actually use the Internet service from common ISPs "out of box". They most likely only switch to education networks on demand.

It turns out that my institute has special routing rules for database sites, because the IP address shown on such sites belongs to CERNET while the one shown on other sites belongs to China Telecom. Accessing paid databases is seamless, and there are no manual switches as far as I'm aware.

However, this doesn't mean that I agree with this PR. category-scholar-!cn is a mixture of open-access (OA) and paid databases. For example, aclweb.org, sci-hub and google-scholar are OA and should remain proxied for better connectivity, whereas acm.org, ieee and elsevier are paid and thus should be removed from the list.

moetayuko avatar Oct 20 '21 07:10 moetayuko

> However, this doesn't mean that I agree with this PR. category-scholar-!cn is a mixture of open-access (OA) and paid databases. For example, aclweb.org, sci-hub and google-scholar are OA and should remain proxied for better connectivity, whereas acm.org, ieee and elsevier are paid and thus should be removed from the list.

Yes, we definitely should split the list and maintain OA scholarly sites separately from the sites that require a subscription for access. PRs are welcome, as I barely have free time for this kind of work.

IceCodeNew avatar Oct 20 '21 08:10 IceCodeNew

> It turns out that my institute has special routing rules for database sites, because the IP address shown on such sites belongs to CERNET while the one shown on other sites belongs to China Telecom. Accessing paid databases is seamless, and there are no manual switches as far as I'm aware.

> acm.org, ieee and elsevier are paid and thus should be removed from the list.

You are fortunate enough to attend schools that have good education network access for everyday use and it comes with such benefits for students. Unfortunately, most universities either don't have reliable education network plans to choose from, or don't provide this kind of access. And don't forget about remote learning.

database64128 avatar Oct 20 '21 08:10 database64128

> Unfortunately, most universities either don't have reliable education network plans to choose from, or don't provide this kind of access.

What's the use case for browsing paid databases without access to the academic content?

> And don't forget about remote learning.

Let's say I work from home with both a university VPN for paid content and v2ray for Google Scholar. v2ray often takes priority over the uni VPN, which works as a gateway, so paid databases should still be whitelisted to allow their traffic to be forwarded to the uni VPN.

moetayuko avatar Oct 20 '21 08:10 moetayuko

> What's the use case for browsing paid databases without access to the academic content?

You can still view the abstract and other basic information if you have not subscribed or bought the paper.

> Let's say I work from home with both a university VPN for paid content and v2ray for Google Scholar. v2ray often takes priority over the uni VPN, which works as a gateway, so paid databases should still be whitelisted to allow their traffic to be forwarded to the uni VPN.

You are already going out of your way to use the university VPN. What's so difficult with adding a simple rule to use direct connection for category-scholar-!cn? This project is a general-purpose domain list. Conventions and rules should not be bent to cater to some niche use cases like this.
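A rule of the kind described here might look like the following sketch. It assumes a v2ray-style routing section in which rules are tried top-down and the first match wins, that comments are accepted in the config file, and that outbounds tagged `direct` and `proxy` already exist. Both tag names are assumptions, not part of this thread.

```json
{
  "routing": {
    "rules": [
      {
        // Listed first so that scholar sites never reach the proxy rule below.
        "type": "field",
        "domain": ["geosite:category-scholar-!cn"],
        "outboundTag": "direct"
      },
      {
        // Everything else classified as non-CN goes through the proxy.
        "type": "field",
        "domain": ["geosite:geolocation-!cn"],
        "outboundTag": "proxy"
      }
    ]
  }
}
```

The order matters in this sketch: if the `geolocation-!cn` rule came first, the scholar domains it includes would already be matched there and the direct rule would have no effect.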

database64128 avatar Oct 20 '21 09:10 database64128

> You can still view the abstract and other basic information if you have not subscribed or bought the paper.

You can still visit these sites without a proxy; I'm sure most of them are not blocked. IMO, putting these domains in geolocation-!cn will affect users whose organizations have subscribed to these sites. On the other hand, excluding these domains from geolocation-!cn will not leave people in China unable to access these sites. Also, the latter group is less likely to access these domains, and does so less frequently. So excluding scholarly sites from geolocation-!cn sounds reasonable to me.

After all, every commit made here involves some trade-offs, and we cannot fit everyone's needs. Which side seems to you the real majority here? Will you always give the same answer based on the number of people? Or should we take the likelihood and other factors into account here?

IceCodeNew avatar Oct 20 '21 13:10 IceCodeNew

> You can still visit these sites without a proxy; I'm sure most of them are not blocked.

Actually, the change in this PR doesn't affect my setup. These sites will always be connected via proxy in my setup.

> Will you always give the same answer based on the number of people?

It's not about which side is the majority. If merged, this change will set a bad precedent: some non-CN sites are purposely excluded from geolocation-!cn. You are basically changing the definition of geolocation-!cn, which is bad and could be very damaging IMO.

database64128 avatar Oct 20 '21 14:10 database64128

> > You can still visit these sites without a proxy; I'm sure most of them are not blocked.

> Actually, the change in this PR doesn't affect my setup. These sites will always be connected via proxy in my setup.

> > Will you always give the same answer based on the number of people?

> It's not about which side is the majority. If merged, this change will set a bad precedent: some non-CN sites are purposely excluded from geolocation-!cn. You are basically changing the definition of geolocation-!cn, which is bad and could be very damaging IMO.

We have already had loads of discussions about following the definitions and similar topics, so I will just skip them (see #28 and others). I would like to tell you how I recently developed my philosophy on maintaining this project. Here is an example: where are you going to put the baijiayun and the duitang sites (see #672)? These sites are also "changing" the definition of the cdn category, IMO. But I am OK with putting baijiayun under the cdn category.

There is no way to categorize these sites precisely under the current project structure. And I have gradually found that pushing things too far from practicality does not do any good. To compensate for what we have traded off for precision, the attribute-labeling feature was introduced. But it still does not fully work. And even if you are OK with the part that has already been implemented and supported, there are a bunch of problems in using this feature (see #300 and other issues).

What is the point of all this? I mean, let's re-evaluate the point of sticking to the exact category definitions. For reviewers, doing so helps us reach a consensus. For users, the name of a category should tell them clearly how they are supposed to use it.

Did I go against either of these points in this very PR? I don't think so. I'm not saying that we should DELETE the line that includes overseas scholarly sites in geolocation-!cn; the line is still there, just commented out. This does not go against any existing category rules. As for the user side, well, I have explained that before, and it seems you agree with me to some degree.

IceCodeNew avatar Oct 20 '21 15:10 IceCodeNew

> If merged

And BTW, this PR is not going to be merged, at least not until the OA sites have been separated from the scholarly sites that require a subscription.

IceCodeNew avatar Oct 20 '21 15:10 IceCodeNew

> And I have gradually found that pushing things too far from practicality does not do any good.

I don't think removing a bunch of non-CN scholar sites from geolocation-!cn just because some campus networks have special optimizations for them is a "practical" move for the users of this project.

> These sites are also "changing" the definition of the cdn category, IMO.

Technical terms like CDN usually refer to general concepts, and their uses and meanings can change over time. Geolocation, on the other hand, is a clear indication of service location. This concept has been widely used on the Internet with the same meaning, probably for decades. I believe the categorization of our geolocation sets should not take into account non-geographical factors like whether some sites are subscription-based scholar sites.

database64128 avatar Oct 20 '21 16:10 database64128

> > And I have gradually found that pushing things too far from practicality does not do any good.

> I don't think removing a bunch of non-CN scholar sites from geolocation-!cn just because some campus networks have special optimizations for them is a "practical" move for the users of this project.

> > These sites are also "changing" the definition of the cdn category, IMO.

> Technical terms like CDN usually refer to general concepts, and their uses and meanings can change over time. Geolocation, on the other hand, is a clear indication of service location. This concept has been widely used on the Internet with the same meaning, probably for decades. I believe the categorization of our geolocation sets should not take into account non-geographical factors like whether some sites are subscription-based scholar sites.

We are not making progress. Let me put it this way. Have we categorized all of the overseas sites here?

What we have covered is just a fraction of the active sites that are geographically located outside China. Why are we not yet overwhelmed by issues complaining about it? The long-tail effect describes the nature of this problem. It turns out that it does not matter that we failed to include sites that do not have many unique visitors (UV).

So can we just pretend that we never included sites like IEEE or Elsevier, most of whose functions require a subscription? To what extent would you expect some random user to come here and submit an issue complaining about access to an overseas scholarly site that turns out to be blocked by the GFW?

And even if some users are affected by the move I proposed here, they can easily solve the problem by including a category named, well, category-scholar-need-subscription or something. Your position, on the other hand, does not solve the existing problem.

Excluding domains in a routing configuration is way more difficult than including them. If we can prevent this from the beginning, why not? This will not leave behind another issue that cannot be solved anyway.
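Under the split proposed in this comment, opting back in could look like the sketch below. `category-scholar-need-subscription` is only the placeholder name floated above, not an existing list, and the `proxy` outbound tag is likewise an assumption. The sketch also assumes that comments are accepted in the config file.

```json
{
  "routing": {
    "rules": [
      {
        // Hypothetical: the subscription-only list is added next to
        // geolocation-!cn by users who want those sites proxied.
        "type": "field",
        "domain": [
          "geosite:geolocation-!cn",
          "geosite:category-scholar-need-subscription"
        ],
        "outboundTag": "proxy"
      }
    ]
  }
}
```

Including a list takes one extra array entry in a single rule, whereas excluding part of an already-included list needs a separate rule placed ahead of it.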

IceCodeNew avatar Oct 21 '21 03:10 IceCodeNew