Andrea Francis Soria Jimenez
After manually running some cache maintenance (removing obsolete records whose config or split no longer exists), this is the updated list; mostly the AttributeError and ClientResponseError counts were reduced: ``` [ {...
Update of UnexpectedErrors count by kind: ``` db.cachedResponsesBlue.aggregate([{$match: {error_code: "UnexpectedError", "details.copied_from_artifact":{$exists:false}}},{$group: {_id: {kindkind: "$kind"}, count: {$sum: 1}}},{$sort: {count: -1}}]) [ { _id: { kindkind: 'config-parquet-and-info' }, count: 9117 }, {...
Updated list of UnexpectedErrors by kind: ``` [ { _id: { kindkind: 'config-parquet-and-info' }, count: 8500 }, { _id: { kindkind: 'split-descriptive-statistics' }, count: 2628 }, { _id: { kindkind:...
Today: `Atlas atlas-x5jgb3-shard-0 [primary] datasets_server_cache> db.cachedResponsesBlue.aggregate([{$match: {error_code: "UnexpectedError", "details.copied_from_artifact":{$exists:false}}},{$group: {_id: {kind: "$kind"}, count: {$sum: 1}}},{$sort: {count: -1}}]) ` ``` [ { _id: { kind: 'config-parquet-and-info' }, count: 6215 }, {...
Today: ``` Atlas atlas-x5jgb3-shard-0 [primary] datasets_server_cache> db.cachedResponsesBlue.aggregate([{$match: {error_code: "UnexpectedError", "details.copied_from_artifact":{$exists:false}}},{$group: {_id: {kind: "$kind"}, count: {$sum: 1}}},{$sort: {count: -1}}]) [ { _id: { kind: 'config-parquet-and-info' }, count: 7373 }, { _id:...
Today: ``` db.cachedResponsesBlue.aggregate([{$match: {error_code: "UnexpectedError", "details.copied_from_artifact":{$exists:false}}},{$group: {_id: {kind: "$kind"}, count: {$sum: 1}}},{$sort: {count: -1}}]) [ { _id: { kind: 'config-parquet-and-info' }, count: 6668 }, { _id: { kind: 'split-descriptive-statistics' },...
After refreshing some records: ``` Atlas atlas-x5jgb3-shard-0 [primary] datasets_server_cache> db.cachedResponsesBlue.aggregate([{$match: {error_code: "UnexpectedError", "details.copied_from_artifact":{$exists:false}}},{$group: {_id: {kind: "$kind"}, count: {$sum: 1}}},{$sort: {count: -1}}]) [ { _id: { kind: 'split-duckdb-index' }, count: 1380...
Today (Almost half of yesterday's): ``` Atlas atlas-x5jgb3-shard-0 [primary] datasets_server_cache> db.cachedResponsesBlue.aggregate([{$match: {error_code: "UnexpectedError", "details.copied_from_artifact":{$exists:false}}},{$group: {_id: {kind: "$kind"}, count: {$sum: 1}}},{$sort: {count: -1}}]) [ { _id: { kind: 'split-duckdb-index' }, count:...
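For readers without access to the cluster, here is a minimal in-memory sketch of what the aggregation in these comments computes: keep documents whose `error_code` is `"UnexpectedError"` and whose `details.copied_from_artifact` field is absent, group them by `kind`, count each group, and sort by count in descending order. The function name `countUnexpectedErrorsByKind`, the sample documents, and the `"SomeOtherError"` code are assumptions made up for illustration; they are not real cache entries, and this is not how the query runs on the server.

```javascript
// In-memory sketch of the $match / $group / $sort pipeline used in this thread.
// Hypothetical helper: it only mirrors the pipeline's logic on a plain array.
function countUnexpectedErrorsByKind(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // $match: error_code must be "UnexpectedError"
    if (doc.error_code !== "UnexpectedError") continue;
    // $match: details.copied_from_artifact must not exist ($exists: false).
    // A field that is present but null still counts as existing.
    const details = doc.details;
    if (details != null && typeof details === "object" && "copied_from_artifact" in details) {
      continue;
    }
    // $group: one counter per kind
    counts.set(doc.kind, (counts.get(doc.kind) || 0) + 1);
  }
  // $sort: highest count first
  return [...counts.entries()]
    .map(([kind, count]) => ({ _id: { kind }, count }))
    .sort((a, b) => b.count - a.count);
}

// Made-up sample documents (illustration only).
const sampleDocs = [
  { kind: "split-duckdb-index", error_code: "UnexpectedError", details: {} },
  { kind: "split-duckdb-index", error_code: "UnexpectedError" },
  { kind: "config-parquet-and-info", error_code: "UnexpectedError", details: { error: "x" } },
  // Excluded: the error was copied from another artifact.
  { kind: "config-parquet-and-info", error_code: "UnexpectedError", details: { copied_from_artifact: { kind: "dataset-config-names" } } },
  // Excluded: a different (hypothetical) error code.
  { kind: "split-descriptive-statistics", error_code: "SomeOtherError" },
];

const result = countUnexpectedErrorsByKind(sampleDocs);
console.log(result);
```

Excluding `copied_from_artifact` entries keeps the counts focused on the step where the error originated, rather than on downstream steps that only propagate it.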
Now that the index file has been removed, I see another issue: `ValueError: Directory name did not appear to be a partition: v1.1.0`
https://github.com/huggingface/dataset-viewer/pull/2928 will add a specific stemmer for a dataset only if it is marked as monolingual (that is, all of its splits share a single language). But there are some caveats:...