Science Gossip Talk

Retired no_illustration pages being presented?

  • tfmorris by tfmorris

    I was looking at the API network traffic out of curiosity and when this page:

    http://zooniverse-static.s3.amazonaws.com/www.sciencegossip.org/subjects/standard/54f4b8a230017b04c9003685.jpg

    was displayed, it was associated with the metadata below from the API. Without digging into the API documentation, my naive reading is that this page has been classified 13 times out of 13 as having no illustrations, was "retired" for that reason on May 14, and currently has a state of "complete", yet here it is on display in my browser. What's going on?

    Here's a table of the 10 pages in that batch from the API:

    zooniverse_id	state	classification_count	no_illustrations_count	has_illustrations_count	skip_count	updated_at	retire_reason
    ASC0000tcf	complete	13	13			2015-05-14T14:07:44Z	detected_no_illustrations
    ASC0000tw5	complete	24	21	1	2	2015-05-14T14:12:49Z	detected_no_illustrations
    ASC0000r1z	complete	16	15	1		2015-05-14T13:46:39Z	detected_no_illustrations
    ASC0000vqe	complete	6	6			2015-05-14T14:29:07Z	detected_no_illustrations
    ASC0000ogk	complete	12	8	3	1	2015-05-14T13:22:33Z	detected_no_illustrations
    ASC0000rva	complete	20	20			2015-05-14T13:54:40Z	detected_no_illustrations
    ASC0000osn	complete	10	10			2015-10-15T00:21:12Z	detected_no_illustrations
    ASC0000wva	complete	25	25			2015-05-14T14:39:02Z	detected_no_illustrations
    ASC0000v62	complete	13	12		1	2015-05-14T14:23:51Z	detected_no_illustrations
    ASC0000v4g	complete	8	8			2015-05-14T14:23:27Z	detected_no_illustrations
    

    From the looks of it, none of these pages should be presented to me. If it makes a difference, I'd selected a book from the Periodicals page: "Wiltshire archaeological and natural history magazine."

    {
    "id": "54f4b8a230017b04c9003685",
    "activated_at": "2015-03-07T07:05:47Z",
    "classification_count": 13,
    "coords": [
      
    ],
    "created_at": "2015-03-03T19:21:00Z",
    "group": {
      "_id": "54f4b81b30017b04c9000009",
      "zooniverse_id": "GSC0000007",
      "name": "wiltshirearchaeo"
    },
    "group_id": "54f4b81b30017b04c9000009",
    "location": {
      "standard": "http:\/\/zooniverse-static.s3.amazonaws.com\/www.sciencegossip.org\/subjects\/standard\/54f4b8a230017b04c9003685.jpg",
      "thumb": "http:\/\/zooniverse-static.s3.amazonaws.com\/www.sciencegossip.org\/subjects\/thumb\/54f4b8a230017b04c9003685.jpg"
    },
    "metadata": {
      "contributor": "Natural History Museum Library, London",
      "item_id": "45554",
      "no_illustrations_count": 13,
      "original_size": {
        "width": 1680,
        "height": 2974
      },
      "page_id": "12643735",
      "page_no": null,
      "page_seq": "441",
      "sponsor": "Natural History Museum Library, London",
      "volume": "v.33=no.99-102 (1903-1904)",
      "year": "1904 - 1904",
      "retire_reason": "detected_no_illustrations"
    },
    "project_id": "54f42c0ab35d2e06bd000001",
    "random": 0.1300746338013,
    "state": "complete",
    "updated_at": "2015-05-14T14:07:44Z",
    "workflow_ids": [
      "54f42c32b35d2e06bd000002"
    ],
    "zooniverse_id": "ASC0000tcf"
    },
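
    For anyone who wants to repeat the check on the table above, here is a minimal Python sketch. It assumes the batch has been saved as tab-separated text with the same column names, and it treats "state" equal to "complete" as meaning retired, which is only my reading of the API. The function name retired_subjects is made up for illustration; only three of the ten rows are included to keep it short.

```python
import csv
import io

# Three rows copied from the table in this post (tab-separated).
# Empty cells mean the API returned no value for that counter.
BATCH_TSV = (
    "zooniverse_id\tstate\tclassification_count\tno_illustrations_count\t"
    "has_illustrations_count\tskip_count\tupdated_at\tretire_reason\n"
    "ASC0000tcf\tcomplete\t13\t13\t\t\t2015-05-14T14:07:44Z\tdetected_no_illustrations\n"
    "ASC0000tw5\tcomplete\t24\t21\t1\t2\t2015-05-14T14:12:49Z\tdetected_no_illustrations\n"
    "ASC0000ogk\tcomplete\t12\t8\t3\t1\t2015-05-14T13:22:33Z\tdetected_no_illustrations\n"
)


def retired_subjects(tsv_text):
    """Return (zooniverse_id, no_illustrations_share) for each row marked complete.

    Hypothetical helper: "complete" is assumed to mean the subject was retired.
    """
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = []
    for row in rows:
        if row["state"] != "complete":
            continue
        total = int(row["classification_count"])
        # An empty cell is treated as zero votes.
        no_illus = int(row["no_illustrations_count"] or 0)
        share = no_illus / total if total else 0.0
        result.append((row["zooniverse_id"], share))
    return result


summary = retired_subjects(BATCH_TSV)
for zooniverse_id, share in summary:
    print(f"{zooniverse_id}: {share:.0%} of classifications said 'no illustrations'")
```

    Every row in the sample comes back as retired, which matches the impression that none of these pages should still be in circulation.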

    Posted

  • eatyourgreens by eatyourgreens admin

    Hi,

    Thanks very much for this. The API definitely should not be sending back completed subjects, since those have been fully classified. Subjects tagged with "retire_reason": "detected_no_illustrations" are pages that were detected as having no illustrations by an OCR algorithm, so it could be that those weren't properly pulled from the active pool of classification subjects.

    I could filter the API call in the browser, but I suspect that this would lead to the front-end reporting "no more data" for the project, which isn't true either.
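
    To illustrate the concern, here is a rough Python sketch of that kind of client-side filter. It is not the actual front-end code: filter_active is a hypothetical helper, and "active" as the state of a live subject is an assumed value. Only "state": "complete" and the subject IDs come from the API response quoted above.

```python
def filter_active(subjects):
    """Drop subjects the API marks as complete (hypothetical client-side filter)."""
    return [s for s in subjects if s.get("state") != "complete"]


# A batch shaped like the one reported above: every subject is already complete.
batch = [
    {"zooniverse_id": "ASC0000tcf", "state": "complete"},
    {"zooniverse_id": "ASC0000tw5", "state": "complete"},
    {"zooniverse_id": "ASC0000r1z", "state": "complete"},
]

active = filter_active(batch)

# A naive front-end that treats an empty batch as "project finished" would
# now report "no more data", even though unclassified pages may still exist.
message = "no more data" if not active else f"{len(active)} subjects to classify"
print(message)
```

    With a batch made up entirely of retired pages the filter leaves nothing to show, so the subjects need to be removed from the pool on the server side instead.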

    Thanks, by the way, for raising this as a GitHub issue too.

    Jim

    Posted

  • eatyourgreens by eatyourgreens admin

    This should be fixed now, and you should be seeing more illustrated pages for Wiltshire Archaeology, Quarterly Journal of the Geological Society and Hardwicke's Science Gossip.

    Jim

    Posted

  • yshish by yshish moderator

    Great! Thanks, guys 😃

    Posted