Science Gossip Talk

Being assigned the same page more than once?

  • ssgiris by ssgiris moderator

    What should folks do if they are assigned the same page to classify more than once?

    I've gotten pages more than once - I recognize them because of the odd illustration, or the contents of a table.

    Should we skip those? Hashtag? Classify again?

    @DMZ who should we contact to see if this is a bug?

    Posted

  • yshish by yshish moderator in response to ssgiris's comment.

    I would report those to the developers, since it shouldn't happen and could be caused by a bug. It depends on the image ID, though. You can check the comments. If the image ID numbers differ, the page was just uploaded twice.

    Zuzi

    Posted

  • DZM by DZM admin

    The best thing to do would be to save the IDs, because it's possible that they are repeat scans rather than repeat serving of the same image.

    We'll want to be sure that the system really is serving repeats before we try to figure out why it is. 😃

    Thanks!!

    (P.S.: It's D-Z-M. 😃 The DMZ is something else entirely!)

    Posted

  • yshish by yshish moderator

    Hey,

    I checked my Recents and found some images there more than once - I checked the image ID and it is the same for both! All were classified as 'no illustration' ones. But I think they shouldn't appear more than once anyway.

    Here they are:

    There may be more such ones.. I'm not about to go through all .)

    Zuzi

    Posted

  • ssgiris by ssgiris moderator

    @DZM 😃

    I've been skipping the second servings of a page I recognize, so I'm not sure how I would go about finding them.

    ssgiris

    Posted

  • yshish by yshish moderator in response to ssgiris's comment.

    @DZM Do skipped images appear among the 'Recent' ones?

    If yes, then you @ssgiris would find them there.

    Zuzi

    Edit: I made an experiment and yes, they do! So you can go through the thumbnails (in case there are not 324532543 of them.) and open each of the two same ones in a different tab and compare their ID numbers.

    Posted
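
The ID comparison described above can also be done in bulk once the IDs are written down. This is only a minimal sketch: it assumes you have copied the image IDs from your Recents into a list by hand (the thread does not describe any export feature), and the function name `repeated_ids` and the sample list are made up for illustration. An ID that turns up more than once points to the same image being served again; two look-alike pages with different IDs point to a repeat scan instead.

```python
from collections import Counter


def repeated_ids(recent_ids):
    """Return the image IDs that occur more than once, in first-seen order."""
    counts = Counter(recent_ids)
    # dict.fromkeys drops later duplicates while keeping the original order.
    return [image_id for image_id in dict.fromkeys(recent_ids) if counts[image_id] > 1]


# Illustrative list only, typed in by hand; not real Recents data.
recents = ["ASC0000c91", "ASC0000gce", "ASC0000c91", "ASC0000gt6"]
print(repeated_ids(recents))
```

Any ID this prints was served more than once and is worth passing on to the developers.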

  • jules by jules moderator in response to yshish's comment.

    Yes they do but finding the first instance of a page you are sure is a repeat isn't easy when you've classified a fair few! I've had some I'm sure I've seen before but there's too many to search.

    Posted

  • DZM by DZM admin in response to yshish's comment.

    Alright, @yshish -- thanks for the report and the confirmed IDs. Keep us posted if you find others!

    I will put through an issue, but I'm not going to make a wild amount of noise about it yet. I am not too concerned if this happens less than 1% of the time, but if it's becoming a more common thing, I will start raising alarms. 😃 Thanks!!

    Posted

  • yshish by yshish moderator in response to DZM's comment.

    Hey,

    OK. I will.

    The thing which I worry a bit more about is the 'No data' notification. It happens almost always when I start the classification, sometimes the reloading doesn't help and it appears again. Sometimes it pops up even after finishing a page instead of loading a new one...

    It actually reminds me of the recent issue on Floating Forests, when there was no data from one of the locations. Could that be possible here? (I mean that one of the documents might have been completely classified, so there were no unclassified pages available..?)

    Just an idea:]

    Zuzi

    Posted

  • jules by jules moderator

    Here's another:
    ASC0000c91

    First got this 13 March and then again yesterday (29 March). Same ID.

    Posted

  • yshish by yshish moderator

    Another report of a repeating image: ASC0000gpb

    Posted

  • yshish by yshish moderator

    I have figured out that some images display twice among my Recents even though I classified them only once. It could be caused by pausing during the classification. I'm curious whether it counts the classification as skipped when it goes to my Recents before I finish it.

    @DZM If I send you IDs, are you able to figure it out? There are more of them.

    Working on tablet.

    Zuzi

    Posted

  • DZM by DZM admin

    The more IDs that I get, the more I can give to the devs... 😃

    Posted

  • yshish by yshish moderator in response to DZM's comment.

    Ok. The last ones:

    Looks like two different scans of the same page, ID numbers are different!

    Will look for some others later.. the Plankton is calling! 😃

    Thanks!

    Posted

  • yshish by yshish moderator

    Other repeated images: ASC0000if7 and ASC000057p and ASC0000a47 and ASC0003dec (with the same ID)

    Posted

  • tfmorris by tfmorris

    These two: ASC0000gce ASC0000gt6
    are from the same journal page scanned multiple times at the Internet Archive and that duplication was propagated all the way through the pipeline at BHL, then SG.

    The entire volume of the journal is probably going to end up getting processed twice.

    Posted