Archival PDF – rendering very slowly

  • #740947
    Michael Gilligan
    Participant
      @michaelgilligan61133

      Posting this in the hope that some forum member might be able to enlighten me …

      An experimental enquiry concerning the natural powers of water and wind to turn Mills and other machines depending on circular motion. [With plates. Reprinted from the Philosophical Transactions of The Royal Society]

      I have arranged to read Smeaton’s paper [in real printed form, I trust] next week; but I thought it worth downloading a copy for ‘prep’

      https://archive.org/details/bim_eighteenth-century_an-experimental-enquiry-_smeaton-john_1760/page/n1/mode/1up

      Selecting the PDF download option initiated a surprisingly long process, as the 83 pages each slowly rendered on screen … The file is 44MB, and the image quality is mediocre. I have copied it to GoodReader on the iPad, but it renders similarly slowly there.

      There are some technical details available on the linked page ^^^ but I see nothing particularly unusual there.

      Explanation would be most welcome !

      MichaelG.

      #740966
      Kiwi Bloke
      Participant
        @kiwibloke62605

        ‘Fraid no explanation Michael, just more phenomenology. I followed the link, and the file downloaded in less than a minute. Each page took perhaps 10 seconds to render online in Firefox, and only a few seconds offline in qpdfview (under Linux). I’ve found the Internet Archive to be painfully slow sometimes: I guess it depends on how much is being asked of it.

        If you’re interested in on-line archives’ content, have you played with Anna’s Archive? Can’t say anything about it, in case it’s naughty… There’s a Wikipedia page about it, from which you can probably decide your legal position.

        #740979
        John Haine
        Participant
          @johnhaine32865

          You might try the Royal Society site directly?

          #740980
          Michael Gilligan
          Participant
            @michaelgilligan61133

            Thanks, Kiwi

            MichaelG.

            #740999
            peak4
            Participant
              @peak4

              Just tried your link in Opera on a Windows 10 PC on a reasonable FTTC connection (about 17 meg I think).
              The file opened quickly in a new tab, but each individual page took a little longer.
              I selected the pdf option from the choice of download options; that took 14 seconds to fully open in a new tab, and moving between pages was slowish.
              I then used the download icon in the browser tab to save the file to the HDD, which took about 7 seconds.
              The file opens quickly from the stored folder in the free Foxit pdf reader, as below, with the compatibility message top right.

              [screenshot]

              In Foxit, each page displays promptly and is quite readable.

              [screenshot]
              Bill.

              #741001
              Michael Gilligan
              Participant
                @michaelgilligan61133
                On John Haine Said:

                You might try the Royal Society site directly?


                Much faster, John … that being a different, smaller, and better quality scan !

                https://royalsocietypublishing.org/doi/epdf/10.1098/rstl.1759.0019

                … which of course leaves us with the original question

                MichaelG.

                #741005
                Michael Gilligan
                Participant
                  @michaelgilligan61133
                  On peak4 Said:

                  Just tried your link in Opera on a Windows 10 PC on a reasonable FTTC connection (about 17 meg I think).
                  The file opened quickly in a new tab, but each individual page took a little longer.
                  I selected the pdf option from the choice of download options; that took 14 seconds to fully open in a new tab, moving between pages was slowish.
                  […]

                  Thanks for checking that, Bill

                  I have no problem with the legibility of the text … but the plates in that British Museum copy are dire, and [even after re-starting the iPad] each page takes a few seconds to properly render.

                  I have to wonder if there was some processing oddity in the conversion from microfilm to PDF … but I don’t know where to even start looking.

                  MichaelG.

                  #741009
                  tminusa@yahoo.co.uk
                  Participant
                    @tminusayahoo-co-uk

                    Downloaded almost immediately on my Linux computer, using Mint.

                    #741019
                    Michael Gilligan
                    Participant
                      @michaelgilligan61133
                      On tminusa@yahoo.co.uk Said:

                      Downloaded almost immediately on my Linux computer, using Mint.

                      Presumably I witnessed a case of server overload, then.

                      … my internet speed is very good here, so my general expectations are probably rather high.

                      [screenshot]

                      Q. does the downloaded file render each page quickly for you ?

                      MichaelG.

                      #741022
                      tminusa@yahoo.co.uk
                      Participant
                        @tminusayahoo-co-uk

                        My apologies! I missed that you were downloading as PDF.

                        I was reading the clicked link. The download was also slow for me as a PDF.

                        Tom.

                        #741023
                        Michael Gilligan
                        Participant
                          @michaelgilligan61133

                          No problem, Tom … thanks for letting me know.

                          MichaelG.

                          #741032
                          SillyOldDuffer
                          Moderator
                            @sillyoldduffer

                            I’ve just downloaded it in under 5 seconds, and paging is almost instantaneous.

                            Just a guess, but I think Michael was the first person in yonks to ask for this PDF from the archive. In the distant past, when I helped develop an archive, it used a hierarchical storage system to reduce cost. Everything was stored on the cheapest possible media, maybe magnetic tape, CD-ROM, or exchangeable magnetic discs in a giant autoloader, making it extremely slow to retrieve.

                            Above the cheapest possible layer, the hierarchical system provided one or more layers of much faster (more expensive) media, in which retrieved documents were stored for a while. Thus responding to the first request would be deadly slow, but thereafter the system would be fast. Documents stay on faster media until the space they occupy is needed by something more popular, at which point they are purged. After a purge, the next person to ask for the document would have to wait whilst it was read again from slow storage.
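Dave's description is the classic tiered-storage pattern: a small fast tier in front of a large slow one, with the least recently used document purged when space is needed. Here is a minimal Python sketch, assuming least-recently-used (LRU) eviction and made-up cost units. The `TieredStore` class and its costs are hypothetical and say nothing about how the Internet Archive actually stores its files.

```python
from collections import OrderedDict


class TieredStore:
    """Toy hierarchical store: a small fast tier (LRU order) in front of a slow tier."""

    SLOW_COST = 1000  # hypothetical cost of reading from tape/disc
    FAST_COST = 1     # hypothetical cost of a fast-tier hit

    def __init__(self, slow_tier, fast_capacity):
        self.slow_tier = slow_tier          # dict: name -> document (always complete)
        self.fast_capacity = fast_capacity  # how many documents fit in the fast tier
        self.fast_tier = OrderedDict()      # least recently used entry is at the front

    def get(self, name):
        """Return (document, cost) for one request."""
        if name in self.fast_tier:
            # Hit: serve from fast media and mark as most recently used.
            self.fast_tier.move_to_end(name)
            return self.fast_tier[name], self.FAST_COST
        # Miss: read from slow storage, then keep a copy on fast media.
        doc = self.slow_tier[name]
        self.fast_tier[name] = doc
        if len(self.fast_tier) > self.fast_capacity:
            # Fast tier is full: purge the least recently used document.
            self.fast_tier.popitem(last=False)
        return doc, self.SLOW_COST


store = TieredStore({"smeaton.pdf": "...", "a.pdf": "...", "b.pdf": "..."}, fast_capacity=2)
print(store.get("smeaton.pdf")[1])  # 1000: first request, read from slow storage
print(store.get("smeaton.pdf")[1])  # 1: now held on fast media
store.get("a.pdf")
store.get("b.pdf")                  # fast tier overflows, smeaton.pdf is purged
print(store.get("smeaton.pdf")[1])  # 1000: slow again after the purge
```

An `OrderedDict` keeps the fast tier in access order, so the purge candidate is always at the front. Real systems have more tiers and cleverer policies, but the "first request slow, repeats fast, slow again after a purge" behaviour is the same.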

                            The slow layer of the system I worked on was based on VHS video tape cartridges. When a request came in, the autoloader software driver would identify which cartridge was needed and load it into the tape reader. Then the reader would wade through the tape looking for the file, perhaps taking tens of minutes to find it. Very sad: no sooner was this system up and running than the cost of hard drives dropped, making the VHS method comparatively slower and less cost-effective with every passing month…

                            Caching, aka buffering, is universal in computer systems. Now we might find that the slow layer is a hard drive and the fast layer is an SSD, which is further cached in RAM because RAM is now very cheap. Above the operating system, web hosts also cache, and once a pdf is in memory, all being well, response is almost instantaneous. All rather unpredictable, though, because lots of things can upset caching and force the system to recall stuff from slow storage: increased user activity, applying updates, doing backups and much else.

                            Dave

                            #741040
                            Frances IoM
                            Participant
                              @francesiom58905

                              Dave might well be correct, though this morning I was encountering a 502 gateway error from a site that was working well yesterday and is again now; possibly there was some network routing problem in the US.

                              #741068
                              Michael Gilligan
                              Participant
                                @michaelgilligan61133

                                Thanks both ^^^

                                Dave’s analysis seems very reasonable, insofar as it could explain the slow download … but I am still intrigued to know why pages take a noticeable time to render when they have already been downloaded and are now being rendered locally … It’s reminiscent of what is sometimes done in HTML, where a picture is displayed in low resolution for a while, until the full file has loaded [sorry, don’t know the details of how that is done]

                                MichaelG.

                                #741070
                                Michael Gilligan
                                Participant
                                  @michaelgilligan61133

                                  [ demonstration ]

                                  Page when first selected:

                                  [screenshot]

                                  … and a few seconds later:

                                  [screenshot]

                                  #741074
                                  Frances IoM
                                  Participant
                                    @francesiom58905

                                    This is a standard approach: JPEGs use a discrete cosine transform, and in progressive JPEGs the low spatial frequencies can be transmitted first, giving a low-definition image (which may be sufficient to decide whether it is worth waiting for more detail); the higher spatial frequencies are then transmitted to give better definition. One scheme demo’d more than 20 years ago for old text was to first transmit a low-definition version of the page in colour, then convert the high frequencies to black and white and recode them in the same coding scheme used for fax.
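To illustrate Frances's point, here is a minimal one-dimensional sketch in plain Python. It takes the discrete cosine transform of one made-up row of eight grey levels and rebuilds the row from only the lowest-frequency coefficients, as if the rest had not arrived yet. Real JPEG works on 8×8 blocks in two dimensions and quantises and entropy-codes the coefficients. Whether Michael's viewer does anything like this for his PDF is an assumption. The sketch only shows why a few low-frequency terms give a blurred but recognisable version that sharpens as more terms arrive.

```python
import math


def dct(x):
    """Orthonormal DCT-II: turn samples into frequency coefficients (lowest first)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * total)
    return coeffs


def idct(c):
    """Inverse of dct(): turn coefficients back into samples."""
    n = len(c)
    samples = []
    for i in range(n):
        total = c[0] * math.sqrt(1 / n)
        total += sum(c[k] * math.sqrt(2 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                     for k in range(1, n))
        samples.append(total)
    return samples


def preview(coeffs, keep):
    """Rebuild the row from only the first `keep` (lowest-frequency) coefficients,
    treating the higher frequencies as not yet received (zero)."""
    received = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct(received)


def rms_error(a, b):
    """Root-mean-square difference between two rows."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


# One made-up row of 8 grey levels: a dark pen stroke on light paper.
row = [230, 228, 225, 40, 35, 220, 231, 229]
coeffs = dct(row)
for keep in range(1, len(row) + 1):
    print(f"{keep} coefficient(s): rms error {rms_error(preview(coeffs, keep), row):6.1f}")
```

With only the first coefficient, the whole row is a flat average grey. Because the transform is orthonormal, the error can only shrink as more coefficients arrive. It reaches zero, apart from rounding, when all eight are in.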

                                    #741089
                                    Michael Gilligan
                                    Participant
                                      @michaelgilligan61133
                                      On Frances IoM Said:

                                      This is a standard approach – jpgs use a cosine transform and the low spatial frequencies can be transmitted first giving a low definition image […]

                                      Sorry to be thick … but does that mean that the PDF in question is still carrying the baggage of some previous existence as a jpg ?

                                      MichaelG.

                                      #741103
                                      SillyOldDuffer
                                      Moderator
                                        @sillyoldduffer
                                        On Michael Gilligan Said:
                                        On Frances IoM Said:

                                        This is a standard approach – jpgs use a cosine transform and the low spatial frequencies can be transmitted first giving a low definition image […]

                                        Sorry to be thick … but does that mean that the PDF in question is still carrying the baggage of some previous existence as a jpg ?

                                        MichaelG.

                                        Not thick at all, because PDF is a container format, not a single file format. I tried to understand PDF once, and was defeated!

                                        Some PDF documents contain nothing but text, others are bitmaps (like jpg), and many intermix both text and image in the same document.

                                        Quite a lot of old books on the web were scanned as bitmaps, resulting in huge pdf files, slow download and render times, and poor search. Text pdfs produced by Optical Character Recognition are much smaller and search better, but because OCR often introduces too many errors, bitmap capture remains common.

                                        The Smeaton document looks like a bitmap capture to me, but performance should be reasonable because it’s quite small.
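Following on from the point that a PDF is a container: each image inside it is a stream whose `/Filter` entry names the compression used. Examples are `DCTDecode` (ordinary JPEG), `JPXDecode` (JPEG 2000), and `JBIG2Decode` or `CCITTFaxDecode` (black-and-white, fax-style). Counting those entries is one way to approach Michael's question about whether the file carries JPEG-type images.

The sketch below is a rough heuristic in Python. It assumes the stream dictionaries appear uncompressed in the file (the PDF format does not allow stream objects inside compressed object streams). It also assumes `/Filter` is written directly rather than as an indirect reference. A proper PDF library would parse the file instead. The function name `count_filters` and the output layout are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches "/Filter /Name" or "/Filter [/Name1 /Name2 ...]" in dictionary text.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")

# A few standard PDF filter names and what they usually mean for scans.
MEANING = {
    "DCTDecode": "JPEG",
    "JPXDecode": "JPEG 2000",
    "JBIG2Decode": "JBIG2 (black-and-white, fax-like)",
    "CCITTFaxDecode": "CCITT fax (black-and-white)",
    "FlateDecode": "zip-style compression (text, fonts or images)",
}


def count_filters(pdf_bytes):
    """Count the compression filters named in a PDF's stream dictionaries."""
    counts = Counter()
    for match in FILTER_RE.finditer(pdf_bytes):
        # A filter value may be a single name or an array of names.
        for name in NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    for name, n in count_filters(data).most_common():
        print(f"{n:5d}  {name:16s} {MEANING.get(name, '')}")
```

Saved as, say, `pdf_filters.py` (a hypothetical name), it would be run as `python pdf_filters.py smeaton.pdf`. If the count were dominated by `JPXDecode`, that could help explain slow page rendering, since JPEG 2000 is considerably more expensive to decode than ordinary JPEG. Nothing in the thread establishes what this particular file actually contains.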

                                        Dave
