It is suggested that public servers be set up for
- format conversions,
- performing OCR,
- automatic supply of metadata for an article
(using Dublin Core [8], Open Archive [14], or similar encodings),
- uploading digitized material to the DML,
- registry of all ongoing projects, keeping track of
ongoing/completed/planned digitizing projects
and allowing mathematicians, librarians and other interested
users to propose further material for registration,
- scan servers: a scan server should be a place with good
scanning equipment, to which people can send their paper material
in order to have it scanned at high quality.
Reason: Setting up the DML is a task for many people and will take
10-15 years or longer. Any individual or institutional contribution of
digitizations should therefore be welcome. Individuals should be
encouraged and enabled to help.
To enable many contributors to provide digitized material of
sufficiently high quality, public tools are needed to convert the
material into the right format, which is sometimes technically
demanding, and to add text layers by OCR. OCR should be optimized for
the language the manuscript is written in, so it would be good to have
public servers for the various language areas. It should also be easy
for contributors to supply elementary metadata for the scanned
material, such as MSC codes, keywords and phrases, encoded on the
basis of Dublin Core and/or Open Archive.
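As an illustration of the elementary metadata mentioned above, the following minimal sketch builds a simple Dublin Core record with the Python standard library. The function name `dublin_core_record`, the wrapping `record` element and all sample values are assumptions made for this example; only the Dublin Core element names and namespace are taken from the standard. MSC codes and keywords are both placed in `dc:subject`, which is one possible choice, not a prescribed one.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(title, creators, date, language, subjects, identifier):
    """Return a simple Dublin Core record as an XML string.

    The wrapping <record> element is a hypothetical container chosen
    for this sketch; a real server would prescribe its own envelope.
    """
    root = ET.Element("record")

    def add(name, value):
        # Create one dc:<name> child element holding the given text.
        element = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        element.text = value

    add("title", title)
    for creator in creators:
        add("creator", creator)
    add("date", date)
    add("language", language)
    for subject in subjects:  # MSC codes and free keywords alike
        add("subject", subject)
    add("identifier", identifier)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # All values below are invented sample data.
    print(dublin_core_record(
        title="An example article",
        creators=["A. Author"],
        date="1900",
        language="de",
        subjects=["11R52", "quaternion algebras"],
        identifier="example-id-0001",
    ))
```

A contributor-facing server could fill in such a record from a short web form, so that contributors never have to write the encoding by hand.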
In principle, this technology will benefit any scientific
discipline, as well as more general areas of electronic literature.
Proposing such a set of servers as a basic archiving infrastructure
might therefore help to convince funding agencies to support DML
projects.
Public format conversion servers could also help to solve the
long-term archiving problem, since they provide a dynamic tool for
converting archived material as formats change.
Of course, all these servers should be able to handle script-driven
mass data upload as well as individual files.
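A script-driven upload could start from a manifest of the files to be sent. The sketch below is only an assumption about how a client might prepare such a manifest: the function `build_manifest`, the list of file suffixes and the manifest fields are hypothetical, and the actual transfer is left out because no server interface is specified here. The same function accepts either a directory (mass upload) or a single file.

```python
import hashlib
import json
from pathlib import Path

# File types treated as scan material in this sketch (an assumption).
SCAN_SUFFIXES = {".tif", ".tiff", ".pdf", ".djvu"}


def build_manifest(source):
    """List scan files under `source` with size and SHA-256 checksum.

    `source` may be a directory, which is walked recursively, or a
    single file. Paths are reported relative to the directory (or to
    the file's parent) so the manifest does not depend on the machine.
    """
    source = Path(source)
    if source.is_file():
        base, candidates = source.parent, [source]
    else:
        base, candidates = source, sorted(source.rglob("*"))

    entries = []
    for path in candidates:
        if path.is_file() and path.suffix.lower() in SCAN_SUFFIXES:
            data = path.read_bytes()
            entries.append({
                "path": path.relative_to(base).as_posix(),
                "bytes": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            })
    return entries


if __name__ == "__main__":
    # Print the manifest for the current directory as JSON.
    print(json.dumps(build_manifest("."), indent=2))
```

The checksums let a receiving server verify that each file arrived intact before it is accepted into the archive.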
The Digitization Projects should:
- Use stable URLs and stable interfaces.
- Offer exportable records for monographs and serials in standard
formats to all libraries for their online catalogues.
These books and journals should be on the "library shelves" of
every library in the world!
- Offer exportable records for journal articles in standard form
to the databases of Mathematical Reviews and Zentralblatt für Mathematik.
- Offer records for reference linking to MR and ZBL when the
references have been identified.
All Libraries should:
- Add electronic records for all these freely available journals and
books to their online catalogue.
- Notify users: Post notices in journal sections of the library
to alert users to the fact that certain journals are also online.
Perhaps even mention if they are searchable or have reference linking.
The databases MR and ZBL should:
- Add all journal articles from the digitization projects which
are not already listed; link to all.
- Add the citation information from the projects to their
databases.
Steps for the DML to take:
- Keep an up-to-date listing of the math digitization projects.
- Keep an up-to-date listing of mathematics items and status of digitization.
- Maintain a volunteer network similar to Project Gutenberg [17].
- Collect and disseminate information between the projects.
- Keep lists of digitization vendors (quality, prices, etc.).
- Keep statistics on production:
per-page costs, numbers of references, etc.
- Maintain information on and develop software tools:
  - Keep track of the current best commercial and free tools: OCR;
    tools to work with TIFF, PDF, DjVu, etc.
  - Coordinate the development and distribution of software tools:
    - Reference extracting software (easier for complete journals),
    - Matching software for reference linking,
    - Software to convert searchable PDF to searchable DjVu,
    - Software to put references directly into PDF, DjVu files.
  - Establish servers analogous to the Any2DjVu Server [2]
    for conversion purposes: OCR; TIFF to PDF, DjVu; link location & insertion.
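To give an idea of what matching software for reference linking has to do, here is a deliberately simple sketch. It is not the method of any existing tool: the heuristic (author surname, year and first page must all occur in the reference string), the function names and the record fields are assumptions made for this example, and the sample data is invented. Real matching would also need to handle abbreviated journal names, OCR errors and ambiguous hits.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def match_reference(reference, records):
    """Return the records whose key tokens all occur in `reference`.

    Each record is a dict with the hypothetical fields "surname",
    "year" and "first_page". The key is the set of surname tokens
    plus the year and the first page, compared as plain tokens.
    """
    tokens = set(normalize(reference).split())
    hits = []
    for record in records:
        key = set(normalize(record["surname"]).split())
        key.add(str(record["year"]))
        key.add(str(record["first_page"]))
        if key <= tokens:  # every key token appears in the reference
            hits.append(record)
    return hits


if __name__ == "__main__":
    # Invented sample data for illustration only.
    records = [
        {"id": "rec-1", "surname": "Müller", "year": 1901, "first_page": 123},
        {"id": "rec-2", "surname": "Muller", "year": 1902, "first_page": 45},
    ]
    reference = "K. Müller, Über ein Beispiel, J. Example Math. 12 (1901), 123-140."
    print([r["id"] for r in match_reference(reference, records)])
```

A reference that yields exactly one hit can be linked automatically; zero or several hits would be passed to a human or to a stricter matching step.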
Ulf Rehmann
2003-07-27